Title

Language Statistics at Different Spatial, Temporal, and Grammatical Scales.

Authors

Sánchez-Puig, Fernanda; Lozano-Aranda, Rogelio; Pérez-Méndez, Dante; Colman, Ewan; Morales-Guzmán, Alfredo J.; Rivera Torres, Pedro Juan; Pineda, Carlos; Gershenson, Carlos

Abstract

In recent decades, the field of statistical linguistics has made significant strides, which have been fueled by the availability of data. Leveraging Twitter data, this paper explores the English and Spanish languages, investigating their rank diversity across different scales: temporal intervals (ranging from 3 to 96 h), spatial radii (spanning 3 km to over 3000 km), and grammatical word ngrams (ranging from 1-grams to 5-grams). The analysis focuses on word ngrams, examining a time period of 1 year (2014) and eight different countries. Our findings highlight the relevance of all three scales with the most substantial changes observed at the grammatical level. Specifically, at the monogram level, rank diversity curves exhibit remarkable similarity across languages, countries, and temporal or spatial scales. However, as the grammatical scale expands, variations in rank diversity become more pronounced and influenced by temporal, spatial, linguistic, and national factors. Additionally, we investigate the statistical characteristics of Twitter-specific tokens, including emojis, hashtags, and user mentions, revealing a sigmoid pattern in their rank diversity function. These insights contribute to quantifying universal language statistics while also identifying potential sources of variation.

Subjects

LANGUAGE models; UNIVERSAL language; LINGUISTIC complexity; SPANISH language; ENGLISH language

Publication

Entropy, 2024, Vol 26, Issue 9, p734

ISSN

1099-4300

Publication type

Academic Journal

DOI

10.3390/e26090734
