Word frequency-rank relationship in tagged texts

A. Chacoma; D. H. Zanette

arXiv:2102.10992·cs.CL·June 11, 2021

TL;DR

This study compares the frequency-rank relationship of nouns, verbs, and other words in automatically tagged English literary texts and finds statistically significant differences between the three classes, which may reflect grammatical function.

Contribution

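It analyzes the frequency-rank relationship separately for the sub-vocabularies of three grammatical classes (nouns, verbs, and others) and compares each with a null hypothesis in which the words of a class are uniformly distributed across the frequency-ranked vocabulary of the whole work.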

Findings

01

Statistically significant differences in the frequency-rank relationship between nouns, verbs, and other words

02

Frequency-rank relationships may reflect linguistic features associated with grammatical function

03

Comparison with a null hypothesis of uniform distribution across the frequency-ranked vocabulary shows that the classes are not uniformly distributed

Abstract

We analyze the frequency-rank relationship in sub-vocabularies corresponding to three different grammatical classes (nouns, verbs, and others) in a collection of literary works in English, whose words have been automatically tagged according to their grammatical role. Comparing against a null hypothesis that assumes that words belonging to each class are uniformly distributed across the frequency-ranked vocabulary of the whole work, we disclose statistically significant differences between the three classes. These results suggest that frequency-rank relationships may reflect linguistic features associated with grammatical function.
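
The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not say which statistic or test they use, so the mean rank of a class and a Monte Carlo sample of uniformly drawn ranks are assumptions made here for illustration. The function names (`rank_vocabulary`, `class_ranks`, `null_mean_ranks`) and the hand-tagged toy tokens are likewise hypothetical.

```python
import random
from collections import Counter


def rank_vocabulary(tagged_tokens):
    """Rank (word, tag) types by decreasing frequency; rank 1 is the most frequent."""
    counts = Counter(tagged_tokens)
    # Ties are broken alphabetically so that the ranking is deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, tag, freq)
            for rank, ((word, tag), freq) in enumerate(ordered, start=1)]


def class_ranks(ranked, tag):
    """Ranks, within the whole vocabulary, of the types carrying a given tag."""
    return [rank for rank, _word, t, _freq in ranked if t == tag]


def null_mean_ranks(vocab_size, class_size, n_samples=1000, seed=0):
    """Mean rank of class_size ranks drawn uniformly without replacement.

    Assumption: this models the null hypothesis that a class is uniformly
    distributed across the frequency-ranked vocabulary of the whole work.
    """
    rng = random.Random(seed)
    return [sum(rng.sample(range(1, vocab_size + 1), class_size)) / class_size
            for _ in range(n_samples)]


# Toy input: tags are assigned by hand here, not by an automatic tagger.
tokens = ([("the", "other")] * 5 + [("cat", "noun")] * 3
          + [("runs", "verb")] * 2 + [("dog", "noun")])

ranked = rank_vocabulary(tokens)
noun_ranks = class_ranks(ranked, "noun")
observed = sum(noun_ranks) / len(noun_ranks)

# Fraction of null samples whose mean rank is at most the observed one.
null = null_mean_ranks(len(ranked), len(noun_ranks), n_samples=200)
p_low = sum(m <= observed for m in null) / len(null)
print(noun_ranks, observed, p_low)
```

With a real tagged text, a very small or very large value of `p_low` would indicate that the class sits at systematically higher or lower ranks than the uniform null predicts. The toy vocabulary of four types is far too small for such a conclusion and only shows the mechanics.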
