In narrative texts punctuation marks obey the same statistics as words
Andrzej Kulig, Jaroslaw Kwapien, Tomasz Stanisz, Stanislaw Drozdz

TL;DR
This study demonstrates that punctuation marks in language samples exhibit statistical properties similar to common words, suggesting they should be included in linguistic analyses such as Zipfian and network studies.
Contribution
The paper provides evidence that punctuation marks follow Zipfian distributions and behave like frequent words, advocating for their inclusion in statistical language analyses.
Findings
Punctuation marks follow Zipfian distributions similar to words.
Including punctuation restores power-law behavior in Zipf plots.
Punctuation exhibits properties akin to frequent words in network analyses.
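As a rough illustration of the first two findings, the sketch below builds a rank-frequency list with and without punctuation marks and estimates a Zipf exponent from a log-log least-squares fit. It is not the authors' procedure: the tokenizer pattern, the set of punctuation marks, and the function names (`tokenize`, `rank_frequency`, `zipf_exponent`) are all assumptions made for this example, and a plain least-squares fit is only a crude estimator of a power-law exponent.

```python
import math
import re
from collections import Counter

# Assumed tokenizer: a run of letters is a word, and each punctuation
# mark from a small hand-picked set is a token of its own.
TOKEN_RE = re.compile(r'[^\W\d_]+|[.,;:!?()"-]')


def tokenize(text, keep_punctuation=True):
    """Split text into lowercase word tokens, optionally with punctuation."""
    tokens = TOKEN_RE.findall(text.lower())
    if keep_punctuation:
        return tokens
    return [t for t in tokens if t[0].isalpha()]


def rank_frequency(tokens):
    """Return (token, count) pairs sorted by decreasing count."""
    return Counter(tokens).most_common()


def zipf_exponent(ranked):
    """Least-squares slope of log(frequency) against log(rank), sign-flipped.

    Needs at least two ranked items. This is a crude estimate for
    illustration, not a rigorous power-law fit.
    """
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(count) for _, count in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx
```

On a long narrative text, one could compare `zipf_exponent(rank_frequency(tokenize(text)))` with the same call using `keep_punctuation=False` and plot both rank-frequency lists on log-log axes to see how the top ranks change when punctuation is counted.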
Abstract
From the point of view of grammar, the role of punctuation marks in a sentence is formally defined and well understood. In semantic analysis, punctuation also plays a crucial role as a means of avoiding ambiguity of meaning. The situation is different in statistical analyses of language samples, where the decision whether punctuation marks should be considered or neglected is seen as rather arbitrary and at present is left to the researcher's preference. The objective of this work is to shed some light on this problem by answering the question of whether punctuation marks may be treated as ordinary words and whether they should be included in any analysis of word co-occurrences. We already know from our previous study (S. Drożdż et al., Inf. Sci. 331 (2016) 32-44) that full stops that determine the length of sentences…
