In narrative texts punctuation marks obey the same statistics as words

Andrzej Kulig; Jaroslaw Kwapien; Tomasz Stanisz; Stanislaw Drozdz

arXiv:1604.00834 · cs.CL · November 3, 2016

TL;DR

This study demonstrates that punctuation marks in language samples exhibit statistical properties similar to common words, suggesting they should be included in linguistic analyses such as Zipfian and network studies.
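
To make "treating punctuation marks as ordinary words" concrete, here is a minimal Python sketch. It is not the authors' preprocessing: `tokenize`, `rank_frequency`, `TOKEN_RE` and the set of marks in `PUNCT` are hypothetical choices made for illustration. Each punctuation mark is emitted as a token of its own, and the tokens are then ranked by frequency, which is the list a Zipf plot is drawn from.

```python
import re
from collections import Counter

# Hypothetical set of punctuation marks treated as tokens (an assumption).
PUNCT = set('.,;:!?()"-')
# Words are runs of letters/apostrophes; every listed mark is a separate token.
TOKEN_RE = re.compile(r"[A-Za-z']+|[" + re.escape("".join(sorted(PUNCT))) + r"]")


def tokenize(text, keep_punctuation=True):
    """Lowercased tokens; punctuation marks are kept as words unless disabled."""
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    if not keep_punctuation:
        tokens = [t for t in tokens if t not in PUNCT]
    return tokens


def rank_frequency(tokens):
    """(rank, token, count) triples, most frequent first; ties broken alphabetically."""
    counts = Counter(tokens)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, tok, c) for rank, (tok, c) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    text = "The cat sat. The dog, however, ran; the cat did not."
    print(rank_frequency(tokenize(text))[:4])
    # [(1, 'the', 3), (2, ',', 2), (3, '.', 2), (4, 'cat', 2)]
```

In this toy sentence the comma and the full stop each occur twice and sit right behind "the"; with `keep_punctuation=False` they disappear from the ranking altogether. The alphabetical tie-breaking is arbitrary and only keeps the output deterministic.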

Contribution

The paper provides evidence that punctuation marks follow Zipfian distributions and behave like frequent words, advocating for their inclusion in statistical language analyses.
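
A Zipfian distribution means that the frequency f of the r-th most frequent token falls roughly as f(r) ∝ r^(-α). The summary does not say how the paper fits α, so the sketch below uses a swapped-in, deliberately crude estimator: an ordinary least-squares line through log(frequency) versus log(rank). `zipf_exponent` is a hypothetical helper, and least squares on log-log data is known to be biased, so treat the number as illustrative only.

```python
import math


def zipf_exponent(counts):
    """Estimate alpha in f(r) ~ r**(-alpha) by ordinary least squares on
    log(frequency) versus log(rank). Crude and biased; illustration only."""
    freqs = sorted((c for c in counts if c > 0), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two nonzero counts")
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]  # log rank
    ys = [math.log(f) for f in freqs]                     # log frequency
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # the fitted slope is -alpha


if __name__ == "__main__":
    # Synthetic counts that follow Zipf's law exactly with alpha = 1.
    ideal = [1000.0 / r for r in range(1, 101)]
    print(round(zipf_exponent(ideal), 6))  # 1.0
```

On real text one would fit the rank-frequency counts twice, with and without punctuation tokens, and inspect both log-log plots; the slope alone does not establish that a power law holds.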

Findings

1. Punctuation marks follow Zipfian distributions similar to words.
2. Including punctuation restores power-law behavior in Zipf plots.
3. Punctuation exhibits properties akin to frequent words in network analyses.
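
For the network side of the third finding, one common construction, assumed here rather than taken from the paper, is a word-adjacency network: every token type is a node, and two types are linked if they occur next to each other at least once. Punctuation marks then enter the network as nodes like any word. `adjacency_network` and `degrees` are hypothetical helpers, and the token list is hand-written so the block runs on its own.

```python
from collections import defaultdict


def adjacency_network(tokens):
    """Undirected network: nodes are token types; an edge joins two types that
    appear side by side at least once. Self-loops (a token next to itself) are skipped."""
    neighbors = defaultdict(set)
    for a, b in zip(tokens, tokens[1:]):
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def degrees(neighbors):
    """Node degree = number of distinct neighbouring token types."""
    return {node: len(adj) for node, adj in neighbors.items()}


if __name__ == "__main__":
    # "The cat sat. The dog, however, ran; the cat did not." with punctuation kept as tokens
    tokens = ["the", "cat", "sat", ".", "the", "dog", ",", "however", ",",
              "ran", ";", "the", "cat", "did", "not", "."]
    deg = degrees(adjacency_network(tokens))
    print(sorted(deg.items(), key=lambda kv: (-kv[1], kv[0]))[:4])
    # [('the', 4), (',', 3), ('.', 3), ('cat', 3)]
```

Even in this tiny sample the comma and the full stop reach degree 3, tied with "cat" and behind only "the". A hub-like position of this kind is the sort of behavior one would compare against frequent words, although a toy text is no evidence either way.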

Abstract

From a grammatical point of view, the role of punctuation marks in a sentence is formally defined and well understood. In semantic analysis, punctuation also plays a crucial role as a means of avoiding ambiguity of meaning. The situation is different in statistical analyses of language samples, where the decision whether punctuation marks should be considered or neglected is seen as rather arbitrary and is currently left to the researcher's preference. The objective of this work is to shed some light on this problem by answering the question whether punctuation marks may be treated as ordinary words and whether they should be included in any analysis of word co-occurrences. We already know from our previous study (S. Drożdż et al., Inf. Sci. 331 (2016) 32-44) that full stops that determine the length of sentences…
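
The abstract (truncated above) recalls that full stops determine the length of sentences. As a minimal sketch, assuming sentence length is counted in words between consecutive full stops (the exact definition used in the cited study is not given here), the hypothetical helper `intervals` counts word tokens between delimiter tokens. Passing a larger delimiter set, such as the whole hypothetical `PUNCT` set, gives the gaps between any two consecutive punctuation marks instead.

```python
# Hypothetical set of punctuation tokens; anything else counts as a word.
PUNCT = frozenset('.,;:!?()"-')


def intervals(tokens, delimiters=frozenset({"."})):
    """Number of word tokens between consecutive delimiter tokens.

    Empty intervals (two delimiters in a row) are skipped, and words after
    the last delimiter are ignored as an incomplete interval."""
    lengths, current = [], 0
    for tok in tokens:
        if tok in delimiters:
            if current > 0:
                lengths.append(current)
            current = 0
        elif tok not in PUNCT:
            current += 1  # only word tokens lengthen an interval
    return lengths


if __name__ == "__main__":
    tokens = ["the", "cat", "sat", ".", "the", "dog", ",", "however", ",",
              "ran", ";", "the", "cat", "did", "not", "."]
    print(intervals(tokens))         # [3, 8]  (sentence lengths in words)
    print(intervals(tokens, PUNCT))  # [3, 2, 1, 1, 4]  (gaps between any marks)
```

Counting in words rather than characters keeps the measure in the same units as the rank-frequency and network analyses, where punctuation marks and words are treated alike.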
