Punctuation effects in English and Esperanto texts
M. Ausloos

TL;DR
This study analyzes how punctuation influences sentence length distributions in English and Esperanto texts, revealing different power-law behaviors and suggesting sentences are more indicative of author style than word frequency distributions.
Contribution
It introduces a statistical physics approach to compare punctuation effects and sentence structures in natural and artificial languages, highlighting the robustness of sentence-based analysis.
Findings
The sentence length-rank relationship follows power laws whose exponents differ between the major punctuation marks.
Depending on how a sentence is defined, the exponent can fall well below unity (0.50 or 0.30).
Minimal differences are found between original and translated texts at the exponent level.
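The rank-size analysis behind these findings can be illustrated with a minimal sketch. It assumes a sentence is any run of words ending at one of a chosen set of punctuation marks, that length is counted in words, and that the exponent is estimated by an ordinary least-squares fit on the log-log rank plot. The function names `sentence_lengths` and `rank_exponent` are hypothetical, and the paper's own fitting procedure may differ.

```python
import math
import re


def sentence_lengths(text, marks="."):
    """Split text at any character in `marks`; return the word count of each non-empty piece."""
    pattern = "[" + re.escape(marks) + "]"
    pieces = re.split(pattern, text)
    return [len(p.split()) for p in pieces if p.split()]


def rank_exponent(lengths):
    """Estimate zeta in length ~ rank**(-zeta) by least squares on log-log axes.

    Assumes at least two strictly positive lengths.
    """
    ordered = sorted(lengths, reverse=True)  # rank 1 is the longest sentence
    xs = [math.log(r) for r in range(1, len(ordered) + 1)]
    ys = [math.log(v) for v in ordered]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # sign flipped so a decaying law gives a positive exponent


sample = "Alice was tired. She sat; then slept."
print(sentence_lengths(sample, marks="."))   # full stop only
print(sentence_lengths(sample, marks=".;"))  # full stop or semicolon
```

Changing `marks` changes what counts as a sentence, which is the kind of redefinition the findings say shifts the exponent.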
Abstract
A statistical physics study of punctuation effects on sentence lengths is presented for written texts: Alice in Wonderland and Through a Looking Glass. The translation of the first text into Esperanto is also considered as a test for the role of punctuation in defining a style, and for contrasting natural and artificial, but written, languages. Several log-log plots of the sentence length-rank relationship are presented for the major punctuation marks. Different power laws are observed with characteristic exponents. The exponent can take a value much less than unity (0.50 or 0.30) depending on how a sentence is defined. The texts are also mapped into time series based on the word frequencies. The quantitative differences between the original and translated texts are very minute, at the exponent level. It is argued that sentences seem to be more reliable than word…
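The abstract does not spell out how a text is mapped into a time series. One plausible reading, shown here only as an assumption, replaces each word by the number of times it occurs in the whole text, so that the word position plays the role of time. The function name `frequency_series` is hypothetical.

```python
import re
from collections import Counter


def frequency_series(text):
    """Return, for each word in reading order, its total count in the text.

    Words are maximal runs of Unicode letters, lowercased, so accented
    Esperanto letters are kept while apostrophes and digits split words.
    """
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    return [counts[w] for w in words]


print(frequency_series("The cat and the hat"))
```

The resulting series can then be compared between an original text and its translation with any standard time-series tool.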
