Universal versus system-specific features of punctuation usage patterns in major Western languages
Tomasz Stanisz, Stanislaw Drozdz, Jaroslaw Kwapien

TL;DR
This study analyzes punctuation patterns across seven Western languages, revealing universal statistical features characterized by Weibull distribution parameters, while also highlighting language-specific differences in punctuation usage and flexibility.
Contribution
It demonstrates that the distribution of intervals between consecutive punctuation marks can be characterised by the two parameters of the discrete Weibull distribution. The parameter values tend to be language-specific and appear to be influenced by translation.
Findings
In almost all texts, across languages, the intervals between consecutive punctuation marks follow a discrete Weibull distribution.
English exhibits the least constrained punctuation pattern.
Language-specific Weibull parameters correlate with punctuation flexibility.
Abstract
The celebrated proverb that "speech is silver, silence is golden" has a long multinational history and multiple specific meanings. In written texts punctuation can in fact be considered one of its manifestations. Indeed, the virtue of effectively speaking and writing involves - often decisively - the capacity to apply the properly placed breaks. In the present study, based on a large corpus of world-famous and representative literary texts in seven major Western languages, it is shown that the distribution of intervals between consecutive punctuation marks in almost all texts can universally be characterised by only two parameters of the discrete Weibull distribution which can be given an intuitive interpretation in terms of the so-called hazard function. The values of these two parameters tend to be language-specific, however, and even appear to navigate translations. The properties of…
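To make the two-parameter description concrete, here is a minimal Python sketch. It assumes the common type-I parameterisation of the discrete Weibull distribution, with survival function P(X > k) = (1 - p)^(k^beta) for k = 0, 1, 2, ...; the abstract does not state the formula, so the symbols p and beta, the function names, and the set of characters counted as punctuation are all illustrative assumptions rather than the authors' method. Under this form the hazard function is the probability that a punctuation mark follows the k-th word, given that none has appeared after the previous k - 1 words.

```python
import re

# Assumed set of punctuation marks; the study's actual set is not given here.
PUNCTUATION = set(".,;:!?()")


def punctuation_intervals(text):
    """Return the number of words between consecutive punctuation marks."""
    tokens = re.findall(r"\w+|[.,;:!?()]", text)
    intervals, count = [], 0
    for token in tokens:
        if token in PUNCTUATION:
            if count > 0:  # skip marks that directly follow another mark
                intervals.append(count)
            count = 0
        else:
            count += 1
    return intervals


def dw_pmf(k, p, beta):
    """P(X = k) for k = 1, 2, ... under the assumed discrete Weibull form."""
    q = 1.0 - p
    return q ** ((k - 1) ** beta) - q ** (k ** beta)


def dw_hazard(k, p, beta):
    """P(X = k | X >= k): chance that a mark follows the k-th word."""
    q = 1.0 - p
    return 1.0 - q ** (k ** beta - (k - 1) ** beta)


print(punctuation_intervals("Speech is silver, silence is golden."))
```

In this parameterisation beta = 1 reduces to the geometric distribution with a constant hazard equal to p, beta > 1 gives a hazard that grows with the number of words since the last mark, and beta < 1 gives a decreasing one.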
Taxonomy
Topics: Natural Language Processing Techniques · Authorship Attribution and Profiling
