Characterizing Ranked Chinese Syllable-to-Character Mapping Spectrum: A Bridge Between the Spoken and Written Chinese Language
Wentian Li

TL;DR
This study analyzes the ranked syllable-to-character mapping spectrum in Chinese and finds that a two-piece logarithmic function fits the data best, offering insight into the relationship between spoken and written Chinese.
Contribution
It analyzes the spectrum of 1280 toned syllables with three fitting functions (logarithmic, Beta rank, and piecewise logarithmic), whereas the earlier analysis covered more than 400 syllables without distinguishing tones.
Findings
Two-piece logarithmic function fits the data best
Beta rank function is a close second
Statistical testing shows the two functions are practically equivalent
Abstract
One important aspect of the relationship between spoken and written Chinese is the ranked syllable-to-character mapping spectrum, which is the list of syllables ranked by the number of characters that map to each syllable. Previously, this spectrum was analyzed for more than 400 syllables without distinguishing the four tones. In the current study, the spectrum with 1280 toned syllables is fitted by a logarithmic function, a Beta rank function, and a piecewise logarithmic function. Of the three fitting functions, the two-piece logarithmic function fits the data best, with both the smallest sum of squared errors (SSE) and the lowest Akaike information criterion (AIC) value. The Beta rank function is a close second. By sampling from a Poisson distribution whose parameter value is chosen from the observed data, we empirically estimate the p-value for testing the…
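The comparison by SSE and AIC described in the abstract can be illustrated with a minimal sketch. It is not the author's code, and several details are assumptions: the logarithmic form is taken to be count = a + b·log(rank), the two pieces are fitted independently with the breakpoint found by a scan, AIC uses the Gaussian least-squares form n·ln(SSE/n) + 2k, and the parameter counts (2 and 5) are one possible convention. The Beta rank function is left out, and the helper names `fit_line`, `fit_log`, `fit_two_piece_log`, and `aic_gaussian` are hypothetical.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, sse)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, sse

def aic_gaussian(sse, n, k):
    """AIC for a least-squares fit with Gaussian errors, up to an additive constant.
    The tiny floor on SSE only guards against log(0) on a perfect fit."""
    return n * math.log(max(sse, 1e-12) / n) + 2 * k

def fit_log(counts):
    """Single logarithmic fit: count = a + b*log(rank), rank starting at 1."""
    n = len(counts)
    x = [math.log(r) for r in range(1, n + 1)]
    a, b, sse = fit_line(x, counts)
    return {"params": (a, b), "sse": sse, "aic": aic_gaussian(sse, n, 2)}

def fit_two_piece_log(counts, min_seg=3):
    """Two independent logarithmic segments (assumption: no continuity constraint).
    The breakpoint is chosen by scanning all splits and keeping the minimum total SSE."""
    n = len(counts)
    x = [math.log(r) for r in range(1, n + 1)]
    best = None
    for split in range(min_seg, n - min_seg + 1):
        a1, b1, s1 = fit_line(x[:split], counts[:split])
        a2, b2, s2 = fit_line(x[split:], counts[split:])
        if best is None or s1 + s2 < best["sse"]:
            best = {"split": split, "params": (a1, b1, a2, b2), "sse": s1 + s2}
    # Assumed parameter count: four coefficients plus the breakpoint.
    best["aic"] = aic_gaussian(best["sse"], n, 5)
    return best
```

Given a list `counts` of characters per syllable sorted in decreasing order, `fit_log(counts)` and `fit_two_piece_log(counts)` return dictionaries whose `sse` and `aic` entries can be compared directly; the lower AIC indicates the preferred model under these assumptions.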
