Do learned speech symbols follow Zipf's law?
Shinnosuke Takamichi, Hiroki Maeda, Joonyong Park, Daisuke Saito, and Hiroshi Saruwatari

TL;DR
This paper examines whether speech symbols learned by deep learning models follow Zipf's law, as natural language symbols do, with the aim of supporting statistical analysis in spoken language processing.
Contribution
It is the first study to analyze the frequency distribution of learned speech symbols in relation to Zipf's law, bridging a gap between natural language and data-driven speech representations.
Findings
Learned speech symbols approximately follow Zipf's law
Results suggest similarities between natural language and learned speech symbol distributions
Provides a foundation for statistical analysis of speech representations
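For reference, Zipf's law is usually stated as a power law between a symbol's frequency rank r and its frequency f(r). The classic form is shown below; the exact exponent and fitting procedure used in the paper are not given on this page, so α ≈ 1 is the textbook value for natural language, not a reported result. "Approximately follow" then means that the rank-frequency curve is close to a straight line on log-log axes.

```latex
f(r) \propto r^{-\alpha}, \qquad \alpha \approx 1
\quad\Longleftrightarrow\quad
\log f(r) = -\alpha \log r + c
```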
Abstract
In this study, we investigate whether speech symbols learned through deep learning follow Zipf's law, as natural language symbols do. Zipf's law is an empirical law that describes the frequency distribution of words and forms a foundation for statistical analysis in natural language processing. Natural language symbols, which humans invented to represent speech content, are known to comply with this law. Meanwhile, recent breakthroughs in spoken language processing have led to the development of learned speech symbols, which are data-driven symbolizations of speech content. Our objective is to determine whether these data-driven speech symbols follow Zipf's law as natural language symbols do. Through this investigation, we aim to open new avenues for statistical analysis in spoken language processing.
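A rank-frequency check of this kind can be sketched as follows, assuming speech has already been converted into sequences of integer symbol IDs (how the paper obtains its learned symbols is not described here). The functions `rank_frequency` and `fit_zipf_exponent` are hypothetical helpers, and the exponent is estimated with an ordinary least-squares fit on log-log axes, which is a simple but crude estimator and not necessarily the paper's method.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def rank_frequency(sequences: Iterable[List[int]]) -> List[Tuple[int, int]]:
    """Count symbol occurrences and return (rank, frequency) pairs.

    Rank 1 is the most frequent symbol; ties keep an arbitrary order,
    which does not affect the frequency values.
    """
    counts = Counter(sym for seq in sequences for sym in seq)
    freqs = sorted(counts.values(), reverse=True)
    return [(rank, f) for rank, f in enumerate(freqs, start=1)]


def fit_zipf_exponent(pairs: List[Tuple[int, int]]) -> float:
    """Fit log f = -alpha * log r + c by least squares and return alpha."""
    if len(pairs) < 2:
        raise ValueError("need at least two distinct symbols to fit a slope")
    xs = [math.log(r) for r, _ in pairs]
    ys = [math.log(f) for _, f in pairs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope  # Zipf's law predicts a negative slope, so alpha > 0


if __name__ == "__main__":
    # Synthetic, Zipf-like data: symbol k appears round(1000 / k) times.
    synthetic = [[k] * round(1000 / k) for k in range(1, 51)]
    alpha = fit_zipf_exponent(rank_frequency(synthetic))
    print(f"estimated alpha = {alpha:.3f}")
```

On real symbol sequences, a fitted α near 1 together with a roughly linear log-log curve would indicate Zipf-like behavior; maximum-likelihood power-law estimators are generally more robust than the least-squares slope used here.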
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques
