A Simple Feature Method for Prosody Rhythm Comparison
Mariana Julião, Alberto Abad, Helena Moniz

TL;DR
This paper introduces an unsupervised, content-independent method called Peak Embedding for assessing prosody rhythm, demonstrating its effectiveness through clustering metrics on speech data.
Contribution
It proposes a novel fixed-length representation for rhythm comparison that simplifies and improves upon traditional, cumbersome measurement techniques.
Findings
Achieved a Silhouette Coefficient of 0.444 with PE and Loudness.
Attained a Global Separability Index of 0.979 with PE, Pitch, and Loudness.
Demonstrated effective clustering of speech units based on rhythm features.
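For readers unfamiliar with the first metric above, the following is a minimal sketch of the standard Silhouette Coefficient, written for one-dimensional points with absolute difference as the distance. The function name `silhouette` and the toy data are illustrative assumptions and are not taken from the paper, which may use a different distance and feature space.

```python
def silhouette(points, labels):
    """Mean silhouette coefficient for 1-D points (illustrative, not the paper's code).

    For each point: a = mean distance to its own cluster,
    b = smallest mean distance to any other cluster,
    score = (b - a) / max(a, b). Assumes at least two clusters.
    """
    def mean_dist(x, others):
        return sum(abs(x - y) for y in others) / len(others)

    scores = []
    for i, (x, lab) in enumerate(zip(points, labels)):
        same = [y for j, (y, l) in enumerate(zip(points, labels))
                if l == lab and j != i]
        if not same:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = mean_dist(x, same)
        b = min(
            mean_dist(x, [y for y, l in zip(points, labels) if l == other])
            for other in set(labels) if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


points = [0.0, 1.0, 10.0, 11.0]
good = silhouette(points, [0, 0, 1, 1])  # compact, well-separated clusters
bad = silhouette(points, [0, 1, 0, 1])   # clusters that mix near and far points
```

Values close to 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest points assigned to the wrong cluster.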
Abstract
Of all components of Prosody, Rhythm has been regarded as the hardest to address, as it is closely linked to Pitch and Intensity. Nevertheless, Rhythm is a very good indicator of a speaker's fluency in a foreign language or even of some diseases. Canonical ways to measure Rhythm involve a cumbersome process of segment alignment, often leading to modest and questionable results. Perceptually, however, rhythm does not seem as difficult, as humans can grasp it even when the text is not fully intelligible. In this work, we develop an empirical and unsupervised method of rhythm assessment, which does not rely on the content. We have created a fixed-length representation of each utterance, Peak Embedding (PE), which codifies the proportional distance between peaks of the chosen Low-Level Descriptors. Clustering pairs of small sentence-like units, we have…
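The abstract describes PE only at a high level, so the following is a rough sketch of one way such a representation could be built, not the authors' implementation. It assumes peaks are strict local maxima of a descriptor contour, that "proportional distance" means the gap between consecutive peaks divided by the contour length, and that fixed length is obtained by truncating or zero-padding. The names `find_peaks`, `peak_embedding`, and `dim` are hypothetical.

```python
def find_peaks(contour):
    """Return indices of strict local maxima in a 1-D contour."""
    return [
        i for i in range(1, len(contour) - 1)
        if contour[i - 1] < contour[i] > contour[i + 1]
    ]


def peak_embedding(contour, dim=8):
    """Hypothetical fixed-length vector of proportional inter-peak distances.

    Assumption: each entry is the gap between two consecutive peaks,
    expressed as a fraction of the total contour length. The vector is
    truncated or zero-padded to `dim` entries.
    """
    peaks = find_peaks(contour)
    n = len(contour)
    gaps = [(b - a) / n for a, b in zip(peaks, peaks[1:])]
    gaps = gaps[:dim]
    return gaps + [0.0] * (dim - len(gaps))


# Toy loudness contour with peaks at frames 1, 4, and 8 (made-up values).
loudness = [0, 2, 1, 0, 3, 1, 0, 0, 4, 0]
embedding = peak_embedding(loudness, dim=4)
```

Because the gaps are divided by the utterance length, utterances of different durations map to vectors of the same size and comparable scale, which is what makes pairwise comparison and clustering straightforward in this sketch.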
Taxonomy
Topics: Phonetics and Phonology Research · Natural Language Processing Techniques · Speech Recognition and Synthesis
