TL;DR
This paper introduces a machine learning model, built on the structured prediction framework, that automatically measures vowel duration from the acoustic signal. It outperforms traditional HMM-based forced aligners and reduces the need for manual annotation in phonetic studies.
Contribution
The paper presents a structured prediction approach to automatic vowel duration measurement that requires no phonetic or orthographic transcription and is more accurate than HMM-based forced aligners.
Findings
Model outperforms HMM-based forced aligners in accuracy
Requires no phonetic or orthographic transcription
Reduces reliance on manual annotation, making phonetic studies more scalable and replicable
Abstract
A key barrier to making phonetic studies scalable and replicable is the need to rely on subjective, manual annotation. To help meet this challenge, a machine learning algorithm was developed for automatic measurement of a widely used phonetic measure: vowel duration. Manually-annotated data were used to train a model that takes as input an arbitrary length segment of the acoustic signal containing a single vowel that is preceded and followed by consonants and outputs the duration of the vowel. The model is based on the structured prediction framework. The input signal and a hypothesized set of a vowel's onset and offset are mapped to an abstract vector space by a set of acoustic feature functions. The learning algorithm is trained in this space to minimize the difference in expectations between predicted and manually-measured vowel durations. The trained model can then automatically…
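The inference step described in the abstract (score every hypothesized onset/offset pair through a set of feature functions, then keep the best-scoring pair) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the three energy-based features, the weight vector `w`, the 10 ms frame step, and the brute-force search are all assumptions made for the example, and training of `w` is not shown.

```python
from typing import List, Optional, Sequence, Tuple

FRAME_STEP_S = 0.01  # assumed frame step (10 ms); not taken from the paper


def feature_map(energy: Sequence[float], onset: int, offset: int) -> List[float]:
    """Hypothetical feature functions phi(x, onset, offset).

    The vowel is assumed to span frames onset .. offset-1.
    """
    inside = energy[onset:offset]
    outside = list(energy[:onset]) + list(energy[offset:])
    mean_in = sum(inside) / len(inside)
    mean_out = sum(outside) / len(outside)
    rise = energy[onset] - energy[onset - 1]    # energy jump at the onset
    fall = energy[offset - 1] - energy[offset]  # energy drop at the offset
    return [mean_in - mean_out, rise, fall]


def predict(energy: Sequence[float], w: Sequence[float],
            min_len: int = 2) -> Tuple[int, int]:
    """Return the (onset, offset) pair that maximizes w . phi(x, onset, offset)."""
    n = len(energy)
    best: Optional[Tuple[int, int]] = None
    best_score = float("-inf")
    # At least one frame is kept on each side, since the vowel is
    # preceded and followed by consonants.
    for onset in range(1, n - min_len):
        for offset in range(onset + min_len, n):
            phi = feature_map(energy, onset, offset)
            score = sum(wi * fi for wi, fi in zip(w, phi))
            if score > best_score:
                best_score, best = score, (onset, offset)
    if best is None:
        raise ValueError("signal too short for the requested minimum length")
    return best


# Toy per-frame energy contour: consonant, vowel, consonant.
energy = [0.1, 0.1, 0.2, 0.9, 1.0, 0.95, 0.9, 0.2, 0.1, 0.1]
w = [1.0, 1.0, 1.0]  # illustrative weights; a real model would learn these

onset, offset = predict(energy, w)
duration = (offset - onset) * FRAME_STEP_S
print(onset, offset, duration)
```

On this toy contour the search selects onset frame 3 and offset frame 7, a four-frame vowel. In a trained model the weights would be fit on manually annotated durations, and the exhaustive double loop would normally be restricted to a plausible range of vowel lengths.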
