CNN Encoding of Acoustic Parameters for Prominence Detection
Kamini Sabu, Mithilesh Vaidya, Preeti Rao

TL;DR
This paper explores deep learning methods, including RNNs and end-to-end feature extraction, for detecting prominent words in children's oral reading, with the aim of improving the accuracy of speaker-independent prominence detection.
Contribution
It introduces an RNN-based sequence classifier and end-to-end deep learning for acoustic feature extraction in prominence detection, advancing beyond previous random forest approaches.
Findings
The RNN sequence classifier outperforms the earlier random forest predictor in prominence detection.
End-to-end deep learning effectively captures acoustic contours for word prominence.
Performance varies with feature types and learning architectures, providing insights into optimal configurations.
Abstract
Expressive reading, considered the defining attribute of oral reading fluency, comprises the prosodic realization of phrasing and prominence. In the context of evaluating oral reading, it helps to establish the speaker's comprehension of the text. We consider a labeled dataset of children's reading recordings for the speaker-independent detection of prominent words using acoustic-prosodic and lexico-syntactic features. A previous well-tuned random forest ensemble predictor is replaced by an RNN sequence classifier to exploit potential context dependency across the longer utterance. Further, deep learning is applied to obtain word-level features from low-level acoustic contours of fundamental frequency, intensity and spectral shape in an end-to-end fashion. Performance comparisons are presented across the different feature types and across different feature learning architectures for…
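As a rough illustration of how a fixed-length word-level feature vector could be derived from frame-level contours of varying duration, the sketch below applies 1-D filters to each contour and then pools over time. It is a minimal sketch under stated assumptions, not the authors' architecture: the contour names, filter count, kernel width, ReLU and global max pooling are all hypothetical choices, and the filter weights are random rather than learned.

```python
import random


def conv1d_valid(signal, kernel):
    """Slide `kernel` over `signal` without padding.

    This is cross-correlation (no kernel flip), which is what deep learning
    frameworks usually call "convolution".
    """
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def encode_word(contours, filters):
    """Map the variable-length contours of one word to a fixed-length vector.

    contours: dict of contour name -> list of per-frame values
    filters:  dict of contour name -> list of kernels (hypothetical, untrained)
    """
    features = []
    for name in sorted(contours):
        for kernel in filters[name]:
            responses = conv1d_valid(contours[name], kernel)
            # ReLU, then global max pooling over time: one value per filter
            features.append(max(max(r, 0.0) for r in responses))
    return features


rng = random.Random(0)
NAMES = ["f0", "intensity"]   # assumed contour names, for illustration only
KERNEL_WIDTH = 5              # assumed kernel width in frames
FILTERS_PER_CONTOUR = 4       # assumed number of filters per contour

filters = {
    name: [[rng.uniform(-1, 1) for _ in range(KERNEL_WIDTH)]
           for _ in range(FILTERS_PER_CONTOUR)]
    for name in NAMES
}

# Two synthetic "words" of different durations (12 and 40 frames).
short_word = {name: [rng.uniform(0, 1) for _ in range(12)] for name in NAMES}
long_word = {name: [rng.uniform(0, 1) for _ in range(40)] for name in NAMES}

short_vec = encode_word(short_word, filters)
long_vec = encode_word(long_word, filters)
print(len(short_vec), len(long_vec))
```

Pooling over time is what makes the output length independent of word duration, so the resulting vectors could be passed word by word to a sequence classifier. In a trained system the kernels would be learned jointly with that classifier rather than drawn at random.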
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Phonetics and Phonology Research
