Challenges in Automated Processing of Speech from Child Wearables: The Case of Voice Type Classifier
Tarek Kunze, Marianne Métais, Hadrien Titeux, Lucas Elbert, Joseph Coffey, Emmanuel Dupoux, Alejandrina Cristia, Marvin Lavechin

TL;DR
This paper examines the challenges of automatically classifying voice types in recordings from child-worn devices, and argues that data relevance and quantity matter more than model improvements.
Contribution
Drawing on three years of experiments on Voice Type Classification, the paper reports that model enhancements have limited impact and underscores the importance of data relevance, quantity, and sharing permissions for progress.
Findings
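Improvements in representation features, architecture, and parameter search yield only marginal gains.
Data relevance and quantity drive more progress than model changes.
Collecting data with permissions that allow sharing is important for progress.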
Abstract
Recordings gathered with child-worn devices promised to revolutionize both fundamental and applied speech sciences by allowing the effortless capture of children's naturalistic speech environment and language production. This promise hinges on speech technologies that can transform the sheer mounds of data thus collected into usable information. This paper demonstrates several obstacles blocking progress by summarizing three years' worth of experiments aimed at improving one fundamental task: Voice Type Classification. Our experiments suggest that improvements in representation features, architecture, and parameter search contribute to only marginal gains in performance. More progress is made by focusing on data relevance and quantity, which highlights the importance of collecting data with appropriate permissions to allow sharing.
