SESQA: semi-supervised learning for speech quality assessment
Joan Serrà, Jordi Pons, Santiago Pascual

TL;DR
This paper introduces SESQA, a semi-supervised learning framework for speech quality assessment that leverages limited annotations and auxiliary tasks to improve accuracy and generalization.
Contribution
It proposes a semi-supervised approach to speech quality assessment that combines available annotations with programmatically generated data, trained with 3 optimization criteria and 5 complementary auxiliary tasks.
Findings
Cuts the error of existing methods by more than 36%
Provides additional benefits in the form of reusable features and auxiliary outputs
Shows promising generalization in an out-of-sample test
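To make the "more than 36%" figure concrete, the relative reduction can be computed as (baseline error - new error) / baseline error. The sketch below is only an illustration of that arithmetic: the function name and the example error values are hypothetical and are not taken from the paper.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return the fraction by which new_error is lower than baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical error values chosen for illustration; not results from the paper.
baseline = 0.50
improved = 0.32
print(f"relative reduction: {relative_error_reduction(baseline, improved):.0%}")
```

Note that a relative reduction depends on the baseline: the same absolute gain looks larger against a small baseline error than against a large one.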
Abstract
Automatic speech quality assessment is an important, transversal task whose progress is hampered by the scarcity of human annotations, poor generalization to unseen recording conditions, and a lack of flexibility of existing approaches. In this work, we tackle these problems with a semi-supervised learning approach, combining available annotations with programmatically generated data, and using 3 different optimization criteria together with 5 complementary auxiliary tasks. Our results show that such a semi-supervised approach can cut the error of existing methods by more than 36%, while providing additional benefits in terms of reusable features or auxiliary outputs. Improvement is further corroborated with an out-of-sample test showing promising generalization capabilities.
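The general idea of combining annotated data with programmatically generated data might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name its 3 criteria or 5 auxiliary tasks, so the sketch uses only two terms, a supervised error on annotated items and a pairwise ranking term on synthetically degraded pairs, both chosen here for illustration. All function names, the noise levels, the margin, and the weights are hypothetical.

```python
import random


def supervised_loss(pred: float, label: float) -> float:
    # Absolute error against a human rating; usable only for annotated items.
    return abs(pred - label)


def ranking_loss(score_better: float, score_worse: float, margin: float = 0.3) -> float:
    # Hinge loss: the less degraded signal should score higher by at least `margin`.
    return max(0.0, margin - (score_better - score_worse))


def add_noise(signal, level, rng):
    # Programmatic degradation (assumption): additive Gaussian noise.
    return [s + rng.gauss(0.0, level) for s in signal]


def combined_loss(model, labeled, unlabeled, rng, w_sup=1.0, w_rank=1.0):
    # Supervised term, averaged over (signal, rating) pairs.
    sup = sum(supervised_loss(model(x), y) for x, y in labeled) / max(len(labeled), 1)
    # Ranking term: no human label is needed, because the ordering is known
    # from how the two degraded versions were generated.
    rank = 0.0
    for x in unlabeled:
        mild = add_noise(x, 0.01, rng)
        heavy = add_noise(x, 0.5, rng)
        rank += ranking_loss(model(mild), model(heavy))
    rank /= max(len(unlabeled), 1)
    return w_sup * sup + w_rank * rank


def toy_model(signal):
    # Hypothetical scorer: rougher signals get lower scores, clipped to [1, 5].
    diffs = [abs(a - b) for a, b in zip(signal, signal[1:])]
    roughness = sum(diffs) / max(len(diffs), 1)
    return max(1.0, 5.0 - 4.0 * roughness)


rng = random.Random(0)
labeled = [([0.0] * 16, 5.0)]
unlabeled = [[0.0] * 16, [0.1] * 16]
print(combined_loss(toy_model, labeled, unlabeled, rng))
```

The point of the design is that the ranking term supplies a training signal for unannotated audio, which is what lets such an approach work around the scarcity of human ratings mentioned in the abstract.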
