SESQA: semi-supervised learning for speech quality assessment

Joan Serrà; Jordi Pons; Santiago Pascual

arXiv:2010.00368·eess.AS·February 9, 2021

TL;DR

This paper introduces SESQA, a semi-supervised learning framework for speech quality assessment that combines scarce human annotations with programmatically generated data and auxiliary tasks to improve accuracy and generalization.

Contribution

It proposes a semi-supervised approach to speech quality assessment that combines available annotations with programmatically generated data, using 3 optimization criteria together with 5 complementary auxiliary tasks.

Findings

01. Cuts the error of existing methods by more than 36%

02. Provides additional benefits in the form of reusable features and auxiliary outputs

03. Shows promising generalization in an out-of-sample test

Abstract

Automatic speech quality assessment is an important, transversal task whose progress is hampered by the scarcity of human annotations, poor generalization to unseen recording conditions, and a lack of flexibility of existing approaches. In this work, we tackle these problems with a semi-supervised learning approach, combining available annotations with programmatically generated data, and using 3 different optimization criteria together with 5 complementary auxiliary tasks. Our results show that such a semi-supervised approach can cut the error of existing methods by more than 36%, while providing additional benefits in terms of reusable features or auxiliary outputs. Improvement is further corroborated with an out-of-sample test showing promising generalization capabilities.
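To make the idea of "programmatically generated data" concrete, here is a minimal sketch of one way such data could provide a training signal without human labels. It is an illustration under stated assumptions, not the method described in the paper: the abstract does not name its degradations or its 3 optimization criteria, so the additive white noise, the pairwise margin ranking loss, and every function name below are hypothetical choices made for this example.

```python
import math
import random


def add_noise(signal, snr_db, rng):
    """Return a copy of `signal` with white Gaussian noise at the given SNR in dB.

    Assumption: additive white noise stands in for whatever degradations
    a real pipeline would apply.
    """
    power = sum(x * x for x in signal) / len(signal)
    noise_power = power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def make_ranked_pair(clean, rng, snr_choices=(0, 10, 20, 30)):
    """Degrade one clean signal at two different SNRs.

    Returns (better, worse): the higher-SNR copy is assumed to have
    higher quality, so the ordering is known without any human rating.
    """
    low, high = sorted(rng.sample(snr_choices, 2))
    return add_noise(clean, high, rng), add_noise(clean, low, rng)


def margin_ranking_loss(score_better, score_worse, margin=1.0):
    """Hinge-style ranking loss: zero once `score_better` exceeds
    `score_worse` by at least `margin`, positive otherwise."""
    return max(0.0, margin - (score_better - score_worse))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical "clean" signal: a 440 Hz tone sampled at 16 kHz.
    clean = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(1000)]
    better, worse = make_ranked_pair(clean, rng)
    # A quality model would score both copies; these scores are placeholders.
    print(margin_ranking_loss(score_better=0.2, score_worse=0.7))
```

In a semi-supervised setup of this kind, a loss like this on generated pairs would be optimized alongside a supervised loss on the human-annotated examples, so that unlabeled audio still shapes the learned quality score.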
