Speech Foundation Models and Crowdsourcing for Efficient, High-Quality Data Collection

Beomseok Lee; Marco Gaido; Ioan Calapodescu; Laurent Besacier; Matteo Negri

arXiv:2412.11978·cs.CL·December 17, 2024

Open Access

TL;DR

This paper explores using Speech Foundation Models to automate speech data validation in crowdsourcing, significantly reducing costs while maintaining data quality across multiple languages.

Contribution

The paper applies SFMs to automate the validation of crowdsourced speech data and examines, for the first time, the cost/quality trade-off in data acquisition.

Findings

01

SFM-based validation reduces reliance on human validation, with an estimated cost saving of over 40%.

02

Cost savings are achieved without degrading data quality.

03

Experiments were conducted on French, German, and Korean datasets.

Abstract

While crowdsourcing is an established solution for facilitating and scaling the collection of speech data, the involvement of non-experts necessitates protocols to ensure final data quality. To reduce the costs of these essential controls, this paper investigates the use of Speech Foundation Models (SFMs) to automate the validation process, examining for the first time the cost/quality trade-off in data acquisition. Experiments conducted on French, German, and Korean data demonstrate that SFM-based validation has the potential to reduce reliance on human validation, resulting in an estimated cost saving of over 40.0% without degrading final data quality. These findings open new opportunities for more efficient, cost-effective, and scalable speech data acquisition.
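
As a rough illustration of what automated validation could look like, the sketch below compares the text a contributor was asked to read with a transcript produced by an SFM, and sends only the mismatching recordings to human validators. It is not the paper's protocol: the use of word error rate as the comparison metric, the function names, and the 0.1 acceptance threshold are all assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        return 0.0 if not hyp else 1.0
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / len(ref)


def route(prompt: str, sfm_transcript: str, accept_below: float = 0.1) -> str:
    """Accept automatically when the SFM transcript matches the prompt closely.

    `accept_below` is a hypothetical threshold, not a value from the paper.
    """
    if word_error_rate(prompt, sfm_transcript) <= accept_below:
        return "auto_accept"
    return "human_review"


def human_review_share(decisions: list[str]) -> float:
    """Fraction of recordings that still need a human validator."""
    return decisions.count("human_review") / len(decisions)


# Toy (prompt, SFM transcript) pairs, invented for illustration only.
pairs = [
    ("bonjour tout le monde", "bonjour tout le monde"),
    ("guten morgen", "guten abend"),
]
decisions = [route(prompt, transcript) for prompt, transcript in pairs]
```

In a setup like this, the threshold controls the trade-off: a stricter value sends more recordings to humans and raises cost, while a looser one lowers cost at the risk of accepting poor recordings.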

Taxonomy

Topics: Speech and dialogue systems · Speech Recognition and Synthesis · Speech and Audio Processing