LESS: Large Language Model Enhanced Semi-Supervised Learning for Speech Foundational Models Using in-the-wild Data

Wen Ding; Fan Qian

arXiv:2506.04586 · cs.CL · March 16, 2026

TL;DR

LESS uses large language models to refine pseudo-labels in semi-supervised learning for speech foundational models, improving Mandarin ASR and Spanish-to-English AST results on in-the-wild data.

Contribution

The paper introduces LESS, a framework that uses LLMs to correct pseudo-labels in semi-supervised speech learning and then filters the corrected data, addressing the richer and more complex acoustics of in-the-wild data compared to curated datasets.

Findings

01. 3.8% absolute WER reduction on WenetSpeech

02. BLEU score increase of 0.8 and 0.7 on test sets

03. Consistent gains across Mandarin ASR and Spanish-to-English AST
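To make the first finding concrete, here is a minimal sketch of how Word Error Rate is commonly computed (word-level edit distance divided by reference length) and how an absolute reduction differs from a relative one. The function names, the whitespace tokenization, and the baseline numbers are assumptions for illustration and are not taken from the paper; text without whitespace word boundaries, such as Mandarin, would need a different tokenization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def absolute_reduction(baseline: float, improved: float) -> float:
    """Difference in error rate, in the same units as the inputs."""
    return baseline - improved


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error that was removed."""
    return (baseline - improved) / baseline


# One substitution (sat -> sit) and one deletion (the) over six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")

# Hypothetical baseline of 20.0% WER improved by 3.8 points to 16.2%.
abs_gain = absolute_reduction(20.0, 16.2)
rel_gain = relative_reduction(20.0, 16.2)
```

With these made-up numbers, a 3.8-point absolute reduction corresponds to a 19% relative reduction, which is why the distinction matters when comparing reported gains.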

Abstract

Although state-of-the-art Speech Foundational Models can produce high-quality text pseudo-labels, applying Semi-Supervised Learning (SSL) for in-the-wild real-world data remains challenging due to its richer and more complex acoustics compared to curated datasets. To address these challenges, we introduce LESS (Large Language Model Enhanced Semi-supervised Learning), a versatile framework that uses Large Language Models (LLMs) to correct pseudo-labels generated on in-the-wild data. In the LESS framework, pseudo-labeled text from Automatic Speech Recognition (ASR) or Automatic Speech Translation (AST) of the unsupervised data is refined by an LLM, and further improved by a data filtering strategy. Across Mandarin ASR and Spanish-to-English AST evaluations, LESS delivers consistent gains, with an absolute Word Error Rate reduction of 3.8% on WenetSpeech, and a BLEU score increase of 0.8 and…
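The two stages the abstract describes, LLM refinement of pseudo-labels followed by data filtering, can be sketched roughly as below. This is an illustrative skeleton only: the names `refine_and_filter`, `toy_correct`, and `toy_keep` are hypothetical, the correction step is a stand-in for an actual LLM call, and the length-ratio rule is an assumed placeholder, since the abstract does not specify the filtering strategy.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class PseudoLabeled:
    audio_id: str
    raw_text: str      # hypothesis produced by the speech model
    refined_text: str  # hypothesis after the correction step


def refine_and_filter(
    items: Iterable[Tuple[str, str]],
    correct: Callable[[str], str],
    keep: Callable[[str, str], bool],
) -> List[PseudoLabeled]:
    """Correct each pseudo-label, then drop the pairs the filter rejects."""
    kept = []
    for audio_id, raw_text in items:
        refined = correct(raw_text)
        if keep(raw_text, refined):
            kept.append(PseudoLabeled(audio_id, raw_text, refined))
    return kept


def toy_correct(text: str) -> str:
    # Stand-in for an LLM call: fixes one spelling and trims whitespace.
    return text.replace("recieve", "receive").strip()


def toy_keep(raw: str, refined: str) -> bool:
    # Hypothetical filter: reject empty output or a large change in length.
    if not refined:
        return False
    ratio = len(refined) / max(len(raw), 1)
    return 0.5 <= ratio <= 2.0


batch = [
    ("utt1", "i recieve the call"),
    ("utt2", "   "),
    ("utt3", "hello world"),
]
result = refine_and_filter(batch, toy_correct, toy_keep)
```

Passing the correction and filtering steps as callables keeps the skeleton independent of any particular LLM or filtering criterion; the surviving `refined_text` values would then serve as training targets for the speech model.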
