SepALM: Audio Language Models Are Error Correctors for Robust Speech Separation
Zhaoxi Mu, Xinyu Yang, Gang Wang

TL;DR
SepALM is a framework that uses an audio language model (ALM) to correct separated speech in the text domain and then re-synthesize it, with the aim of improving robustness and accuracy in noisy and reverberant real-world conditions.
Contribution
The paper proposes an end-to-end, ALM-based error correction stage for speech separation, intended to avoid the error accumulation and optimization difficulties of conventional pipelines that chain automatic speech recognition (ASR) with large language models (LLMs).
Findings
Improves speech separation accuracy in noisy environments
Reduces error accumulation compared to conventional methods
Enhances adaptability to diverse acoustic settings
Abstract
While contemporary speech separation technologies adeptly process lengthy mixed audio waveforms, they are frequently challenged by the intricacies of real-world environments, including noisy and reverberant settings, which can result in artifacts or distortions in the separated speech. To overcome these limitations, we introduce SepALM, a pioneering approach that employs audio language models (ALMs) to rectify and re-synthesize speech within the text domain following preliminary separation. SepALM comprises four core components: a separator, a corrector, a synthesizer, and an aligner. By integrating an ALM-based end-to-end error correction mechanism, we mitigate the risk of error accumulation and circumvent the optimization hurdles typically encountered in conventional methods that amalgamate automatic speech recognition (ASR) with large language models (LLMs). Additionally, we have…
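The four components named in the abstract can be pictured as a simple pipeline. The sketch below is only an illustration under stated assumptions: the class name `SepALMSketch`, the call order, the function signatures, and all `toy_*` stand-ins are hypothetical and are not taken from the paper. In particular, the aligner's role (here, matching the synthesized audio's length to the separated stream) is a guess, since the abstract shown here does not describe it.

```python
from dataclasses import dataclass
from typing import Callable, List

Waveform = List[float]  # assumption: audio represented as a plain list of samples


@dataclass
class SepALMSketch:
    """Hypothetical wiring of the four components named in the abstract."""
    separator: Callable[[Waveform], List[Waveform]]   # mixture -> rough per-speaker streams
    corrector: Callable[[Waveform], str]              # rough stream -> corrected text (ALM stand-in)
    synthesizer: Callable[[str], Waveform]            # corrected text -> re-synthesized speech
    aligner: Callable[[Waveform, Waveform], Waveform] # (synthesized, reference) -> aligned output

    def run(self, mixture: Waveform) -> List[Waveform]:
        outputs = []
        for rough in self.separator(mixture):
            text = self.corrector(rough)          # correction happens in the text domain
            clean = self.synthesizer(text)        # speech is regenerated from the text
            outputs.append(self.aligner(clean, rough))
        return outputs


# Toy stand-ins so the sketch runs; none of these model real audio processing.
def toy_separator(mix: Waveform) -> List[Waveform]:
    return [mix[0::2], mix[1::2]]  # pretend even/odd samples are two speakers


def toy_corrector(rough: Waveform) -> str:
    return "hello world"  # placeholder for ALM output


def toy_synthesizer(text: str) -> Waveform:
    return [0.1] * len(text)  # one dummy sample per character


def toy_aligner(synth: Waveform, ref: Waveform) -> Waveform:
    n = len(ref)
    return (synth + [0.0] * n)[:n]  # pad or trim to the reference length


pipeline = SepALMSketch(toy_separator, toy_corrector, toy_synthesizer, toy_aligner)
streams = pipeline.run([0.0] * 10)
```

Passing the components in as plain callables keeps the sketch agnostic about how each stage is actually modeled; any real separator, ALM, or synthesizer would replace the toy functions.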
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research
Methods: Knowledge Distillation
