UniSE: A Unified Framework for Decoder-only Autoregressive LM-based Speech Enhancement

Haoyin Yan; Chengwei Liu; Shaofei Xue; Xiaotao Liang; Zheng Xue

arXiv:2510.20441 · cs.SD · October 24, 2025

TL;DR

UniSE introduces a unified decoder-only autoregressive language model framework that handles several speech enhancement tasks (speech restoration, target speaker extraction, and speech separation) and achieves competitive performance across multiple benchmarks.

Contribution

This work is the first to verify the effectiveness of autoregressive language models in unifying different speech enhancement sub-tasks.

Findings

01 Achieves competitive results against discriminative and generative baselines on several benchmarks.

02 Demonstrates the capacity of LMs to unify multiple SE tasks.

03 Shows that the distinct learning patterns of different SE tasks are compatible within a single framework.

Abstract

The development of neural audio codecs (NACs) has greatly advanced the application of language models (LMs) to speech processing and understanding. However, the effectiveness of autoregressive (AR) LM-based models in unifying different sub-tasks of speech enhancement (SE) has not yet been verified. In this work, we propose UniSE, a unified decoder-only LM-based framework that handles different SE tasks, including speech restoration, target speaker extraction, and speech separation. It takes input speech features as conditions and generates discrete tokens of the target speech through AR modeling, which makes the distinct learning patterns of the multiple tasks compatible. Experiments on several benchmarks indicate that the proposed UniSE achieves competitive performance compared to discriminative and generative baselines, showing the capacity of LMs to unify SE tasks. The demo…
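The conditioning pattern described in the abstract (input speech features as a prefix, discrete target-speech tokens generated one at a time) can be illustrated with a minimal sketch. This is not the UniSE implementation: greedy decoding, the `next_token_logits` callable standing in for the decoder-only LM, the toy per-frame quantizer standing in for a neural audio codec, the one-token-per-frame alignment, and the names `VOCAB`, `EOS`, `ar_generate`, and `toy_next_token_logits` are all hypothetical assumptions made for illustration.

```python
from typing import Callable, List, Sequence

# Hypothetical types: a condition frame is a feature vector; targets are codec token ids.
Frame = Sequence[float]
NextTokenFn = Callable[[Sequence[Frame], List[int]], List[float]]

VOCAB = 8    # hypothetical codebook size
EOS = VOCAB  # hypothetical extra end-of-sequence id appended after the codebook


def ar_generate(cond: Sequence[Frame], next_token_logits: NextTokenFn,
                max_len: int, eos: int) -> List[int]:
    """Greedy AR decoding: each step sees the condition prefix plus tokens so far."""
    tokens: List[int] = []
    for _ in range(max_len):
        logits = next_token_logits(cond, tokens)
        nxt = max(range(len(logits)), key=logits.__getitem__)  # greedy argmax
        if nxt == eos:
            break
        tokens.append(nxt)
    return tokens


def toy_next_token_logits(cond: Sequence[Frame], tokens: List[int]) -> List[float]:
    """Toy stand-in for the LM (assumption): quantizes the frame at the current step."""
    t = len(tokens)
    logits = [0.0] * (VOCAB + 1)
    if t >= len(cond):  # all condition frames consumed: emit end-of-sequence
        logits[EOS] = 1.0
        return logits
    level = sum(cond[t]) / len(cond[t])  # mean feature value of this frame
    idx = min(VOCAB - 1, max(0, round(level * (VOCAB - 1))))
    logits[idx] = 1.0
    return logits


cond = [[0.0, 0.0], [1.0, 1.0], [0.2, 0.4]]
print(ar_generate(cond, toy_next_token_logits, max_len=10, eos=EOS))  # [0, 7, 2]
```

In a real system the condition would come from an encoder over the degraded input (and, for target speaker extraction, presumably some representation of the target speaker), and decoding might use sampling rather than argmax; the sketch only shows why a single AR decoder can serve several tasks once each is phrased as "condition in, codec tokens out".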

Code & Models

Models

🤗 QuarkAudio/QuarkAudio-UniSE (model)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Face recognition and analysis