AzeroS: Extending LLM to Speech with Self-Generated Instruction-Free Tuning

Yiwen Shao; Wei Liu; Jiahong Li; Tianzi Wang; Kun Wei; Meng Yu; Dong Yu

arXiv:2601.06086 · cs.CL · January 13, 2026


TL;DR

This paper introduces AZeroS, a speech-LLM trained with a novel Self-Generated Instruction-Free Tuning paradigm, enabling better generalization to unseen tasks without task-specific data collection.

Contribution

The paper proposes SIFT, a new training paradigm for speech-LLMs that eliminates the need for task-specific data, and introduces AZeroS, a model leveraging this paradigm with minimal training cost.

Findings

01. AZeroS achieves state-of-the-art results on multiple benchmarks.

02. SIFT improves generalization to unseen speech tasks.

03. AZeroS reaches high performance at minimal training cost.

Abstract

Extending large language models (LLMs) to the speech domain has recently gained significant attention. A typical approach connects a pretrained LLM with an audio encoder through a projection module and trains the resulting model on large-scale, task-specific instruction-tuning datasets. However, curating such instruction-tuning data for specific requirements is time-consuming, and models trained in this manner often generalize poorly to unseen tasks. In this work, we first formulate that the strongest generalization of a speech-LLM is achieved when it is trained with Self-Generated Instruction-Free Tuning (SIFT), in which supervision signals are generated by a frozen LLM using textual representations of speech as input. Our proposed SIFT paradigm eliminates the need for collecting task-specific question-answer pairs and yields the theoretically best generalization to unseen tasks.…
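
The SIFT idea described in the abstract can be sketched as a data-construction step: a frozen LLM reads a textual representation of each utterance, with no task-specific instruction attached, and its output becomes the training target for the speech-LLM when it is given the corresponding audio. The following is a minimal sketch under stated assumptions, not the authors' implementation. The names `SiftExample`, `build_sift_targets`, and the toy `echo_llm` stand-in are hypothetical, and the textual representation is assumed here to be a plain transcript.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class SiftExample:
    """One training pair for the speech-LLM (hypothetical container)."""
    audio_path: str   # what the speech-LLM sees: the audio input
    target_text: str  # what it is trained to produce: the frozen LLM's output


def build_sift_targets(
    utterances: Iterable[Tuple[str, str]],
    frozen_llm: Callable[[str], str],
) -> List[SiftExample]:
    """Pair each audio file with a self-generated target.

    `utterances` yields (audio_path, text_representation) tuples.
    `frozen_llm` is any text-in/text-out callable; it is never updated here.
    No task instruction is added to the text: the LLM simply responds to it.
    """
    examples = []
    for audio_path, text_repr in utterances:
        target = frozen_llm(text_repr)  # supervision comes from the text side
        examples.append(SiftExample(audio_path=audio_path, target_text=target))
    return examples


def echo_llm(prompt: str) -> str:
    """Toy stand-in for a frozen LLM (assumption, for illustration only)."""
    return f"[response to] {prompt}"


if __name__ == "__main__":
    data = [("utt_001.wav", "what is the capital of France")]
    for example in build_sift_targets(data, echo_llm):
        print(example)
```

In a real setup, `echo_llm` would be replaced by a call to the frozen LLM, and the resulting pairs would be used to train the speech-side components so that audio input yields the same response the LLM gives to the text.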

Code & Models

Models

AudenAI/azeros (Hugging Face model · 9 downloads · 2 likes)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis