Eureka-Audio: Triggering Audio Intelligence in Compact Language Models

Dan Zhang; Yishu Lei; Jing Hu; Shuwei He; Songhe Deng; Xianlong Luo; Danxiang Zhu; Shikun Feng; Rui Liu; Jingzhou He; Yu Sun; Hua Wu; Haifeng Wang

arXiv:2602.13954 · cs.SD · February 17, 2026

TL;DR

Eureka-Audio is a compact, high-performance audio language model with only 1.7B parameters. It rivals models 4 to 18 times larger across a broad range of audio understanding benchmarks by combining a unified end-to-end architecture, built around a sparsely activated Mixture-of-Experts (MoE) adapter, with the DataFlux data synthesis pipeline.

Contribution

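The paper introduces Eureka-Audio, a lightweight audio language model that pairs a compact language backbone and a Whisper-based audio encoder with a sparsely activated MoE adapter. It also introduces DataFlux, a closed-loop audio instruction data synthesis pipeline aimed at paralinguistic reasoning. Together, these achieve competitive performance with significantly fewer parameters than comparable models.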

Findings

01

Matches or surpasses multiple 7B to 30B audio and omni-modal baselines across a broad range of audio understanding benchmarks.

02

Demonstrates strong performance on automatic speech recognition (ASR), audio understanding, and dense audio captioning.

03

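Balances computational efficiency with accuracy: it reaches this level with only 1.7B parameters, a fraction of the size of the baselines it is compared against.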
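As a quick sanity check, the abstract's "4 to 18 times larger" range follows from dividing the 7B and 30B baseline sizes by Eureka-Audio's 1.7B parameters. The snippet reproduces that arithmetic; the constant and function names are illustrative only.

```python
# Parameter counts in billions, as stated in the abstract.
EUREKA_AUDIO_PARAMS_B = 1.7
BASELINE_RANGE_B = (7.0, 30.0)  # smallest and largest baselines compared against


def size_ratio(baseline_b: float, model_b: float = EUREKA_AUDIO_PARAMS_B) -> float:
    """Return how many times larger a baseline is than Eureka-Audio."""
    return baseline_b / model_b


low, high = (size_ratio(b) for b in BASELINE_RANGE_B)
print(f"{low:.1f}x to {high:.1f}x")  # 4.1x to 17.6x
```

Rounded to whole numbers, these ratios give the 4x to 18x range quoted in the abstract.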

Abstract

We present Eureka-Audio, a compact yet high-performance audio language model that achieves competitive performance against models that are 4 to 18 times larger across a broad range of audio understanding benchmarks. Despite containing only 1.7B parameters, Eureka-Audio demonstrates strong performance on automatic speech recognition (ASR), audio understanding, and dense audio captioning, matching or surpassing multiple 7B to 30B audio and omni-modal baselines. The model adopts a unified end-to-end architecture composed of a lightweight language backbone, a Whisper-based audio encoder, and a sparsely activated Mixture-of-Experts (MoE) adapter that explicitly accounts for audio heterogeneity and alleviates cross-modal optimization conflicts under limited capacity. To further enhance paralinguistic reasoning, we introduce DataFlux, a closed-loop audio instruction data synthesis and…
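The abstract places a sparsely activated MoE adapter between the Whisper-based encoder and the language backbone, but it does not specify the number of experts, the routing rule, the expert shape, or any frame downsampling. The sketch below therefore assumes a generic top-k softmax router with linear experts. `SparseMoEAdapter`, its dimensions, and `top_k=2` are hypothetical choices, not the paper's design; the sketch only illustrates how sparse activation lets each audio frame use a subset of experts when projecting into the language model's embedding space.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (out_dim x in_dim) matrix by a vector of length in_dim."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class SparseMoEAdapter:
    """Hypothetical top-k MoE adapter: audio-encoder frames -> LM embedding space.

    Assumption: each expert is a single linear map; the real adapter may differ.
    """

    def __init__(self, audio_dim: int, lm_dim: int, num_experts: int = 4,
                 top_k: int = 2, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.top_k = top_k
        self.lm_dim = lm_dim
        # Router scores every expert for a given frame.
        self.router = random_matrix(num_experts, audio_dim, rng)
        # One linear projection per expert.
        self.experts = [random_matrix(lm_dim, audio_dim, rng) for _ in range(num_experts)]

    def route(self, frame: Vector) -> List[Tuple[int, float]]:
        """Pick the top-k experts and renormalize their gate weights to sum to 1."""
        probs = softmax(matvec(self.router, frame))
        top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[: self.top_k]
        norm = sum(probs[i] for i in top)
        return [(i, probs[i] / norm) for i in top]

    def forward_frame(self, frame: Vector) -> Vector:
        """Only the selected experts run; their outputs are mixed by gate weight."""
        out = [0.0] * self.lm_dim
        for idx, weight in self.route(frame):
            y = matvec(self.experts[idx], frame)
            out = [o + weight * yi for o, yi in zip(out, y)]
        return out

    def forward(self, frames: List[Vector]) -> List[Vector]:
        return [self.forward_frame(f) for f in frames]


# Usage: 5 fake encoder frames of width 8 projected to a width-6 "LM" space.
rng = random.Random(1)
frames = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]
adapter = SparseMoEAdapter(audio_dim=8, lm_dim=6, num_experts=4, top_k=2)
tokens = adapter.forward(frames)
print(len(tokens), len(tokens[0]))  # 5 6
```

Routing per frame is one plausible reading of "accounts for audio heterogeneity": speech, music, and environmental sounds could be sent to different experts, while only `top_k` of them do work for any given frame.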

Code & Models

Models

🤗 cslys1999/Eureka-Audio-Instruct (model · 193 downloads · 6 likes)

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing