DASB - Discrete Audio and Speech Benchmark

Pooneh Mousavi; Jarod Duret; Darius Petermann; Artem Ploujnikov; Luca Della Libera; Anastasia Kuznetsova; Cem Subakan; Mirco Ravanelli

arXiv:2406.14294 · cs.SD · April 22, 2026 · 1 cites

TL;DR

DASB introduces a comprehensive benchmarking framework for discrete audio tokens across multiple domains, revealing their current limitations and guiding future improvements.

Contribution

The paper presents DASB, a standardized benchmark for evaluating discrete audio tokens. It addresses the inconsistent evaluation settings used across existing studies and provides insights into the robustness and performance of discrete tokens.

Findings

01 Discrete representations are less robust than continuous ones.

02 Semantic tokens outperform acoustic tokens but still lag behind continuous features.

03 Discrete representations require careful tuning of factors such as model architecture, data size, and learning rate.

Abstract

Discrete audio tokens have recently gained considerable attention for their potential to bridge audio and language processing, enabling multimodal language models that can both generate and understand audio. However, preserving key information such as phonetic content, speaker identity, and paralinguistic cues remains a major challenge. Identifying the optimal tokenizer and configuration is further complicated by inconsistent evaluation settings across existing studies. To address this, we introduce the Discrete Audio and Speech Benchmark (DASB), a comprehensive framework for benchmarking discrete audio tokens across speech, general audio, and music domains on a range of discriminative and generative tasks. Our results show that discrete representations are less robust than continuous ones and require careful tuning of factors such as model architecture, data size, learning rate, and…

Code & Models

Repositories

https://poonehmousavi.github.io/DASB-website
