Speech Discrete Tokens or Continuous Features? A Comparative Analysis for Spoken Language Understanding in SpeechLLMs

Dingdong Wang; Junan Li; Mingyu Cui; Dongchao Yang; Xueyuan Chen; Helen Meng

arXiv:2508.17863·cs.CL·August 26, 2025

TL;DR

This paper compares SSL-based discrete tokens and continuous features as speech inputs to SpeechLLMs under the same experimental settings, and finds that continuous features generally outperform discrete tokens across six spoken language understanding tasks.

Contribution

It offers a comprehensive, fair comparison of discrete and continuous speech representations under identical conditions, highlighting their distinct characteristics and performance differences.

Findings

01. Continuous features outperform discrete tokens in most tasks.

02. The two representations show distinct learning patterns.

03. The analyses cover robustness and the behavior of individual SSL and LLM layers.

Abstract

With the rise of Speech Large Language Models (SpeechLLMs), two dominant approaches have emerged for speech processing: discrete tokens and continuous features. Each approach has demonstrated strong capabilities in audio-related processing tasks. However, the performance gap between these two paradigms has not been thoroughly explored. To address this gap, we present a fair comparison of self-supervised learning (SSL)-based discrete and continuous features under the same experimental settings. We evaluate their performance across six spoken language understanding-related tasks using both small and large-scale LLMs (Qwen1.5-0.5B and Llama3.1-8B). We further conduct in-depth analyses, including an efficiency comparison, SSL layer analysis, LLM layer analysis, and robustness comparison. Our findings reveal that continuous features generally outperform discrete tokens in various tasks. Each…
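To make the two input paradigms concrete, the sketch below contrasts them on random stand-in data. It is an illustration under stated assumptions, not the paper's pipeline: the abstract does not say how tokens are derived or how features are adapted, so the nearest-centroid quantizer, the linear projector, and every name and dimension here (`quantize_to_tokens`, `project_continuous`, `ssl_dim`, `llm_dim`, `num_codes`) are hypothetical.

```python
import numpy as np


def quantize_to_tokens(features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Discrete path (assumed): map each frame to its nearest codebook entry."""
    # Squared Euclidean distance between every frame and every centroid,
    # shape (num_frames, num_codes).
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def project_continuous(features: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Continuous path (assumed): a linear adapter into the LLM embedding space."""
    return features @ weight + bias


rng = np.random.default_rng(0)
num_frames, ssl_dim, llm_dim, num_codes = 50, 16, 32, 8  # illustrative sizes only

# Stand-in for frame-level SSL encoder outputs; real features would come from a speech model.
features = rng.normal(size=(num_frames, ssl_dim))

# Discrete tokens: quantize, then look up a learned embedding per token id.
codebook = rng.normal(size=(num_codes, ssl_dim))  # e.g. centroids from k-means (assumption)
tokens = quantize_to_tokens(features, codebook)
embedding_table = rng.normal(size=(num_codes, llm_dim))
discrete_inputs = embedding_table[tokens]

# Continuous features: keep the full vectors and project them directly.
weight = rng.normal(size=(ssl_dim, llm_dim))
bias = np.zeros(llm_dim)
continuous_inputs = project_continuous(features, weight, bias)
```

Both paths end with a `(num_frames, llm_dim)` array that could be fed to a language model, but the discrete path can only take `num_codes` distinct values per frame, whereas the continuous path retains the full feature vector.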
