A Comparative Study of Discrete Speech Tokens for Semantic-Related Tasks with Large Language Models

Dingdong Wang, Mingyu Cui, Dongchao Yang, Xueyuan Chen, and Helen Meng

arXiv:2411.08742·cs.CL·November 14, 2024

TL;DR

This study compares discrete speech tokens with continuous speech features as inputs to a large language model. Continuous features generally outperform discrete tokens on semantic-related tasks, and the study identifies key limitations of discrete tokens.

Contribution

The paper provides a fair and thorough comparison between discrete and continuous speech features across a variety of semantic-related tasks using a lightweight LLM (Qwen1.5-0.5B), and analyzes why discrete tokens underperform.

Findings

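01 Continuous features generally outperform discrete tokens on semantic-related tasks, particularly tasks that require fine-grained semantic understanding.

02 Discrete tokens are held back by factors such as limited token granularity and inefficient information retention.

03 The analysis of these factors offers insights for improving discrete speech tokens.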
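The granularity limitation named in the findings can be illustrated with a toy quantizer. The sketch below assumes a nearest-centroid (k-means style) codebook, which is one common way to turn continuous frame features into discrete tokens; this page does not state which tokenizer the paper uses, so the codebook, the frame values, and the function names `quantize` and `reconstruction_error` are all hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each continuous frame to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(frame, codebook[k]))
        for frame in frames
    ]


def reconstruction_error(
    frames: List[Vector], codebook: List[Vector], tokens: List[int]
) -> float:
    """Mean squared distance between each frame and the centroid of its token."""
    total = sum(squared_distance(f, codebook[t]) for f, t in zip(frames, tokens))
    return total / len(frames)


# Hypothetical 2-entry codebook and three made-up 2-dimensional "speech frames".
codebook = [(0.0, 0.0), (1.0, 1.0)]
frames = [(0.1, 0.0), (0.4, 0.3), (0.9, 1.0)]

tokens = quantize(frames, codebook)
error = reconstruction_error(frames, codebook, tokens)

print(tokens)  # the first two frames differ but receive the same token ID
print(error)   # nonzero: detail inside each cluster is discarded
```

In this toy setting the first two frames are distinct vectors but collapse to the same token, so a model that sees only token IDs cannot tell them apart, whereas a model fed the continuous vectors can. A larger codebook would reduce the error at the cost of a larger vocabulary.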

Abstract

With the rise of Speech Large Language Models (Speech LLMs), there has been growing interest in discrete speech tokens for their ability to integrate seamlessly with text-based tokens. Although discrete-token-based LLMs have shown promising results on certain tasks, most studies focus on continuous speech features, and the performance gap between the two paradigms is rarely explored. In this paper, we present a fair and thorough comparison between discrete and continuous features across a variety of semantic-related tasks using a lightweight LLM (Qwen1.5-0.5B). Our findings reveal that continuous features generally outperform discrete tokens, particularly in tasks requiring fine-grained semantic understanding. Moreover, this study goes beyond surface-level comparison by identifying key factors behind the underperformance of discrete tokens, such as limited token…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
