LLM-Codec: Neural Audio Codec Meets Language Model Objectives

Ho-Lam Chung; Yiming Chen; Hung-yi Lee

arXiv:2604.17852·cs.SD·April 21, 2026

TL;DR

This paper introduces LLM-Codec, a neural audio codec trained with language model objectives to improve token predictability and semantic alignment, enhancing speech coherence and reducing perplexity.

Contribution

The paper proposes a training method for neural audio codecs that aligns them better with language models while leaving both the codec and LLM architectures unchanged.

Findings

01

Token LMs trained on LLM-Codec reach 61.6% accuracy on SALMon speech coherence, a 12.1-point improvement over AUV.

02

Reduces token-LM perplexity by 35 on the SALMon speech coherence task.

03

Improves speech Mel distance by 5.0% on Codec-SUPERB-tiny.

Abstract

Neural audio codecs are widely used as tokenizers for spoken language models, but they are optimized for waveform reconstruction rather than autoregressive prediction. This mismatch injects acoustically driven uncertainty into the discrete token space and increases language-model perplexity. We propose LLM-Codec, which augments codec training with language-model-facing objectives while keeping both codec and LLM architectures unchanged. LLM-Codec introduces (i) future token prediction with Medusa-style multi-step heads to encourage multi-step predictability, and (ii) semantic alignment that matches audio and text representations via a memory-bank contrastive loss. A differentiable Gumbel bridge enables end-to-end gradients from these objectives to the codec encoder. On SALMon speech coherence, token LMs trained on LLM-Codec reach 61.6% accuracy (+12.1 points over AUV) while reducing perplexity 35.…
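
The three ingredients named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation. The function names, the head-indexing convention (head k predicts the token k steps ahead), the use of Gumbel-softmax as the differentiable relaxation, cosine similarity, and the 0.07 temperature are all assumptions made for illustration. Plain lists stand in for tensors, and no gradients are computed.

```python
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample over codebook entries.

    Assumption: the paper's "Gumbel bridge" is sketched here as a plain
    Gumbel-softmax relaxation; the actual formulation may differ.
    """
    noisy = []
    for x in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((x + gumbel) / tau)
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    z = sum(exps)
    return [e / z for e in exps]


def multi_step_targets(tokens, num_heads):
    """List (position, head, target) triples for multi-step prediction.

    Assumed convention: head k (1-based) at position t predicts tokens[t + k].
    """
    triples = []
    for t in range(len(tokens)):
        for k in range(1, num_heads + 1):
            if t + k < len(tokens):
                triples.append((t, k, tokens[t + k]))
    return triples


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(audio_vec, text_vec, memory_bank, temperature=0.07):
    """InfoNCE-style loss: the paired text is the positive, the memory
    bank entries are negatives. The temperature value is an assumption."""
    logits = [cosine(audio_vec, text_vec) / temperature]
    logits += [cosine(audio_vec, neg) / temperature for neg in memory_bank]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[0]  # negative log-probability of the positive


if __name__ == "__main__":
    rng = random.Random(0)
    print(gumbel_softmax([2.0, 0.5, -1.0], tau=0.5, rng=rng))
    print(multi_step_targets([5, 7, 7, 2], num_heads=2))
    print(info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]]))
```

In a real training setup these pieces would operate on batched tensors in an autodiff framework, so that the future-token and contrastive losses can send gradients through the relaxed token distribution back to the codec encoder, as the abstract describes.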

Code & Models

Models

🤗 voidful/llm-codec (model, 509 downloads)
