Benchmarking Language Modeling for Lossless Compression of Full-Fidelity Audio

Phillip Long; Zachary Novack; Chris Donahue

arXiv:2603.08683·cs.SD·March 10, 2026

TL;DR

This paper benchmarks language models for lossless audio compression across music, speech, and bioacoustics at 8-, 16-, and 24-bit depths. It introduces Trilobyte, a byte-level tokenization that makes 24-bit audio tractable, and compares the results with existing codecs such as FLAC.

Contribution

The paper introduces Trilobyte, a byte-level tokenization schema that makes 24-bit lossless compression with language models tractable, and benchmarks LM-based compression on full-fidelity audio.
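A minimal sketch of what byte-level tokenization of PCM samples could look like, assuming each signed sample is shifted to an unsigned integer and split into big-endian bytes. The function names, the byte order, and the offset step are illustrative assumptions, not details confirmed by the paper. The point it shows is that every token falls in 0..255, so the vocabulary stays at 256 entries regardless of bit depth, versus 2^16 = 65,536 or 2^24 = 16,777,216 entries for sample-level tokens.

```python
from typing import List, Sequence


def samples_to_byte_tokens(samples: Sequence[int], bit_depth: int) -> List[int]:
    """Split signed PCM samples into byte tokens in the range 0..255.

    Hypothetical helper: byte order and offset encoding are assumptions.
    """
    if bit_depth % 8 != 0:
        raise ValueError("bit_depth must be a multiple of 8")
    n_bytes = bit_depth // 8
    offset = 1 << (bit_depth - 1)  # shifts the signed range to start at 0
    tokens: List[int] = []
    for sample in samples:
        if not -offset <= sample < offset:
            raise ValueError(f"sample {sample} does not fit in {bit_depth} bits")
        # Most significant byte first; iterating over bytes yields ints.
        tokens.extend((sample + offset).to_bytes(n_bytes, "big"))
    return tokens


def byte_tokens_to_samples(tokens: Sequence[int], bit_depth: int) -> List[int]:
    """Invert samples_to_byte_tokens exactly, so the mapping is lossless."""
    n_bytes = bit_depth // 8
    offset = 1 << (bit_depth - 1)
    if len(tokens) % n_bytes != 0:
        raise ValueError("token count is not a multiple of bytes per sample")
    return [
        int.from_bytes(bytes(tokens[i:i + n_bytes]), "big") - offset
        for i in range(0, len(tokens), n_bytes)
    ]
```

Under these assumptions a 24-bit sample becomes three tokens, so sequences are three times longer than with sample-level tokens; that is the cost paid for the fixed 256-entry vocabulary.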

Findings

01 LMs outperform FLAC at 8-bit and 16-bit
02 Trilobyte enables 24-bit lossless compression
03 Compression gains diminish beyond 8-bit

Abstract

Autoregressive "language" models (LMs) trained on raw waveforms can be repurposed for lossless audio compression, but prior work is limited to 8-bit audio, leaving open whether such approaches work for practical settings (16/24-bit) and can compete with existing codecs. We benchmark LM-based compression on full-fidelity audio across diverse domains (music, speech, bioacoustics), sampling rates (16 kHz to 48 kHz), and bit depths (8, 16, 24-bit). Standard sample-level tokenization becomes intractable at higher bit depths due to vocabulary size (65K for 16-bit; 16.7M for 24-bit). We propose Trilobyte, a byte-level tokenization schema for full-resolution audio, improving vocabulary scaling in the bit depth $b$ from $O(2^{b})$ to $O(1)$ and enabling the first tractable 24-bit LM-based lossless compression. While LMs consistently outperform FLAC and yield state-of-the-art compression at 8-bit and 16-bit, we observe…
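For context on why a predictive model doubles as a compressor: with an entropy coder such as arithmetic coding, a token that the model assigns probability p costs roughly -log2(p) bits, so better predictions mean fewer bits. The sketch below computes that idealized size from per-token probabilities. It is not the paper's evaluation code; the function names are hypothetical, the coder's small overhead is ignored, and the ratio is defined as raw bits over compressed bits, which may differ from the convention the authors report.

```python
import math
from typing import Sequence


def ideal_compressed_bits(token_probs: Sequence[float]) -> float:
    """Idealized code length: sum of -log2(p) over the probabilities
    the model assigned to the tokens that actually occurred."""
    return sum(-math.log2(p) for p in token_probs)


def compression_ratio(token_probs: Sequence[float], num_samples: int, bit_depth: int) -> float:
    """Raw PCM size divided by idealized compressed size (assumed convention)."""
    raw_bits = num_samples * bit_depth
    return raw_bits / ideal_compressed_bits(token_probs)


# A model that is uniform over 256 byte tokens spends 8 bits per token,
# so two 24-bit samples (six byte tokens) are not compressed at all.
uniform = [1 / 256] * 6
print(compression_ratio(uniform, num_samples=2, bit_depth=24))
```

Any model that puts more than 1/256 probability on the correct byte, on average in the log domain, yields a ratio above 1 under this definition.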

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing