Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine

Anastasia Kuznetsova; Inseon Jang; Wootaek Lim; Minje Kim

arXiv:2507.12701·cs.SD·August 6, 2025

Open Access

TL;DR

This paper presents an audio coding method for machines that compresses intermediate features of speech/audio models at ultra-low bitrates (under 200 bps) while largely preserving downstream task performance.

Contribution

It introduces a task-specific, feature-based audio coding approach using residual vector quantization and loss guidance, enabling efficient compression adaptable to various models and tasks.

Findings

01. Compresses features to bitrates below 200 bps with minimal loss in downstream performance.

02. Is effective on automatic speech recognition and audio classification.

03. Adapts across different models and tasks.
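
To put the sub-200 bps figure in context, the bitrate of a residual vector quantizer with fixed-length index coding is the frame rate times the number of codebooks times the bits per index. The sketch below shows that arithmetic; the function name and the example configuration (12.5 frames per second, 2 codebooks, 128 entries each) are hypothetical and are not taken from the paper.

```python
import math

def rvq_bitrate_bps(frames_per_second, num_codebooks, codebook_size):
    """Bits per second when every RVQ index is stored with a fixed-length code."""
    bits_per_frame = num_codebooks * math.log2(codebook_size)
    return frames_per_second * bits_per_frame

# Hypothetical configuration chosen only to illustrate the arithmetic.
example_bps = rvq_bitrate_bps(frames_per_second=12.5, num_codebooks=2, codebook_size=128)
print(example_bps)  # 12.5 frames/s * 2 codebooks * 7 bits = 175.0
```

Under these assumed values the stream stays below 200 bps; lowering the frame rate, the number of codebooks, or the codebook size reduces the bitrate further.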

Abstract

Neural audio codecs, leveraging quantization algorithms, have significantly impacted various speech/audio tasks. While high-fidelity reconstruction is paramount for human perception, audio coding for machines (ACoM) prioritizes efficient compression and downstream task performance, disregarding perceptual nuances. This work introduces an efficient ACoM method that can compress and quantize any chosen intermediate feature representation of an already trained speech/audio downstream model. Our approach employs task-specific loss guidance alongside residual vector quantization (RVQ) losses, providing ultra-low bitrates (i.e., less than 200 bps) with a minimal loss of the downstream model performance. The resulting tokenizer is adaptable to various bitrates and model sizes for flexible deployment. Evaluated on automatic speech recognition and audio classification, our method demonstrates…
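
As a rough illustration of the residual vector quantization (RVQ) step named in the abstract, the following minimal sketch quantizes a feature vector in stages, with each stage coding the residual left by the previous one. It covers only the lookup at inference time: the codebooks here are random placeholders, whereas the abstract describes codebooks trained with RVQ losses and task-specific loss guidance. All names, sizes, and values are assumptions made for illustration.

```python
import random

def nearest(codebook, vec):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)))

def rvq_encode(vec, codebooks):
    """Quantize stage by stage; each stage codes the residual of the previous one."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices

def rvq_decode(indices, codebooks):
    """Reconstruct the vector as the sum of the selected codewords."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out

# Hypothetical toy setup: 2 stages, 4 codewords each, 3-dimensional features.
rng = random.Random(0)
codebooks = [
    [[rng.gauss(0.0, scale) for _ in range(3)] for _ in range(4)]
    for scale in (1.0, 0.3)
]
feature = [0.5, -0.2, 0.1]           # stands in for one intermediate feature frame
codes = rvq_encode(feature, codebooks)
reconstruction = rvq_decode(codes, codebooks)
```

Only the integer indices in `codes` would need to be transmitted; the receiver, holding the same codebooks, rebuilds an approximate feature with `rvq_decode` and passes it to the rest of the downstream model.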

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Music Technology and Sound Studies