Task-Aware Answer Preservation under Audio Compression for Large Audio Language Models
Amir Ivry

TL;DR
This paper develops a theoretical and practical framework for audio compression that preserves answers in large audio language models, especially for critical query families, by controlling worst-case error with statistical guarantees.
Contribution
It introduces a compressor acceptance-rejection criterion and a sign-off protocol to ensure answer preservation under audio compression for LALMs, with experimental validation.
Findings
The protocol detects hidden family-level damage from compression.
The choice of query-family partition can change which compression budgets are approved.
Query-conditioned compression can improve answer preservation.
Abstract
Large audio language models (LALMs) are increasingly used to reason over long audio clips, yet deployment often compresses audio before inference to reduce memory and latency. The risk is that compression can leave aggregate accuracy acceptable while sharply degrading answers for a deployment-critical query family. We study answer-preserving audio compression, judging a compressor by the excess answer-error it induces, especially for the worst-affected family. We formulate this theoretically as a compressor acceptance-rejection criterion, derive a practical sign-off protocol that returns compression budgets satisfying worst-family checks with statistical confidence, and evaluate it on five multiple-choice audio question-answering benchmarks with two Qwen-based backbones. The protocol exposes hidden family-level damage, shows that the chosen query-family partition can change the approved…
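The worst-family check described in the abstract can be illustrated with a minimal sketch. This is not the paper's protocol: the function names, the record layout, the Hoeffding bound, and the Bonferroni correction across families are all assumptions chosen for illustration. The sketch computes each family's excess answer-error (compressed error minus uncompressed error), adds a one-sided confidence slack, and accepts the compressor only if the worst family's upper bound stays within a tolerance.

```python
import math
from collections import defaultdict


def family_upper_bounds(records, delta):
    """Upper-bound the excess answer-error of each query family.

    records: iterable of (family, correct_uncompressed, correct_compressed).
    delta:   total failure probability, split evenly across families
             (Bonferroni; an assumption of this sketch).
    """
    diffs = defaultdict(list)
    for family, ok_raw, ok_comp in records:
        # Per-example excess error lies in {-1, 0, 1}.
        diffs[family].append(int(not ok_comp) - int(not ok_raw))

    per_family_delta = delta / len(diffs)
    bounds = {}
    for family, d in diffs.items():
        n = len(d)
        mean = sum(d) / n
        # One-sided Hoeffding slack for values with range 2 (from -1 to 1).
        slack = math.sqrt(2.0 * math.log(1.0 / per_family_delta) / n)
        bounds[family] = mean + slack
    return bounds


def accept_compressor(records, tolerance, delta=0.05):
    """Accept only if the worst family's upper bound is within tolerance."""
    bounds = family_upper_bounds(records, delta)
    worst = max(bounds, key=bounds.get)
    return bounds[worst] <= tolerance, worst, bounds


# Toy data (hypothetical): compression leaves "speech" untouched but
# flips 100 of 400 "music" answers from correct to wrong.
records = (
    [("speech", True, True)] * 400
    + [("music", True, False)] * 100
    + [("music", True, True)] * 300
)
accepted, worst, bounds = accept_compressor(records, tolerance=0.2)
print(accepted, worst)
```

In this toy data the aggregate excess error is 100/800 = 0.125, while the "music" family alone sits at 0.25 before any confidence slack, so a check on the aggregate could pass a compressor that the per-family check rejects. Choosing among several compression budgets would also need a correction for testing multiple budgets, which this sketch omits.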
