Toward a Sparse and Interpretable Audio Codec

John Vinyard

arXiv:2505.05654·cs.SD·May 12, 2025

TL;DR

This paper proposes an audio encoding method that represents sound as a sparse set of events and their times of occurrence, using simple physics-based models of attack and resonance, with the aim of a more interpretable and parsimonious representation than block-based codecs provide.

Contribution

It introduces a proof-of-concept encoder that models audio as sparse events, using rudimentary physics-based assumptions about attack, instrument resonance, and room resonance to encourage interpretability and sparsity.

Findings

01 Demonstrates a sparse, event-based audio representation

02 Uses physics-based assumptions to model audio features

03 Encourages interpretability and efficiency in audio coding

Abstract

Most widely used modern audio codecs, such as Ogg Vorbis and MP3, as well as more recent "neural" codecs like Meta's Encodec or the Descript Audio Codec, are based on block coding: audio is divided into overlapping, fixed-size "frames" which are then compressed. While they often yield excellent reproductions and can be used for downstream tasks such as text-to-audio, they do not produce an intuitive, directly interpretable representation. In this work, we introduce a proof-of-concept audio encoder that represents audio as a sparse set of events and their times of occurrence. Rudimentary physics-based assumptions are used to model attack and the physical resonance of both the instrument being played and the room in which a performance occurs, hopefully encouraging a sparse, parsimonious, and easy-to-interpret representation.
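To make the event-based idea concrete, here is a minimal decoding sketch in NumPy. It is not the author's implementation and is not taken from the linked repository. It assumes, for illustration only, that each event is a tuple of (onset time, attack burst, instrument resonance), that an attack excites a resonance by convolution, and that a single room impulse response is shared by all events. The sample rate, the names `damped_sinusoid`, `render_event`, and `decode`, and every parameter value are hypothetical.

```python
import numpy as np

SAMPLE_RATE = 8000  # assumed for this sketch, not taken from the paper


def damped_sinusoid(freq_hz, decay_per_s, n_samples):
    """A crude stand-in for an instrument resonance: one decaying partial."""
    t = np.arange(n_samples) / SAMPLE_RATE
    return np.exp(-decay_per_s * t) * np.sin(2 * np.pi * freq_hz * t)


def render_event(attack, resonance):
    """Assumption: the attack excites the resonance via linear convolution."""
    return np.convolve(attack, resonance)


def decode(events, room_ir, n_samples):
    """Sum sparse events at their onset times, then apply a shared room response."""
    dry = np.zeros(n_samples)
    for onset_s, attack, resonance in events:
        start = int(round(onset_s * SAMPLE_RATE))
        if start >= n_samples:
            continue  # event falls outside the rendered window
        sound = render_event(attack, resonance)
        end = min(n_samples, start + len(sound))
        dry[start:end] += sound[: end - start]
    # Room resonance modeled as convolution with an impulse response, truncated
    return np.convolve(dry, room_ir)[:n_samples]


# Hypothetical example: two events in one second of audio
rng = np.random.default_rng(0)
attack = rng.standard_normal(64) * np.hanning(64)  # short noise burst
room_ir = rng.standard_normal(1600) * np.exp(-np.arange(1600) / 300.0)
room_ir[0] = 1.0  # direct sound
events = [
    (0.10, attack, damped_sinusoid(220.0, 8.0, 4000)),
    (0.55, 0.5 * attack, damped_sinusoid(330.0, 8.0, 4000)),
]
audio = decode(events, room_ir, n_samples=SAMPLE_RATE)
```

In this toy setup the whole one-second signal is described by two onset times plus a handful of short components, rather than by a dense sequence of fixed-size frames, which is the kind of contrast with block coding that the abstract draws.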

Code & Models

Repositories

JohnVinyard/matching-pursuit (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Physical Unclonable Functions (PUFs) and Hardware Security