Robust Lossy Audio Compression Identification
Hendrik Vincent Koops, Gianluca Micchi, Elio Quinton

TL;DR
This paper investigates the robustness of lossy audio compression identification models, shows that they fail on codec parameter settings not seen during training, and proposes a new training strategy to improve generalization.
Contribution
The paper demonstrates that a model equivalent to prior art lacks robustness, and introduces a training method based on masking the input spectrograms to improve generalization to unseen codec settings.
Findings
Models are sensitive to codec parameter variations.
Masking input spectrograms improves robustness.
The proposed method substantially improves generalization to unseen codec settings.
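This summary does not say how the spectrograms are masked. Purely as an illustration, the sketch below assumes SpecAugment-style random masking of contiguous frequency bands. The function `mask_frequency_bands`, its parameters and the zero fill value are hypothetical choices, not the authors' method.

```python
import random
from typing import List, Optional

# Spectrogram as a list of rows: one row per frequency bin, one column per time frame.
Spectrogram = List[List[float]]


def mask_frequency_bands(
    spec: Spectrogram,
    num_masks: int = 2,
    max_width: int = 8,
    fill: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Spectrogram:
    """Return a copy of `spec` in which `num_masks` random contiguous
    frequency bands (each 0..max_width bins wide) are overwritten with `fill`.

    Assumption: SpecAugment-style frequency masking, not necessarily the
    masking scheme used in the paper.
    """
    rng = rng if rng is not None else random.Random()
    n_bins = len(spec)
    out = [row[:] for row in spec]  # copy rows so the input is left untouched
    for _ in range(num_masks):
        width = rng.randint(0, min(max_width, n_bins))
        start = rng.randint(0, n_bins - width)
        for f in range(start, start + width):
            out[f] = [fill] * len(out[f])
    return out


if __name__ == "__main__":
    spec = [[1.0] * 10 for _ in range(16)]  # toy 16-bin x 10-frame spectrogram
    masked = mask_frequency_bands(spec, num_masks=2, max_width=4, rng=random.Random(0))
    masked_rows = [f for f, row in enumerate(masked) if all(v == 0.0 for v in row)]
    print("masked frequency bins:", masked_rows)
```

In a setup like this, masking would be applied only to training inputs, with evaluation run on unmasked spectrograms. The idea is that hiding parts of the spectrum discourages the model from relying on narrow cues tied to the specific codec settings it was trained on.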
Abstract
Previous research contributions on blind lossy compression identification report near-perfect performance metrics on their test sets, across a variety of codecs and bit rates. However, we show that such results can be deceptive and may not accurately represent the true ability of the system to tackle the task at hand. In this article, we present an investigation into the robustness and generalisation capability of a lossy audio compression identification model. Our contributions are as follows. (1) We show the lack of robustness to codec parameter variations of a model equivalent to prior art. In particular, when naively training a lossy compression detection model on a dataset of music recordings processed with a range of codecs and their lossless counterparts, we obtain near-perfect performance metrics on the held-out test set, but severely degraded performance on lossy tracks produced with codec…
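The gap the abstract describes comes from how the test set is built. A random split puts every codec setting in both training and test data, while a robustness test holds out whole settings. The sketch below shows the second kind of split. The record fields (`codec`, `bitrate_kbps`), the helper names and the example codecs and bit rates are hypothetical; the paper's actual codecs and parameters are not listed in this summary.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical track record: {"id": str, "codec": str, "bitrate_kbps": int}
Track = Dict[str, object]
Setting = Tuple[str, int]  # (codec name, bit rate in kbit/s)


def setting_of(track: Track) -> Setting:
    """Return the codec parameter setting a lossy track was encoded with."""
    return (str(track["codec"]), int(track["bitrate_kbps"]))


def split_by_codec_setting(
    tracks: List[Track], held_out: Set[Setting]
) -> Tuple[List[Track], List[Track]]:
    """Send every track encoded with a held-out setting to the test set,
    and all remaining tracks to the training set."""
    train: List[Track] = []
    test: List[Track] = []
    for track in tracks:
        (test if setting_of(track) in held_out else train).append(track)
    return train, test


if __name__ == "__main__":
    tracks = [
        {"id": "a", "codec": "mp3", "bitrate_kbps": 128},
        {"id": "b", "codec": "mp3", "bitrate_kbps": 320},
        {"id": "c", "codec": "aac", "bitrate_kbps": 128},
        {"id": "d", "codec": "aac", "bitrate_kbps": 256},
    ]
    train, test = split_by_codec_setting(tracks, held_out={("mp3", 320), ("aac", 256)})
    # Unlike a random split of tracks, no codec setting appears in both splits.
    assert not {setting_of(t) for t in train} & {setting_of(t) for t in test}
    print([t["id"] for t in train], [t["id"] for t in test])
```

A model that scores near-perfectly on a random split but poorly on a split like this one has likely learned setting-specific artifacts rather than a general notion of lossy compression.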
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing
