SpurAudio: A Benchmark for Studying Shortcut Learning in Few-Shot Audio Classification
Giries Abu Ayoub, Morad Tukan, Loay Mualem

TL;DR
SpurAudio is a new benchmark designed to evaluate how few-shot audio classification models rely on contextual cues, revealing their vulnerabilities to spurious correlations and highlighting the importance of context-aware evaluation.
Contribution
The paper introduces SpurAudio, a benchmark that enables controlled assessment of contextual shifts in few-shot audio classification, exposing model vulnerabilities to background correlations.
Findings
State-of-the-art few-shot methods degrade when background cues are disrupted.
Large pretrained models are also vulnerable to context shifts.
Different algorithms show varying sensitivity to spurious correlations.
Abstract
Few-shot classification (FSC) is widely used for learning from limited labeled data, yet most evaluations implicitly assume that target concepts are independent of contextual cues. In real-world settings, however, examples often appear within rich contexts, allowing models to exploit spurious correlations between foreground content and background signals. While such effects have been studied in few-shot image classification, their role in few-shot audio classification remains largely unexplored, and existing audio benchmarks offer limited control over contextual structure. We introduce SpurAudio, a benchmark that leverages the natural separability of foreground events and background environments in audio to enable controlled, multi-level evaluation of contextual shifts across support and query sets. Using this benchmark, we show that many state-of-the-art few-shot methods suffer severe…
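The support/query context shift described in the abstract can be pictured with a toy episode builder. The following is a minimal sketch under stated assumptions, not the benchmark's actual construction procedure: the function `make_episode`, its parameters, and the class and background names are all hypothetical, and each audio clip is stood in for by a `(foreground, background)` label pair rather than a waveform.

```python
import random


def make_episode(classes, backgrounds, n_shot, n_query, shift, seed=0):
    """Build one few-shot episode of (foreground, background) label pairs.

    Hypothetical illustration only. In the support set every class is tied
    to a single background, so the background is a spurious cue for the
    class. With shift=True the query set re-pairs each class with a
    background that a different class used in support; with shift=False
    the support pairing is kept. Needs at least two classes and at least
    as many backgrounds as classes.
    """
    rng = random.Random(seed)
    # One distinct background per class for the support set.
    bgs = rng.sample(backgrounds, len(classes))
    support_bg = dict(zip(classes, bgs))

    if shift:
        # Rotate by one position so that no class keeps its support background.
        rotated = bgs[1:] + bgs[:1]
        query_bg = dict(zip(classes, rotated))
    else:
        query_bg = dict(support_bg)

    support = [(c, support_bg[c]) for c in classes for _ in range(n_shot)]
    query = [(c, query_bg[c]) for c in classes for _ in range(n_query)]
    return support, query


if __name__ == "__main__":
    classes = ["dog_bark", "siren", "speech"]        # assumed foreground events
    backgrounds = ["street", "cafe", "rain", "park"]  # assumed environments
    support, query = make_episode(classes, backgrounds, n_shot=2, n_query=3, shift=True)
    print("support:", support)
    print("query:  ", query)
```

Under this toy setup, a classifier that keys on the background alone would match every shifted query to the wrong class, while one that attends to the foreground event would be unaffected. Comparing accuracy between `shift=False` and `shift=True` episodes is one simple way to expose that reliance.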
