AudioSet-R: A Refined AudioSet with Multi-Stage LLM Label Reannotation
Yulin Sun, Qisheng Xu, Yi Su, Qian Zhu, Yong Dou, Xinwang Liu, Kele Xu

TL;DR
This paper introduces AudioSet-R, a refined version of AudioSet whose labels are improved through a multi-stage reannotation process built on audio-language foundation models, yielding better performance on audio classification tasks.
Contribution
It presents a novel three-stage reannotation framework utilizing cross-modal prompting and foundation models to systematically enhance label quality in AudioSet.
Findings
Significant performance improvements across multiple audio classification models.
Demonstrated effectiveness and generalizability of the reannotation framework.
Created AudioSet-R, a high-quality relabeled version of AudioSet.
Abstract
AudioSet is a widely used benchmark in the audio research community and has significantly advanced various audio-related tasks. However, persistent issues with label accuracy and completeness remain critical bottlenecks that limit performance in downstream applications. To address these challenges, we propose a three-stage reannotation framework that harnesses general-purpose audio-language foundation models to systematically improve the label quality of AudioSet. The framework employs a cross-modal prompting strategy, inspired by the concept of prompt chaining, wherein prompts are sequentially composed to execute subtasks (audio comprehension, label synthesis, and semantic alignment). Leveraging this framework, we construct AudioSet-R, a high-quality, structured relabeled version of AudioSet. Extensive experiments conducted on representative audio classification models, including…
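The abstract names the three chained subtasks but not the prompts, models, or alignment method. The following is a minimal sketch of the prompt-chaining idea only, assuming a generic `model(clip_id, prompt) -> str` callable standing in for an audio-language foundation model. All function names and prompts are hypothetical, and the Stage 3 string-similarity matching via Python's `difflib` is a simplification swapped in for illustration. None of this is the authors' implementation.

```python
from dataclasses import dataclass
from difflib import get_close_matches
from typing import Callable, List

# Hypothetical stand-in for an audio-language foundation model:
# takes a clip identifier and a text prompt, returns generated text.
ModelFn = Callable[[str, str], str]


@dataclass
class ReannotationResult:
    description: str             # Stage 1 output
    candidate_labels: List[str]  # Stage 2 output (free text)
    aligned_labels: List[str]    # Stage 3 output (ontology classes)


def comprehend(model: ModelFn, clip_id: str) -> str:
    # Stage 1 (audio comprehension): free-text description of the clip.
    return model(clip_id, "Describe all sound events audible in this clip.")


def synthesize(model: ModelFn, clip_id: str, description: str) -> List[str]:
    # Stage 2 (label synthesis): the prompt embeds the Stage 1 output,
    # which is what chains the two prompts together.
    prompt = ("Given this description, list the sound-event labels, "
              f"comma-separated:\n{description}")
    raw = model(clip_id, prompt)
    return [lab.strip() for lab in raw.split(",") if lab.strip()]


def align(candidates: List[str], ontology: List[str],
          cutoff: float = 0.6) -> List[str]:
    # Stage 3 (semantic alignment): map each free-text label to the most
    # similar ontology class name; labels with no match above `cutoff`
    # are dropped, and duplicate classes are kept only once.
    by_lower = {name.lower(): name for name in ontology}
    aligned: List[str] = []
    for cand in candidates:
        match = get_close_matches(cand.lower(), list(by_lower), n=1, cutoff=cutoff)
        if match and by_lower[match[0]] not in aligned:
            aligned.append(by_lower[match[0]])
    return aligned


def reannotate(model: ModelFn, clip_id: str,
               ontology: List[str]) -> ReannotationResult:
    description = comprehend(model, clip_id)
    candidates = synthesize(model, clip_id, description)
    return ReannotationResult(description, candidates, align(candidates, ontology))


if __name__ == "__main__":
    def fake_model(clip_id: str, prompt: str) -> str:
        # Canned responses so the sketch runs without a real model.
        if prompt.startswith("Describe"):
            return "A dog barks while a car engine idles."
        return "Dog, car engine, birds, wind chimes"

    toy_ontology = ["Dog", "Engine", "Bird", "Speech"]  # toy subset of class names
    result = reannotate(fake_model, "clip_0001", toy_ontology)
    print(result.aligned_labels)  # prints ['Dog', 'Engine', 'Bird']
```

Each stage consumes only the previous stage's text output, so any stage (for example, the alignment step) can be replaced independently. In the toy run, "wind chimes" has no sufficiently similar class in the toy ontology and is discarded.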
