AudioSet-R: A Refined AudioSet with Multi-Stage LLM Label Reannotation

Yulin Sun; Qisheng Xu; Yi Su; Qian Zhu; Yong Dou; Xinwang Liu; Kele Xu

arXiv:2508.15429 · cs.SD · August 25, 2025

TL;DR

This paper introduces AudioSet-R, a refined version of AudioSet with improved labels achieved through a multi-stage reannotation process using audio-language foundation models, leading to better performance in audio classification tasks.

Contribution

It presents a novel three-stage reannotation framework utilizing cross-modal prompting and foundation models to systematically enhance label quality in AudioSet.

Findings

1. Significant performance improvements across multiple audio classification models.
2. Demonstrated effectiveness and generalizability of the reannotation framework.
3. Created AudioSet-R, a high-quality relabeled version of AudioSet.

Abstract

AudioSet is a widely used benchmark in the audio research community and has significantly advanced various audio-related tasks. However, persistent issues with label accuracy and completeness remain critical bottlenecks that limit performance in downstream applications. To address these challenges, we propose a three-stage reannotation framework that harnesses general-purpose audio-language foundation models to systematically improve the label quality of AudioSet. The framework employs a cross-modal prompting strategy, inspired by the concept of prompt chaining, wherein prompts are sequentially composed to execute subtasks (audio comprehension, label synthesis, and semantic alignment). Leveraging this framework, we construct AudioSet-R, a high-quality, structured relabeled version of AudioSet. Extensive experiments conducted on representative audio classification models, including…
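The abstract describes prompt chaining across three subtasks, where each stage's output is composed into the next stage's prompt. The following is a minimal Python sketch of that control flow, assuming a hypothetical `AudioLLM` callable that takes an audio path and a prompt and returns text. Every name here (`reannotate`, `split_lines`, `fake_model`, the `DESCRIBE`/`LABELS`/`ALIGN` prompt tags, and the prompt wording) is an illustrative assumption, not the authors' prompts or code. The final ontology filter is an added guard that the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical model interface: (audio_path, prompt) -> text reply.
# A real pipeline would wrap an audio-language foundation model here.
AudioLLM = Callable[[str, str], str]


@dataclass
class Reannotation:
    clip_id: str
    description: str             # stage 1: free-text audio comprehension
    candidate_labels: List[str]  # stage 2: synthesized labels
    aligned_labels: List[str]    # stage 3: labels aligned to the ontology


def split_lines(text: str) -> List[str]:
    # Newline-separated replies, since some AudioSet class names contain commas.
    return [line.strip() for line in text.splitlines() if line.strip()]


def reannotate(clip_id: str, audio_path: str, model: AudioLLM,
               ontology: List[str]) -> Reannotation:
    # Stage 1: audio comprehension.
    description = model(audio_path, "DESCRIBE: Describe every sound in the clip.")

    # Stage 2: label synthesis; the stage-1 output is chained into this prompt.
    candidates = split_lines(model(
        audio_path,
        "LABELS: List sound-event labels, one per line, for:\n" + description))

    # Stage 3: semantic alignment; stage-2 labels and the ontology are chained in.
    reply = model(
        audio_path,
        "ALIGN: Map each label to the closest ontology class, one per line.\n"
        "Labels:\n" + "\n".join(candidates)
        + "\nOntology:\n" + "\n".join(ontology))

    # Added guard (assumption, not from the paper): keep only exact ontology
    # names, matched case-insensitively, without duplicates.
    canonical: Dict[str, str] = {name.lower(): name for name in ontology}
    aligned: List[str] = []
    for label in split_lines(reply):
        name = canonical.get(label.lower())
        if name is not None and name not in aligned:
            aligned.append(name)
    return Reannotation(clip_id, description, candidates, aligned)


def fake_model(audio_path: str, prompt: str) -> str:
    # Stub standing in for a real model: canned replies keyed on the stage tag.
    if prompt.startswith("DESCRIBE"):
        return "A dog barks twice while a car passes by."
    if prompt.startswith("LABELS"):
        return "dog barking\ncar passing"
    return "Bark\nvehicle\nNot a real class"


result = reannotate("clip_0001", "clip_0001.wav", fake_model,
                    ["Bark", "Vehicle", "Speech", "Music"])
print(result.aligned_labels)  # ['Bark', 'Vehicle']
```

Feeding each stage's output into the next prompt is what makes this a chain rather than three independent queries. Lists are exchanged one item per line because some AudioSet class names, such as "Vehicle horn, car horn, honking", themselves contain commas.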