TL;DR
This paper introduces a Class-wise Supervision Unreliability (CSU) framework for robust audio tagging on weakly labeled datasets. It targets three sources of unreliable supervision: spurious additions, misassignments between similar classes, and weakened label evidence.
Contribution
The paper proposes CSU, a novel class-wise unreliability modeling approach that improves robustness in audio tagging without altering model architecture.
Findings
CSU enhances robustness across different architectures.
It effectively mitigates class-dependent supervision bias.
Experiments on AudioSet and other benchmarks show improved performance.
Abstract
Weakly labeled datasets such as AudioSet have driven recent progress in audio tagging. However, annotation quality varies across sound classes. Labels may be incomplete, ambiguous, or unreliable, which introduces class-dependent supervision bias during optimisation. The issue becomes harder as real and generated audio are increasingly mixed in training, and generated samples do not always match their intended semantic labels. Prior work mainly addressed unreliable supervision from missing-positive labels, while this paper targets three other sources of unreliable supervision: spurious additions, misassignments between similar classes, and weakened label evidence. These effects introduce class-dependent optimisation bias that is not explicitly modeled by most existing methods. To bridge this gap, the paper proposes a Class-wise Supervision Unreliability (CSU) framework that controls…
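The three sources of unreliable supervision named in the abstract can be made concrete on a multi-hot label matrix. The sketch below is only an illustration, not the paper's method: the function names (`add_spurious`, `misassign`, `weaken`), the `similar` class map, and the reading of "weakened label evidence" as a scaled-down positive target are all assumptions introduced here.

```python
import numpy as np

def add_spurious(labels, rate, rng):
    """Spurious additions: turn some negatives into positives.

    Assumption: each 0 entry is flipped independently with probability `rate`.
    """
    noisy = labels.copy()
    flip = (labels == 0) & (rng.random(labels.shape) < rate)
    noisy[flip] = 1.0
    return noisy

def misassign(labels, similar, rate, rng):
    """Misassignment: move a positive from class c to a similar class.

    `similar` is a hypothetical dict {class_index: look_alike_class_index}.
    """
    noisy = labels.copy()
    rows, cols = np.nonzero(labels)
    for i, c in zip(rows, cols):
        c = int(c)
        if c in similar and rng.random() < rate:
            noisy[i, c] = 0.0
            noisy[i, similar[c]] = 1.0
    return noisy

def weaken(labels, strength):
    """Weakened evidence: scale positive targets toward 0.

    Assumption: faint or partial evidence is modeled as a soft target.
    """
    return labels * strength

# Two clips, three classes (multi-hot targets).
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 1.0]])
rng = np.random.default_rng(0)

spurious = add_spurious(labels, rate=1.0, rng=rng)         # every negative flipped
swapped = misassign(labels, {0: 1}, rate=1.0, rng=rng)     # class 0 relabeled as class 1
soft = weaken(labels, strength=0.5)                        # positives become 0.5
```

Rates of 1.0 are used here so the effect of each corruption is deterministic and easy to inspect; in a robustness study they would be small and could differ per class, which is what makes the resulting bias class-dependent.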
