Ontology-aware Learning and Evaluation for Audio Tagging
Haohe Liu, Qiuqiang Kong, Xubo Liu, Xinhao Mei, Wenwu Wang, Mark D. Plumbley

TL;DR
This paper introduces ontology-aware evaluation and training methods for audio tagging that incorporate sound class relationships, leading to more accurate and human-aligned performance assessments.
Contribution
The paper proposes OmAP, an ontology-aware evaluation metric, and OBCE, a binary cross-entropy loss reweighted by ontology distance. OmAP targets more robust evaluation, and OBCE brings the same ontology information into model training for audio tagging.
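To make the idea of a distance-reweighted loss concrete, here is a minimal sketch. This summary does not give the paper's exact weighting, so everything specific here is an assumption: the function name `ontology_weighted_bce`, the precomputed distance matrix `dist`, and the choice to scale each negative class's loss term by its normalized distance to the nearest positive class.

```python
import math

def ontology_weighted_bce(probs, targets, dist, eps=1e-7):
    """Binary cross-entropy with distance-based weights on negative classes.

    probs:   predicted probabilities, one per class
    targets: 0/1 labels, one per class
    dist:    square matrix of ontology graph distances between classes
             (hypothetical input, assumed to be precomputed)
    """
    positives = [j for j, t in enumerate(targets) if t == 1]
    max_dist = max(max(row) for row in dist)
    total = 0.0
    for k, (p, t) in enumerate(zip(probs, targets)):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        bce = -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
        weight = 1.0  # positive classes keep full weight
        if t == 0 and positives and max_dist > 0:
            # Assumed scheme: a negative class close to a target in the
            # ontology is penalized less than a distant one.
            nearest = min(dist[k][j] for j in positives)
            weight = nearest / max_dist
        total += weight * bce
    return total / len(probs)

# Toy distances for three classes: 0 and 1 are siblings, 2 is far from both.
DIST = [[0, 2, 4], [2, 0, 4], [4, 4, 0]]
near_miss = ontology_weighted_bce([0.9, 0.9, 0.1], [1, 0, 0], DIST)
far_miss = ontology_weighted_bce([0.9, 0.1, 0.9], [1, 0, 0], DIST)
```

Under this assumed scheme, a confident false activation on the sibling class (`near_miss`) costs less than the same activation on the distant class (`far_miss`).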
Findings
OmAP aligns better with human perception than mAP.
OBCE improves mAP and OmAP scores.
Ontology information enhances audio tagging performance.
Abstract
This study defines a new evaluation metric for audio tagging tasks to overcome the limitation of the conventional mean average precision (mAP) metric, which treats different kinds of sound as independent classes without considering their relations. In addition, because sound labeling is ambiguous, the labels in the training and evaluation sets are not guaranteed to be accurate or exhaustive, which makes robust evaluation with mAP difficult. The proposed metric, ontology-aware mean average precision (OmAP), addresses these weaknesses of mAP by using the AudioSet ontology information during evaluation. Specifically, we reweight the false positive events in the model prediction based on the ontology graph distance to the target classes. The OmAP measure also provides more insight into model performance by supporting evaluation at different levels of coarseness in the ontology graph. We…
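The reweighting of false positives by ontology graph distance can be illustrated with a small sketch. It is not the paper's OmAP definition, which this truncated abstract does not spell out. The toy ontology, the helper names, the `max_dist` cap, and the linear weighting are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical toy ontology (not the real AudioSet graph): parent/child pairs.
EDGES = [
    ("Root", "Animal"), ("Root", "Music"),
    ("Animal", "Dog"), ("Animal", "Cat"),
    ("Dog", "Bark"), ("Dog", "Howl"),
    ("Music", "Guitar"),
]

def build_graph(edges):
    """Build an undirected adjacency map from parent/child pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def graph_distance(graph, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def false_positive_weight(graph, predicted, targets, max_dist):
    """Assumed weighting: distance to the nearest target, capped and scaled to [0, 1]."""
    dist = graph_distance(graph, predicted)
    nearest = min(dist[t] for t in targets)
    return min(nearest, max_dist) / max_dist

def weighted_precision(graph, predictions, targets, max_dist=4):
    """Precision in which each false positive counts by its ontology weight."""
    tp = sum(1 for p in predictions if p in targets)
    fp = sum(false_positive_weight(graph, p, targets, max_dist)
             for p in predictions if p not in targets)
    return tp / (tp + fp) if tp + fp > 0 else 0.0

graph = build_graph(EDGES)
close_error = weighted_precision(graph, ["Bark", "Howl"], {"Bark"})
distant_error = weighted_precision(graph, ["Bark", "Guitar"], {"Bark"})
```

In this toy setup, predicting "Howl" for a "Bark" clip is two hops from the target and counts as half a false positive, while predicting "Guitar" is five hops away and counts in full, so `close_error` is higher than `distant_error`.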
Taxonomy
Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing
Methods: Ontology
