TL;DR
This paper investigates how small amounts of systematic and random label noise, caused by slightly misaligned ground-truth labels, noticeably degrade the performance of convolutional neural networks (CNNs) on fine-grained audio labeling tasks, underscoring the need for precisely timed annotations.
Contribution
It demonstrates that CNNs are highly sensitive to label misalignment in fine-grained audio tasks and shows that accurate labeling is essential for reliable model performance.
Findings
Even slight label misalignments cause noticeable performance degradation
CNNs are highly sensitive to both systematic and random label noise
Precise timing in annotations is crucial for accurate audio signal labeling
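To make "precise timing" concrete, the sketch below scores two binary note-indicator matrices (frames × pitches) with framewise precision, recall and F1. This metric is a common choice for framewise transcription, but it is an assumption here and not necessarily the paper's evaluation protocol. The function name `framewise_prf` and the 88-pitch layout are illustrative. A single 4-frame note shifted by one frame already loses a quarter of its agreement with the original.

```python
import numpy as np


def framewise_prf(reference: np.ndarray, estimate: np.ndarray) -> tuple[float, float, float]:
    """Framewise precision, recall and F1 between two binary (frames x pitches) matrices.

    Hypothetical helper for illustration; every active (frame, pitch) cell counts once.
    """
    ref = reference.astype(bool)
    est = estimate.astype(bool)
    tp = np.logical_and(ref, est).sum()   # active in both
    fp = np.logical_and(~ref, est).sum()  # active only in the estimate
    fn = np.logical_and(ref, ~est).sum()  # active only in the reference
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return float(precision), float(recall), float(f1)


# One note on pitch index 40, active in frames 2..5, versus the same note shifted by one frame.
ref = np.zeros((10, 88), dtype=np.uint8)
ref[2:6, 40] = 1
est = np.zeros((10, 88), dtype=np.uint8)
est[3:7, 40] = 1
print(framewise_prf(ref, est))  # → (0.75, 0.75, 0.75)
```

Shorter notes suffer proportionally more from the same one-frame offset, which is one intuition for why fine-grained labeling is so sensitive to timing errors.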
Abstract
We measure the effect of small amounts of systematic and random label noise, caused by slightly misaligned ground-truth labels, in a fine-grained audio signal labeling task. The task on which we demonstrate these effects, also known as framewise polyphonic transcription or note-quantized multi-f0 estimation, transforms a monaural audio signal into a sequence of note indicator labels. We show that even slight misalignments have clearly apparent effects, demonstrating that convolutional neural networks are highly sensitive to label noise. The implications are clear: when using convolutional neural networks for fine-grained audio signal labeling tasks, great care must be taken to ensure that the annotations are precisely timed and as free as possible from systematic or random error, because even small misalignments have a noticeable impact.
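The two kinds of misalignment described above can be simulated on note annotations before rendering them into framewise note indicator labels. The following is a minimal sketch under stated assumptions and not the authors' procedure. It assumes notes are stored as `(pitch_index, onset_frame, offset_frame)` with an exclusive offset. It models systematic error as a constant frame shift and random error as independent uniform integer jitter on each onset and offset. The names `render` and `misalign` are hypothetical.

```python
from typing import Optional

import numpy as np

# Hypothetical note format: (pitch_index, onset_frame, offset_frame), offset exclusive.
Note = tuple[int, int, int]


def render(notes: list[Note], n_frames: int, n_pitches: int = 88) -> np.ndarray:
    """Render notes into a binary (frames x pitches) note-indicator matrix, clipping to bounds."""
    roll = np.zeros((n_frames, n_pitches), dtype=np.uint8)
    for pitch, on, off in notes:
        on, off = max(0, on), min(n_frames, off)
        if off > on:
            roll[on:off, pitch] = 1
    return roll


def misalign(notes: list[Note], shift: int = 0, jitter: int = 0,
             rng: Optional[np.random.Generator] = None) -> list[Note]:
    """Apply a systematic shift (in frames) to all notes, plus independent random jitter.

    Each onset and offset gets its own integer offset drawn uniformly from [-jitter, jitter].
    Notes are kept at least one frame long.
    """
    rng = rng if rng is not None else np.random.default_rng()
    out = []
    for pitch, on, off in notes:
        # integers(low, high) excludes high, so +1 makes the range symmetric.
        d_on = int(rng.integers(-jitter, jitter + 1))
        d_off = int(rng.integers(-jitter, jitter + 1))
        new_on = on + shift + d_on
        new_off = max(new_on + 1, off + shift + d_off)
        out.append((pitch, new_on, new_off))
    return out


notes = [(40, 2, 6), (44, 8, 12)]
print(misalign(notes, shift=1))  # → [(40, 3, 7), (44, 9, 13)]
noisy_roll = render(misalign(notes, jitter=1, rng=np.random.default_rng(0)), n_frames=16)
```

In an experiment of this kind, the corrupted matrices would replace the clean training targets while evaluation still uses the clean labels, so that any drop in accuracy can be attributed to the label noise.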
