Towards Robust Transcription: Exploring Noise Injection Strategies for Training Data Augmentation
Yonghyun Kim, Alexander Lerch

TL;DR
This paper investigates how adding white noise at different SNR levels affects the performance of state-of-the-art Automatic Piano Transcription models, aiming to improve robustness across noisy environments.
Contribution
The paper explores noise injection as a data augmentation strategy for training APT models, a relatively underexplored route to improving transcription robustness.
Findings
Noise augmentation improves model robustness in noisy conditions
Performance varies with different SNR levels during training
Training on noise-augmented data helps maintain accuracy across acoustic environments
Abstract
Recent advancements in Automatic Piano Transcription (APT) have significantly improved system performance, but the impact of noisy environments on that performance remains largely unexplored. This study investigates the impact of white noise at various Signal-to-Noise Ratio (SNR) levels on state-of-the-art APT models and evaluates the performance of the Onsets and Frames model when trained on noise-augmented data. We hope this research provides valuable insights as preliminary work toward developing transcription models that maintain consistent performance across a range of acoustic conditions.
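As an illustration of what injecting white noise at a given SNR involves, the following is a minimal sketch, not the authors' implementation. It assumes mono audio held in a NumPy array and white Gaussian noise scaled so that the ratio of signal power to noise power matches the target SNR in dB. The function names add_white_noise and measured_snr_db, the 440 Hz test tone, and the 16 kHz sample rate are hypothetical choices made for the example.

```python
import numpy as np


def add_white_noise(signal, snr_db, rng=None):
    """Return `signal` plus white Gaussian noise at the target SNR (in dB).

    Hypothetical helper for illustration; the paper's own pipeline may differ.
    """
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=np.float64)

    signal_power = np.mean(signal ** 2)
    if signal_power == 0.0:
        # SNR is undefined for a silent signal; return it unchanged.
        return signal.copy()

    # Draw unit-variance noise, then rescale it so that
    # 10 * log10(signal_power / noise_power) equals snr_db.
    noise = rng.standard_normal(signal.shape)
    noise_power = np.mean(noise ** 2)
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise *= np.sqrt(target_noise_power / noise_power)

    return signal + noise


def measured_snr_db(clean, noisy):
    """Compute the SNR in dB of `noisy` relative to the `clean` reference."""
    clean = np.asarray(clean, dtype=np.float64)
    residual = np.asarray(noisy, dtype=np.float64) - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2))


# Example: one second of a 440 Hz tone at an assumed 16 kHz sample rate.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clean_tone = 0.5 * np.sin(2.0 * np.pi * 440.0 * t)
noisy_tone = add_white_noise(clean_tone, snr_db=10.0, rng=np.random.default_rng(0))
```

Because the noise is rescaled using its empirical power, the measured SNR of the result matches the requested value up to floating-point error. In an augmentation setting, one would typically draw snr_db per training example from a chosen set or range of levels; which levels work best is the question the paper investigates.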
Taxonomy
Topics: Natural Language Processing Techniques
