Towards Robust Transcription: Exploring Noise Injection Strategies for Training Data Augmentation

Yonghyun Kim; Alexander Lerch

arXiv:2410.14122·cs.SD·October 21, 2024

TL;DR

This paper investigates how adding white noise at different SNR levels affects the performance of state-of-the-art Automatic Piano Transcription models, aiming to improve robustness across noisy environments.

Contribution

It explores noise injection strategies for data augmentation in training APT models, a relatively underexplored area in improving transcription robustness.

Findings

01. Noise augmentation improves model robustness in noisy conditions

02. Performance varies with different SNR levels during training

03. Training on noise-augmented data maintains accuracy across acoustic environments

Abstract

Recent advancements in Automatic Piano Transcription (APT) have significantly improved system performance, but the effect of noisy environments on that performance remains largely unexplored. This study investigates the impact of white noise at various Signal-to-Noise Ratio (SNR) levels on state-of-the-art APT models and evaluates the performance of the Onsets and Frames model when trained on noise-augmented data. We hope this research provides valuable insights as preliminary work toward developing transcription models that maintain consistent performance across a range of acoustic conditions.
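
To make the idea of white noise at a given SNR concrete, here is a minimal sketch of one common way to inject noise into an audio signal. It is not taken from the paper or its repository. It assumes Gaussian white noise, the usual definition SNR_dB = 10 * log10(P_signal / P_noise) with power measured as the mean squared amplitude, and a mono signal held in a NumPy array. The function names `add_white_noise` and `measured_snr_db` are hypothetical, and the SNR values in the usage example are placeholders rather than the levels used by the authors.

```python
import numpy as np


def add_white_noise(clean, snr_db, rng=None):
    """Return `clean` plus Gaussian white noise scaled to a target SNR in dB.

    Assumption: power is the mean squared amplitude of the whole clip.
    """
    rng = np.random.default_rng() if rng is None else rng
    clean = np.asarray(clean, dtype=np.float64)

    signal_power = np.mean(clean ** 2)
    if signal_power == 0.0:
        # A silent clip has no defined SNR, so it is returned unchanged.
        return clean.copy()

    # Rearranging SNR_dB = 10 * log10(P_signal / P_noise) for P_noise.
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))

    # Zero-mean Gaussian noise has power equal to its variance.
    noise = rng.normal(0.0, np.sqrt(noise_power), size=clean.shape)
    return clean + noise


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB of `noisy` relative to `clean`."""
    noise = np.asarray(noisy, dtype=np.float64) - np.asarray(clean, dtype=np.float64)
    return 10.0 * np.log10(np.mean(np.square(clean)) / np.mean(noise ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample_rate = 16000  # placeholder sample rate
    t = np.arange(sample_rate) / sample_rate
    clean = 0.5 * np.sin(2.0 * np.pi * 440.0 * t)  # one second of a 440 Hz tone

    for target in (0.0, 10.0, 20.0):  # placeholder SNR levels in dB
        noisy = add_white_noise(clean, target, rng)
        print(f"target {target:5.1f} dB, measured {measured_snr_db(clean, noisy):5.2f} dB")
```

In a training pipeline, a function like this would typically be applied to each clip on the fly, with the SNR either fixed or sampled per example. The sketch does not rescale or clip the output, so a pipeline that stores audio in a bounded range would need to handle values that fall outside it.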

Code & Models

Repositories

yonghyunk1m/TowardsRobustTranscription
none · Official

Taxonomy

Topics: Natural Language Processing Techniques