Audio Enhancement for Computer Audition -- An Iterative Training Paradigm Using Sample Importance
Manuel Milling, Shuo Liu, Andreas Triantafyllopoulos, Ilhan Aslan, and Björn W. Schuller

TL;DR
This paper introduces an iterative training paradigm that uses sample importance to make neural network audio models more robust to noise across applications such as ASR and ASC.
Contribution
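The paper proposes an end-to-end training method that jointly optimizes the audio enhancement (AE) module and the downstream application model, using sample-wise performance on the target task as a measure of sample importance to improve noise robustness.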
Findings
Significant improvement in noise robustness at low SNRs.
Effective across diverse audio applications including speech and non-speech tasks.
Enhances performance in everyday noisy environments.
Abstract
Neural network models for audio tasks, such as automatic speech recognition (ASR) and acoustic scene classification (ASC), are susceptible to noise contamination for real-life applications. To improve audio quality, an enhancement module, which can be developed independently, is explicitly used at the front-end of the target audio applications. In this paper, we present an end-to-end learning solution to jointly optimise the models for audio enhancement (AE) and the subsequent applications. To guide the optimisation of the AE module towards a target application, and especially to overcome difficult samples, we make use of the sample-wise performance measure as an indication of sample importance. In experiments, we consider four representative applications to evaluate our training paradigm, i.e., ASR, speech command recognition (SCR), speech emotion recognition (SER), and ASC. These…
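The abstract does not say how the sample-wise performance measure is turned into a weight. The following is only a minimal sketch of the general idea, assuming a softmax over per-sample task losses and an L1 enhancement loss; the function names, the `temperature` parameter, and the toy data are hypothetical and not taken from the paper.

```python
import numpy as np

def sample_importance(task_losses, temperature=1.0):
    """Turn per-sample task losses into weights with mean 1.

    Assumption: a softmax over losses, so samples the application
    model handles poorly receive larger weights.
    """
    losses = np.asarray(task_losses, dtype=float)
    scaled = (losses - losses.max()) / temperature  # shift for numerical stability
    weights = np.exp(scaled)
    weights /= weights.sum()
    return weights * losses.size  # rescale so the average weight is 1

def weighted_enhancement_loss(enhanced, clean, weights):
    """Importance-weighted L1 loss over a batch of shape (batch, time)."""
    per_sample = np.mean(np.abs(enhanced - clean), axis=1)
    return float(np.mean(weights * per_sample))

# Toy batch: three signals; the third is assumed to be hardest for the task model.
rng = np.random.default_rng(0)
clean = rng.standard_normal((3, 16))
enhanced = clean + 0.1 * rng.standard_normal((3, 16))
weights = sample_importance([0.1, 0.1, 2.0])
loss = weighted_enhancement_loss(enhanced, clean, weights)
```

With equal task losses the weights are all 1 and the result reduces to a plain mean L1 loss, so the weighting only changes training where the application model struggles.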
Taxonomy
Methods: Autoencoders
