Alternating Approach-Putt Models for Multi-Stage Speech Enhancement
Iksoon Jeong, Kyung-Joong Kim, Kang-Hun Ahn

TL;DR
This paper introduces PuttNet, a neural post-processing model that, when alternated with a speech enhancement network, improves speech quality by reducing artifacts, as shown by multiple quality and intelligibility metrics.
Contribution
The paper proposes alternating a speech enhancement model with PuttNet, a post-processing network that mitigates the artifacts enhancement introduces, and reports higher audio quality than existing single-model methods.
Findings
Improved perceptual quality scores (PESQ)
Enhanced speech intelligibility (STOI)
Reduced background noise intrusiveness (CBAK)
Abstract
Speech enhancement using artificial neural networks aims to remove noise from noisy speech signals while preserving the speech content. However, speech enhancement networks often introduce distortions to the speech signal, referred to as artifacts, which can degrade audio quality. In this work, we propose a post-processing neural network designed to mitigate artifacts introduced by speech enhancement models. Inspired by the analogy of making a 'Putt' after an 'Approach' in golf, we name our model PuttNet. We demonstrate that alternating between a speech enhancement model and the proposed Putt model leads to improved speech quality, as measured by perceptual quality scores (PESQ), objective intelligibility (STOI), and background noise intrusiveness (CBAK) scores. Furthermore, we illustrate with graphical analysis why this alternating Approach outperforms repeated application of either…
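The alternating scheme described in the abstract can be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' implementation: the function name `alternate_approach_putt`, the `num_rounds` parameter, the choice to start each round with the enhancement step, and the toy stand-in models are all hypothetical. In practice the two callables would be a trained speech enhancement network and a trained PuttNet.

```python
from typing import Callable, List

Signal = List[float]
Model = Callable[[Signal], Signal]


def alternate_approach_putt(
    noisy: Signal,
    approach: Model,
    putt: Model,
    num_rounds: int = 2,
) -> List[Signal]:
    """Alternate an enhancement ("Approach") step with a post-processing
    ("Putt") step and return the signal after every stage.

    Assumption: each round runs approach first, then putt. The number of
    rounds is a free parameter here, not a value taken from the paper.
    """
    if num_rounds < 1:
        raise ValueError("num_rounds must be at least 1")

    signal = list(noisy)
    stages = [signal]  # stage 0 is the untouched noisy input
    for _ in range(num_rounds):
        signal = approach(signal)  # enhancement: remove noise
        stages.append(signal)
        signal = putt(signal)      # post-processing: reduce artifacts
        stages.append(signal)
    return stages


# Toy stand-ins (hypothetical) so the sketch runs without trained models.
def toy_approach(x: Signal) -> Signal:
    return [0.5 * v for v in x]


def toy_putt(x: Signal) -> Signal:
    return [v + 1.0 for v in x]


stages = alternate_approach_putt([4.0, 8.0], toy_approach, toy_putt, num_rounds=2)
```

Returning every intermediate stage, rather than only the final signal, makes it straightforward to score each stage with metrics such as PESQ or STOI and compare the alternating sequence against repeated application of a single model.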
