TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement

Yunyang Zeng; Joseph Konan; Shuo Han; David Bick; Muqiao Yang; Anurag Kumar; Shinji Watanabe; Bhiksha Raj

arXiv:2302.08088 · cs.CL · February 17, 2023 · 1 citation

Open Access · 2 Repos

TL;DR

This paper introduces TAPLoss, a differentiable loss function based on temporal acoustic parameters. Used as an auxiliary objective, it improves the perceptual quality of speech enhancement models by optimizing fine-grained speech features.

Contribution

The paper presents TAPLoss, a new auxiliary objective that improves the perceptual quality of enhanced speech by optimizing multiple low-level acoustic features during speech enhancement training.

Findings

01 TAPLoss improves perceptual quality and intelligibility of enhanced speech.

02 Both time-domain and time-frequency domain models benefit from TAPLoss.

03 TAPLoss outperforms prior approaches in the Deep Noise Suppression 2020 Challenge.

Abstract

Speech enhancement models have greatly progressed in recent years, but still show limits in perceptual quality of their speech outputs. We propose an objective for perceptual quality based on temporal acoustic parameters. These are fundamental speech features that play an essential role in various applications, including speaker recognition and paralinguistic analysis. We provide a differentiable estimator for four categories of low-level acoustic descriptors involving: frequency-related parameters, energy or amplitude-related parameters, spectral balance parameters, and temporal features. Unlike prior work that looks at aggregated acoustic parameters or a few categories of acoustic parameters, our temporal acoustic parameter (TAP) loss enables auxiliary optimization and improvement of many fine-grain speech characteristics in enhancement workflows. We show that adding TAPLoss as an…
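To make the idea of a frame-level (temporal) acoustic parameter loss concrete, here is a minimal sketch in plain Python. It is not the paper's method: the abstract describes a differentiable estimator over four categories of descriptors, whereas this sketch substitutes two hand-computed stand-ins (per-frame RMS energy and zero-crossing rate) and assumes a mean absolute error between clean and enhanced parameters. The function names, the frame length, the hop size, and the `weight` on the auxiliary term are all hypothetical choices made for illustration.

```python
import math


def frame_signal(x, frame_len, hop):
    """Split a list of samples into overlapping frames (trailing partial frame dropped)."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def frame_params(frame):
    """Two stand-in descriptors for one frame: RMS energy and zero-crossing rate.

    Assumption: these replace the paper's learned estimator purely for illustration.
    """
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (len(frame) - 1)
    return [rms, zcr]


def temporal_param_loss(clean, enhanced, frame_len=400, hop=160):
    """Mean absolute difference between frame-level parameters of two signals.

    Both signals must have the same length and at least frame_len samples.
    """
    pc = [frame_params(f) for f in frame_signal(clean, frame_len, hop)]
    pe = [frame_params(f) for f in frame_signal(enhanced, frame_len, hop)]
    total = sum(abs(a - b) for fc, fe in zip(pc, pe) for a, b in zip(fc, fe))
    return total / (len(pc) * len(pc[0]))


def total_loss(base_loss, clean, enhanced, weight=0.1):
    """Auxiliary use: add the weighted parameter loss to an existing enhancement loss."""
    return base_loss + weight * temporal_param_loss(clean, enhanced)


# Toy signals: a 220 Hz tone at 16 kHz, and the same tone with a 3 kHz interferer.
sr = 16000
clean = [math.sin(2 * math.pi * 220 * n / sr) for n in range(1600)]
degraded = [c + 0.3 * math.sin(2 * math.pi * 3000 * n / sr) for n, c in enumerate(clean)]

loss_same = temporal_param_loss(clean, clean)
loss_degraded = temporal_param_loss(clean, degraded)
```

Because the parameters are compared frame by frame rather than after averaging over the utterance, a mismatch confined to a few frames still contributes to the loss. In a real training setup the parameter estimator would need to be differentiable so that gradients can flow back to the enhancement model, which this plain-Python version does not provide.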

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Phonetics and Phonology Research