Aura: Privacy-preserving Augmentation to Improve Test Set Diversity in Speech Enhancement

Xavier Gitiaux, Aditya Khant, Ebrahim Beyrami, Chandan Reddy, Jayant Gupchup, Ross Cutler

arXiv:2110.04391 · eess.AS · April 5, 2023

TL;DR

Aura is a privacy-preserving augmentation method that makes speech enhancement test sets more diverse and more difficult without exposing sensitive customer data, thereby improving model evaluation.

Contribution

Aura introduces a novel, sample-efficient, privacy-preserving augmentation technique for speech enhancement test sets using pre-trained feature extractors and quality metrics.
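The idea of using a pre-trained feature extractor together with a quality metric to pick test clips can be illustrated with a generic greedy heuristic. This is a sketch under stated assumptions, not Aura's published algorithm: it assumes each candidate clip already has an embedding from some pre-trained extractor and a predicted quality score (for example a DNSMOS-style MOS, where lower means harder). It repeatedly picks the clip that is hard while staying far, in cosine distance, from clips already chosen, in the spirit of maximal marginal relevance. The function name `select_hard_diverse` and the weight `alpha` are hypothetical.

```python
import numpy as np


def select_hard_diverse(embeddings: np.ndarray, scores: np.ndarray,
                        k: int, alpha: float = 1.0) -> list:
    """Hypothetical greedy heuristic (not Aura's published algorithm).

    embeddings: (n, d) clip embeddings from an assumed pre-trained extractor.
    scores: (n,) predicted quality scores; lower means harder.
    alpha: assumed weight on the diversity term.
    Returns k distinct clip indices in selection order.
    """
    n = embeddings.shape[0]
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= number of clips")
    # Normalise rows so dot products become cosine similarities.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    difficulty = -np.asarray(scores, dtype=float)
    # Start from the single hardest clip.
    selected = [int(np.argmax(difficulty))]
    # Cosine distance from every clip to its nearest already-selected clip.
    min_dist = 1.0 - unit @ unit[selected[0]]
    while len(selected) < k:
        gain = difficulty + alpha * min_dist
        gain[selected] = -np.inf  # never pick a clip twice
        nxt = int(np.argmax(gain))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, 1.0 - unit @ unit[nxt])
    return selected


# Toy data: clips 0 and 1 are near-duplicates; clip 3 points the opposite way.
emb = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [-1.0, 0.0]])
mos = np.array([1.0, 1.0, 2.0, 3.0])
print(select_hard_diverse(emb, mos, k=2, alpha=0.0))  # [0, 1]
print(select_hard_diverse(emb, mos, k=2, alpha=2.0))  # [0, 3]
```

Raising `alpha` trades difficulty for coverage: with `alpha=0.0` the heuristic simply returns the lowest-scoring clips, near-duplicates included.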

Findings

1. Aura increases test set difficulty by 7%, as measured by DNSMOS P.835 overall quality (OVRL).
2. Aura increases test set diversity by 31% and Spearman's rank correlation coefficient (SRCC) by 26%.
3. Aura is open-sourced to facilitate further research.
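The findings quote an SRCC gain and a diversity gain without defining either metric on this page. As a hedged illustration only, the sketch computes Spearman's rank correlation as the Pearson correlation of average ranks (so ties are handled) and, as an assumed diversity proxy, the mean pairwise cosine distance between clip embeddings. Whether Aura measures diversity this way, and which quantities its SRCC compares, are not confirmed here; all function names are hypothetical.

```python
import numpy as np


def average_ranks(x) -> np.ndarray:
    """1-based ranks; tied values share the mean of the ranks they span."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")  # stable sort keeps ties adjacent
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        # Sorted positions i..j hold 1-based ranks i+1..j+1; assign their mean.
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman_rho(a, b) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])


def mean_pairwise_cosine_distance(embeddings) -> float:
    """Assumed diversity proxy: mean of 1 - cos(u, v) over all unordered pairs."""
    e = np.asarray(embeddings, dtype=float)
    unit = e / np.linalg.norm(e, axis=1, keepdims=True)
    sim = unit @ unit.T
    iu = np.triu_indices(len(unit), k=1)  # strictly upper triangle: each pair once
    return float(np.mean(1.0 - sim[iu]))
```

For example, `spearman_rho([1, 2, 3, 4], [1, 4, 9, 16])` is 1.0 up to floating-point rounding, because Spearman's coefficient only looks at order, not at the size of the gaps.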

Abstract

Noise suppression models running in production environments are commonly trained on publicly available datasets. However, this approach leads to regressions due to the lack of training and testing on representative customer data. Moreover, for privacy reasons, developers cannot listen to customer content. This "ears-off" situation motivates augmenting existing datasets in a privacy-preserving manner. In this paper, we present Aura, a solution to make existing noise suppression test sets more challenging and diverse while being sample efficient. Aura is "ears-off" because it relies on a feature extractor and a metric of speech quality, DNSMOS P.835, both pre-trained on data obtained from public sources. As an application of Aura, we augment the INTERSPEECH 2021 DNS challenge by sampling audio files from a new batch of data of 20K clean speech clips from Librivox mixed with noise clips…
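The abstract describes mixing Librivox clean speech with noise clips, but the mixing details are cut off above. A common way to build such mixtures, assumed here rather than taken from the paper or the DNS challenge scripts, is to scale the noise so that the mixture reaches a target signal-to-noise ratio. `mix_at_snr` and its `snr_db` parameter are hypothetical names.

```python
import numpy as np


def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float,
               eps: float = 1e-12) -> np.ndarray:
    """Hypothetical helper: add noise to clean speech at a target SNR (dB).

    Not taken from the Aura repository or the DNS challenge scripts.
    """
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if len(noise) < len(clean):
        # Loop the noise clip so it covers the whole speech clip.
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[: len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + eps  # eps guards against silent noise
    # Choose gain g so that 10*log10(clean_power / (g**2 * noise_power)) == snr_db.
    gain = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise


# Usage: one second of a 220 Hz tone at 16 kHz, mixed with Gaussian noise at 5 dB SNR.
sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 220 * t)
noisy = mix_at_snr(tone, np.random.default_rng(0).standard_normal(sr // 2), snr_db=5.0)
```

Real data pipelines usually also normalise loudness and guard against clipping after mixing; those steps are omitted from this sketch.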

Code & Models

Repositories

microsoft/aura (official; framework tag: tf)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing

Methods: Test · Normalizing Flows · Sliced Iterative Generator