AlignNet: Learning dataset score alignment functions to enable better training of speech quality estimators

Jaden Pieper; Stephen D. Voran

arXiv:2406.10205 · eess.AS · September 27, 2024 · Interspeech

TL;DR

AlignNet pairs a learned dataset score alignment function with multi-dataset finetuning (MDF) so that a no-reference speech quality estimator can be trained on several independently collected datasets at once.

Contribution

The paper proposes AlignNet and MDF, two methods that improve no-reference speech quality estimation by addressing dataset misalignments and enabling effective multi-dataset training.

Findings

01 AlignNet with MDF improves on current solutions in two studies: one with nine smaller datasets and one with four larger datasets.

02 AlignNet with MDF removes the score misalignments between datasets that impair the learning process.

03 Results show improved accuracy and robustness of speech quality estimators.

Abstract

We develop two complementary advances for training no-reference (NR) speech quality estimators with independent datasets. Multi-dataset finetuning (MDF) pretrains an NR estimator on a single dataset and then finetunes it on multiple datasets at once, including the dataset used for pretraining. AlignNet uses an AudioNet to generate intermediate score estimates before using the Aligner to map intermediate estimates to the appropriate score range. AlignNet is agnostic to the choice of AudioNet so any successful NR speech quality estimator can benefit from its Aligner. The methods can be used in tandem, and we use two studies to show that they improve on current solutions: one study uses nine smaller datasets and the other uses four larger datasets. AlignNet with MDF improves on other solutions because it efficiently and effectively removes misalignments that impair the learning process,…
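The data flow described in the abstract can be sketched in a few lines of dependency-free Python. This is a toy illustration, not the authors' implementation: `toy_audio_net`, `AffineAligner`, `align_net`, and `mdf_schedule` are hypothetical names, the per-dataset affine map is an assumed stand-in for the learned Aligner, and the "AudioNet" here is a placeholder function rather than a trained estimator.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


def toy_audio_net(audio: Sequence[float]) -> float:
    """Placeholder 'AudioNet': maps audio samples to one intermediate score.

    Assumption: a real AudioNet would be any trained NR speech quality
    estimator; mean absolute amplitude is used here only so the sketch runs.
    """
    return sum(abs(x) for x in audio) / len(audio)


@dataclass
class AffineAligner:
    """Toy 'Aligner': one (scale, offset) pair per dataset.

    Assumption: the affine form is illustrative only; the abstract says just
    that the Aligner maps intermediate estimates to the appropriate score range.
    """
    params: Dict[str, Tuple[float, float]]

    def __call__(self, intermediate: float, dataset: str) -> float:
        scale, offset = self.params[dataset]
        return scale * intermediate + offset


def align_net(
    audio: Sequence[float],
    dataset: str,
    audio_net: Callable[[Sequence[float]], float],
    aligner: Callable[[float, str], float],
) -> float:
    """AudioNet produces an intermediate estimate; the Aligner maps it to the dataset's score range."""
    return aligner(audio_net(audio), dataset)


def mdf_schedule(pretrain_dataset: str, all_datasets: Sequence[str]) -> List[Tuple[str, List[str]]]:
    """Return the two MDF stages: pretrain on one dataset, then finetune on all at once.

    The finetuning stage includes the dataset used for pretraining.
    """
    if pretrain_dataset not in all_datasets:
        raise ValueError("the pretraining dataset must be one of the finetuning datasets")
    return [("pretrain", [pretrain_dataset]), ("finetune", list(all_datasets))]
```

Because `align_net` accepts any callable as `audio_net`, the sketch mirrors the abstract's point that the Aligner does not depend on a particular AudioNet. For example, `align_net([0.5, -0.5], "B", toy_audio_net, AffineAligner({"A": (1.0, 0.0), "B": (4.0, 1.0)}))` maps the same intermediate estimate onto dataset B's assumed scale.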

Code & Models

Repositories

ntia/alignnet
pytorch

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing