AlignNet: Learning dataset score alignment functions to enable better training of speech quality estimators
Jaden Pieper, Stephen D. Voran

TL;DR
AlignNet combines dataset score alignment with multi-dataset finetuning (MDF) to improve the training of no-reference (NR) speech quality estimators on independently collected datasets, enabling better generalization across those datasets.
Contribution
The paper proposes AlignNet and multi-dataset finetuning (MDF), two complementary methods that improve NR speech quality estimation by removing score misalignments between datasets and enabling effective training on multiple datasets at once.
Findings
AlignNet with MDF outperforms existing solutions in two studies: one using nine smaller datasets and one using four larger datasets.
The methods effectively remove score misalignments, improving estimator training.
Results show improved accuracy and robustness of speech quality estimators.
Abstract
We develop two complementary advances for training no-reference (NR) speech quality estimators with independent datasets. Multi-dataset finetuning (MDF) pretrains an NR estimator on a single dataset and then finetunes it on multiple datasets at once, including the dataset used for pretraining. AlignNet uses an AudioNet to generate intermediate score estimates and then uses the Aligner to map those intermediate estimates to the appropriate score range. AlignNet is agnostic to the choice of AudioNet, so any successful NR speech quality estimator can benefit from its Aligner. The methods can be used in tandem, and we use two studies to show that they improve on current solutions: one study uses nine smaller datasets and the other uses four larger datasets. AlignNet with MDF improves on other solutions because it efficiently and effectively removes misalignments that impair the learning process,…
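To make the data flow in the abstract concrete, here is a minimal sketch of the AlignNet structure (an AudioNet producing an intermediate score, followed by a dataset-specific Aligner) and of the MDF training order. The abstract does not specify the Aligner's form, so this sketch assumes a per-dataset affine map fitted in closed form by least squares, with a frozen toy AudioNet that simply averages feature values. In the paper both parts are learned networks; `toy_audionet`, `AffineAligner`, `ToyAlignNet`, and `mdf_schedule` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


# Hypothetical stand-in for an AudioNet: any NR estimator that maps a
# signal (here, a list of feature values) to one intermediate score.
def toy_audionet(features: Sequence[float]) -> float:
    return sum(features) / len(features)


@dataclass
class AffineAligner:
    """Assumed per-dataset aligner: score = scale * intermediate + offset."""
    scale: float = 1.0
    offset: float = 0.0

    def __call__(self, z: float) -> float:
        return self.scale * z + self.offset


def fit_affine(z: Sequence[float], y: Sequence[float]) -> AffineAligner:
    """Closed-form least-squares fit of y ~ scale * z + offset."""
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    var = sum((a - mz) ** 2 for a in z)
    cov = sum((a - mz) * (b - my) for a, b in zip(z, y))
    scale = cov / var
    return AffineAligner(scale, my - scale * mz)


class ToyAlignNet:
    """AudioNet -> intermediate score -> dataset-specific Aligner."""

    def __init__(self, audionet: Callable[[Sequence[float]], float]):
        self.audionet = audionet
        self.aligners: Dict[str, AffineAligner] = {}

    def fit_aligners(self, data: Dict[str, List[Tuple[Sequence[float], float]]]) -> None:
        # Simplification: the AudioNet stays frozen; only aligners are fitted.
        for name, pairs in data.items():
            z = [self.audionet(x) for x, _ in pairs]
            y = [label for _, label in pairs]
            self.aligners[name] = fit_affine(z, y)

    def predict(self, x: Sequence[float], dataset: str) -> float:
        return self.aligners[dataset](self.audionet(x))


def mdf_schedule(datasets: List[str], pretrain_on: str) -> List[List[str]]:
    """MDF phases: pretrain on one dataset, then finetune on all datasets
    at once, including the one used for pretraining."""
    if pretrain_on not in datasets:
        raise ValueError(f"unknown pretraining dataset: {pretrain_on}")
    return [[pretrain_on], list(datasets)]


# Toy data: the same signals labeled on a 1-5 scale (A) and a 0-1 scale (B).
signals = [[1.0], [2.0, 4.0], [3.0], [5.0, 5.0]]
data = {
    "A": list(zip(signals, [1.0, 3.0, 3.0, 5.0])),
    "B": list(zip(signals, [0.0, 0.5, 0.5, 1.0])),
}
net = ToyAlignNet(toy_audionet)
net.fit_aligners(data)
print(net.predict([4.0], "A"))  # 4.0
print(net.predict([4.0], "B"))  # 0.75
print(mdf_schedule(["A", "B"], pretrain_on="A"))  # [['A'], ['A', 'B']]
```

Because one shared AudioNet feeds every dataset, the misaligned score ranges are absorbed by the aligners instead of by the shared estimator; in this toy the same intermediate score of 4.0 maps to 4.0 on dataset A's scale and 0.75 on dataset B's.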
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
