Acoustic Model Optimization over Multiple Data Sources: Merging and Valuation

Victor Junqiu Wei; Weicheng Wang; Di Jiang; Conghui Tan; Rongzhong Lian

arXiv:2410.15620·cs.SD·October 22, 2024

Open Access

TL;DR

This paper introduces a multi-source training paradigm for ASR acoustic models: models trained separately on different datasets are merged using one of two algorithms, GMA or SOMA, and the contribution of each data source is evaluated with Shapley values.

Contribution

It proposes two new algorithms, GMA and SOMA, for merging acoustic models trained on separate data sources, and applies Shapley Values for data contribution assessment.
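
As a rough illustration of the Shapley-value assessment mentioned above, the sketch below computes exact Shapley values for a small set of data sources. It is a minimal sketch, not the paper's procedure: the function `shapley_values`, the source names `"A"` and `"B"`, and the utility numbers in `TOY_UTILITY` are all hypothetical, standing in for "quality of the model obtained from this coalition of sources".

```python
from itertools import combinations
from math import factorial


def shapley_values(players, utility):
    """Exact Shapley values; utility maps a frozenset of players to a score."""
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal gain from adding p to coalition s.
                total += weight * (utility(s | {p}) - utility(s))
        values[p] = total
    return values


# Hypothetical utilities (assumption): higher means a better merged model.
TOY_UTILITY = {
    frozenset(): 0.0,
    frozenset({"A"}): 0.5,
    frozenset({"B"}): 0.25,
    frozenset({"A", "B"}): 1.0,
}


def toy_utility(coalition):
    return TOY_UTILITY[frozenset(coalition)]


phi = shapley_values(["A", "B"], toy_utility)
print(phi)
```

The values sum to the utility of the full coalition, which is the property that makes Shapley values a natural way to split credit among data sources. Exact enumeration grows exponentially with the number of sources, so it is only practical for a handful of them.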

Findings

01. GMA improves model merging quality but is slow.

02. SOMA maintains accuracy with higher efficiency.

03. Proposed methods outperform state-of-the-art on public data.

Abstract

Due to the rising awareness of privacy protection and the voluminous scale of speech data, it is becoming infeasible for Automatic Speech Recognition (ASR) system developers to train the acoustic model on the complete data as before. For example, the data may be owned by different curators who are not allowed to share it with others. In this paper, we propose a novel paradigm to address these problems. In the first stage, multiple acoustic models are trained on different subsets of the complete speech data; in the second stage, two novel algorithms are used to generate a high-quality acoustic model from those trained on the data subsets. We first propose the Genetic Merge Algorithm (GMA), a highly specialized algorithm for optimizing acoustic models that suffers from low efficiency. We further propose the SGD-Based Optimizational Merge…
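
The two-stage idea in the abstract (train per-source models, then merge them) can be sketched with a generic genetic algorithm over parameter vectors. This is only a toy under stated assumptions, not the paper's GMA: the function `genetic_merge`, the operators (uniform crossover, Gaussian mutation, elitist selection), the hyperparameters, and the quadratic `toy_loss` standing in for a validation loss are all hypothetical.

```python
import random


def genetic_merge(models, loss, generations=50, pop_size=20, sigma=0.05, seed=0):
    """Toy genetic merge of parameter vectors (needs at least two models)."""
    rng = random.Random(seed)
    # Start the population from the separately trained models.
    population = [list(m) for m in models]
    for _ in range(generations):
        # Elitist selection: the best candidates survive unchanged.
        population.sort(key=loss)
        parents = population[:max(2, pop_size // 2)]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover plus Gaussian mutation, per parameter.
            child = [rng.choice(pair) + rng.gauss(0.0, sigma) for pair in zip(a, b)]
            children.append(child)
        population = parents + children
    return min(population, key=loss)


# Hypothetical setup: each source model fits only part of the target.
TARGET = [1.0, -2.0, 0.5]
model_a = [1.0, 0.0, 0.0]
model_b = [0.0, -2.0, 0.0]


def toy_loss(params):
    # Stand-in for a validation loss (assumption): squared distance to TARGET.
    return sum((p - t) ** 2 for p, t in zip(params, TARGET))


merged = genetic_merge([model_a, model_b], toy_loss)
print(toy_loss(model_a), toy_loss(model_b), toy_loss(merged))
```

Because the best candidates are carried over unchanged each generation, the merged vector is never worse than the best input model under `toy_loss`. Every generation re-evaluates the loss across the whole population, which hints at why a genetic approach can be slow when each evaluation involves a real acoustic model.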

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Acoustic Wave Phenomena Research