Intermediate Fine-Tuning Using Imperfect Synthetic Speech for Improving Electrolaryngeal Speech Recognition

Lester Phillip Violeta; Ding Ma; Wen-Chin Huang; Tomoki Toda

arXiv:2211.01079 · cs.SD · May 31, 2023

TL;DR

This paper introduces an intermediate fine-tuning step that uses imperfect synthetic speech to narrow the domain gap between pretraining data and electrolaryngeal target speech, improving recognition by 6.1% over a baseline that did not use the synthetic speech.

Contribution

The paper proposes an intermediate fine-tuning step on imperfect synthetic speech, placed between large-scale pretraining and fine-tuning on the target data, to reduce the domain shift that limits ASR performance for electrolaryngeal speakers.

Findings

01 Improves on the baseline that did not use imperfect synthetic speech by 6.1%

02 The intermediate fine-tuning stage helps the model learn high-level features rather than low-level details

03 The approach remains effective even though the synthetic speech is imperfect

Abstract

Research on automatic speech recognition (ASR) systems for electrolaryngeal speakers has been relatively unexplored due to small datasets. When training data is lacking in ASR, a large-scale pretraining and fine-tuning framework is often sufficient to achieve high recognition rates; however, in electrolaryngeal speech, the domain shift between the pretraining and fine-tuning data is too large to overcome, limiting the maximum improvement of recognition rates. To resolve this, we propose an intermediate fine-tuning step that uses imperfect synthetic speech to close the domain shift gap between the pretraining and target data. Despite the imperfect synthetic data, we show the effectiveness of this approach on electrolaryngeal speech datasets, with improvements of 6.1% over the baseline that did not use imperfect synthetic speech. Results show how the intermediate fine-tuning stage focuses on…
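The ordering of stages described in the abstract can be illustrated with a minimal sketch. Everything below is hypothetical and chosen only for illustration: the stage and dataset names, `build_schedule`, `run_schedule`, and the toy `record_training` function are not from the paper, and no real ASR model is trained. The sketch shows only that the intermediate stage on imperfect synthetic speech sits between pretraining and fine-tuning on the target data, and that the baseline skips it.

```python
from typing import Callable, List, Tuple

# (stage name, dataset name); both names are illustrative placeholders.
Stage = Tuple[str, str]


def build_schedule(use_intermediate: bool = True) -> List[Stage]:
    """Return the ordered fine-tuning stages applied to a pretrained model."""
    stages: List[Stage] = []
    if use_intermediate:
        # Intermediate step: fine-tune on imperfect synthetic speech first.
        stages.append(("intermediate_fine_tuning", "imperfect_synthetic_speech"))
    # Final step: fine-tune on the (small) electrolaryngeal target data.
    stages.append(("target_fine_tuning", "electrolaryngeal_target"))
    return stages


def run_schedule(model, stages: List[Stage], train_fn: Callable):
    """Apply train_fn once per stage, in order, and return the final model."""
    for _name, dataset in stages:
        model = train_fn(model, dataset)
    return model


def record_training(model: List[str], dataset: str) -> List[str]:
    """Toy stand-in for a training loop: it only records which data was seen."""
    return model + [dataset]


# The "model" starts out having seen only the pretraining corpus.
pretrained = ["pretraining_corpus"]
proposed = run_schedule(pretrained, build_schedule(True), record_training)
baseline = run_schedule(pretrained, build_schedule(False), record_training)
print(proposed)
print(baseline)
```

In a real system, `record_training` would be replaced by an actual fine-tuning loop; the sketch makes no claim about the architecture, hyperparameters, or synthesis method used in the paper.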

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research