Iterative pseudo-forced alignment by acoustic CTC loss for self-supervised ASR domain adaptation

Fernando López; Jordi Luque

arXiv:2210.15226 · cs.CL · January 18, 2023

TL;DR

This paper introduces an iterative pseudo-forced alignment method using CTC loss for self-supervised domain adaptation in ASR, enabling accurate alignment and adaptation without human annotations.

Contribution

It presents a novel iterative alignment algorithm that refines audio-text alignments using CTC posteriors, improving domain adaptation for end-to-end ASR without manual labels.

Findings

01 Achieves high-quality alignments on broadcast TV and voice datasets.

02 Enables effective domain adaptation and semi-supervised training.

03 No human-revised references needed for alignment and adaptation.

Abstract

High-quality data labeling for specific domains is costly and time-consuming for human annotators. In this work, we propose a self-supervised domain adaptation method based on an iterative pseudo-forced alignment algorithm. The produced alignments are used to customize an end-to-end Automatic Speech Recognition (ASR) system and are iteratively refined. The algorithm is fed with frame-wise character posteriors produced by a seed ASR model, trained on out-of-domain data and optimized through a Connectionist Temporal Classification (CTC) loss. The alignments are computed iteratively on a corpus of broadcast TV. The process is repeated, either reducing the quantity of text to be aligned or expanding the alignment window, until the best possible audio-text alignment is found. The starting timestamps, or temporal anchors, are produced solely from the confidence score of the last aligned utterance. This score…
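
The search described in the abstract (expand the alignment window, or shrink the text, until an alignment is good enough) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the score used here is a plain frame-normalized CTC Viterbi log-probability, and the names `ctc_align_score`, `iterative_align`, `window`, `step`, and `threshold` are hypothetical choices made only for illustration.

```python
import numpy as np

def ctc_align_score(log_probs, targets, blank=0):
    """Best-path (Viterbi) CTC log-probability of `targets`, divided by the
    number of frames. log_probs has shape (T, V); targets is a list of ids.
    Assumption: this stands in for the paper's confidence score."""
    T = log_probs.shape[0]
    if T == 0 or not targets:
        return -np.inf
    # Extended label sequence: blank, c1, blank, c2, ..., blank
    ext = [blank]
    for c in targets:
        ext += [c, blank]
    S = len(ext)
    alpha = np.full(S, -np.inf)
    alpha[0] = log_probs[0, ext[0]]
    alpha[1] = log_probs[0, ext[1]]
    for t in range(1, T):
        prev = alpha.copy()
        for s in range(S):
            best = prev[s]                      # stay on the same symbol
            if s >= 1:
                best = max(best, prev[s - 1])   # advance by one symbol
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                best = max(best, prev[s - 2])   # skip the blank in between
            alpha[s] = best + log_probs[t, ext[s]]
    # A valid path ends on the last label or on the trailing blank
    return max(alpha[S - 1], alpha[S - 2]) / T

def iterative_align(log_probs, words, vocab, anchor=0,
                    window=2, step=2, threshold=-0.5):
    """Hypothetical search loop: for a given amount of text, grow the window
    starting at `anchor`; if no window reaches `threshold`, drop the last
    word and retry. Returns (end_frame, n_words, score) or None."""
    total = log_probs.shape[0]
    for n_words in range(len(words), 0, -1):        # reduce the text
        targets = [vocab[ch] for ch in "".join(words[:n_words])]
        width = window
        while True:                                 # expand the window
            end = min(anchor + width, total)
            score = ctc_align_score(log_probs[anchor:end], targets)
            if score >= threshold:
                return end, n_words, score
            if end == total:
                break
            width += step
    return None

# Toy posteriors: 6 frames over {blank, 'a', 'b'}, near one-hot per frame.
vocab = {"a": 1, "b": 2}
frame_labels = [1, 1, 0, 2, 0, 0]
probs = np.full((len(frame_labels), 3), 0.05)
probs[np.arange(len(frame_labels)), frame_labels] = 0.9
result = iterative_align(np.log(probs), ["ab", "ba"], vocab)
print(result)
```

In this toy run the full text "abba" never reaches the threshold within the six available frames, so the loop drops the last word and then finds that "ab" aligns well once the window covers the first four frames. In the setting the abstract describes, the returned end frame would serve as the temporal anchor for the next utterance.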

Code & Models

Repositories

ferugit/iterative-pseudo-forced-alignment-ctc
pytorch

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing