NeuFA: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism

Jingbei Li; Yi Meng; Zhiyong Wu; Helen Meng; Qiao Tian; Yuping Wang; Yuxuan Wang

arXiv:2203.16838 · cs.SD · April 1, 2022 · 1 citation

TL;DR

NeuFA is a neural network based end-to-end forced aligner built around a novel bidirectional attention mechanism. The attention captures long-term contextual information and yields lower alignment error than state-of-the-art HMM-based models.

Contribution

The paper presents NeuFA, a unified neural framework for end-to-end forced alignment that integrates the alignment learning of both ASR and TTS tasks by learning bidirectional alignment information from a shared attention matrix.

Findings

01

Word-level mean absolute error drops from 25.8 ms to 23.7 ms compared with the HMM-based baseline.

02

Phoneme-level mean absolute error drops from 17.0 ms to 15.7 ms compared with the HMM-based baseline.

03

Demonstrated superior performance over HMM-based models.
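
The two error reductions above are easier to compare as relative improvements. The following is a minimal sketch of that arithmetic only; `relative_reduction` is a hypothetical helper name introduced here, and the percentages are derived from the figures listed above rather than reported by the paper.

```python
def relative_reduction(baseline_ms: float, new_ms: float) -> float:
    """Return the percentage by which new_ms improves on baseline_ms."""
    return (baseline_ms - new_ms) / baseline_ms * 100.0


# Mean absolute errors (ms) as listed in the findings above.
word_level = relative_reduction(25.8, 23.7)
phoneme_level = relative_reduction(17.0, 15.7)

print(f"word level:    {word_level:.1f}% lower MAE")     # 8.1%
print(f"phoneme level: {phoneme_level:.1f}% lower MAE")  # 7.6%
```

Both levels improve by roughly 8%, so the gain is similar in relative terms even though the absolute differences (2.1 ms and 1.3 ms) differ.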

Abstract

Although deep learning and end-to-end models have been widely used and shown superiority in automatic speech recognition (ASR) and text-to-speech (TTS) synthesis, state-of-the-art forced alignment (FA) models are still based on hidden Markov model (HMM). HMM has limited view of contextual information and is developed with long pipelines, leading to error accumulation and unsatisfactory performance. Inspired by the capability of attention mechanism in capturing long term contextual information and learning alignments in ASR and TTS, we propose a neural network based end-to-end forced aligner called NeuFA, in which a novel bidirectional attention mechanism plays an essential role. NeuFA integrates the alignment learning of both ASR and TTS tasks in a unified framework by learning bidirectional alignment information from a shared attention matrix in the proposed bidirectional attention…
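
The idea of reading two alignment directions from one shared attention matrix can be illustrated with a short NumPy sketch. This is not the authors' implementation: the scaled dot-product score, the function name `bidirectional_attention`, and all tensor shapes are assumptions made here for illustration. The point it shows is only that a single score matrix, normalized along each of its two axes, gives one set of weights per direction.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def bidirectional_attention(text_keys, text_values, speech_keys, speech_values):
    """Sketch: one shared score matrix, normalized in both directions.

    Assumed shapes (hypothetical): text_keys (n_text, d), speech_keys
    (n_frames, d), text_values (n_text, d_t), speech_values (n_frames, d_s).
    """
    # Shared score matrix, shape (n_text, n_frames). Scaled dot product is
    # an assumption; the paper may use a different scoring function.
    scores = text_keys @ speech_keys.T / np.sqrt(text_keys.shape[1])

    # TTS-like direction: every speech frame distributes weight over text tokens.
    frame_over_text = softmax(scores, axis=0)   # each column sums to 1
    # ASR-like direction: every text token distributes weight over speech frames.
    text_over_frame = softmax(scores, axis=1)   # each row sums to 1

    text_to_speech = frame_over_text.T @ text_values   # (n_frames, d_t)
    speech_to_text = text_over_frame @ speech_values   # (n_text, d_s)
    return frame_over_text, text_over_frame, text_to_speech, speech_to_text


rng = np.random.default_rng(0)
n_text, n_frames, d = 5, 20, 8
w_ft, w_tf, t2s, s2t = bidirectional_attention(
    rng.normal(size=(n_text, d)),      # text keys
    rng.normal(size=(n_text, 4)),      # text values
    rng.normal(size=(n_frames, d)),    # speech keys
    rng.normal(size=(n_frames, 6)),    # speech values
)
print(t2s.shape, s2t.shape)  # (20, 4) (5, 6)
```

Because both weight matrices come from the same scores, a training signal in either direction updates the same underlying alignment, which is the property the abstract attributes to the shared attention matrix.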

Code & Models

Repositories

thuhcsi/neufa (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Natural Language Processing Techniques