Target word activity detector: An approach to obtain ASR word boundaries without lexicon

Sunit Sivasankaran; Eric Sun; Jinyu Li; Yan Huang; Jing Pan

arXiv:2409.13913·cs.CL·September 24, 2024

TL;DR

This paper introduces a method for estimating word boundaries in end-to-end multilingual ASR models without using lexicons. It combines word embeddings built from sub-word token units with a pretrained ASR model, so it can scale to more languages without added cost.

Contribution

The proposed approach estimates word boundaries without lexicons, requires only word alignment information during training, and scales to multiple languages. This addresses the limitations of existing methods, which either rely on lexicons or introduce additional tokens.

Findings

01 Effective in multilingual settings with five languages

02 Outperforms strong baseline methods

03 Scalable without additional computational costs

Abstract

Obtaining word timestamp information from end-to-end (E2E) ASR models remains challenging due to the lack of explicit time alignment during training. This issue is further complicated in multilingual models. Existing methods either rely on lexicons or introduce additional tokens, leading to scalability issues and increased computational costs. In this work, we propose a new approach to estimate word boundaries without relying on lexicons. Our method leverages word embeddings from sub-word token units and a pretrained ASR model, requiring only word alignment information during training. The proposed method can scale up to any number of languages without incurring any additional cost. We validate our approach using a multilingual ASR model trained on five languages and demonstrate its effectiveness against a strong baseline.
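The abstract does not say how word embeddings are formed from sub-word token units, so the following is only an illustrative sketch of one plausible reading, not the authors' method. It assumes SentencePiece-style tokens in which a leading "▁" marks the start of a word, and it uses mean pooling as the combination rule. The function names `group_subwords` and `pool_word_embeddings` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumption: SentencePiece-style marker for the first sub-word of a word.
WORD_START = "\u2581"


def group_subwords(tokens: Sequence[str]) -> List[List[int]]:
    """Return, for each word, the indices of the sub-word tokens forming it."""
    groups: List[List[int]] = []
    for i, tok in enumerate(tokens):
        # A marked token (or the very first token) opens a new word.
        if tok.startswith(WORD_START) or not groups:
            groups.append([i])
        else:
            groups[-1].append(i)
    return groups


def pool_word_embeddings(
    tokens: Sequence[str],
    token_embeddings: Sequence[Sequence[float]],
) -> List[Tuple[str, List[float]]]:
    """Mean-pool sub-word embeddings into one vector per word.

    Mean pooling is an assumption made for illustration; the paper may
    combine sub-word embeddings differently.
    """
    words: List[Tuple[str, List[float]]] = []
    for idx in group_subwords(tokens):
        text = "".join(tokens[i] for i in idx).replace(WORD_START, "")
        dim = len(token_embeddings[idx[0]])
        mean = [
            sum(token_embeddings[i][d] for i in idx) / len(idx)
            for d in range(dim)
        ]
        words.append((text, mean))
    return words
```

Because the grouping depends only on the tokenizer's word-start marker and not on a pronunciation lexicon, the same routine would apply unchanged to any language covered by the tokenizer, which is the scalability property the abstract emphasizes.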

Taxonomy

Topics: Natural Language Processing Techniques