VAIS ASR: Building a conversational speech recognition system using language model combination

Quang Minh Nguyen; Thai Binh Nguyen; Ngoc Phuong Pham; The Loc Nguyen

arXiv:1910.05603 · cs.CL · October 15, 2019 · 1 citation

Open Access

TL;DR

This paper presents VAIS ASR, a conversational speech recognition system that combines language models to improve recognition of noisy, conversational speech, achieving 4.85% WER on the VLSP 2018 data set and 15.09% WER on the VLSP 2019 data set.

Contribution

It introduces a language model combination approach that improves conversational ASR by using a small amount of conversational text together with large written-text corpora.

Findings

1. Achieved 4.85% WER on the VLSP 2018 dataset
2. Achieved 15.09% WER on the VLSP 2019 dataset
3. Demonstrated the effectiveness of language model combination in noisy, conversational settings
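
For readers unfamiliar with the metric, word error rate (WER) is conventionally defined as (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, and N is the number of reference words. The following is a minimal sketch of that standard definition. The function name `word_error_rate` and the toy sentences are illustrative assumptions; this page does not say which scoring tool the VLSP evaluation used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (hypothetical name), not the challenge's scorer.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing in hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Toy example (made up for illustration): one substitution and one deletion
# against a six-word reference.
toy_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because insertions count as errors, WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.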

Abstract

Automatic Speech Recognition (ASR) systems have been evolving quickly and have reached human parity in certain cases. These systems usually perform well on clean, read speech; however, most available systems struggle when the speaking style is conversational and the environment is noisy. Tackling such problems is not straightforward because of the difficulty of collecting both speech and text data. In this paper, we attempt to mitigate the problems using language model combination techniques that allow us to utilize both a large amount of written-style text and a small amount of conversational text data. Evaluation on the VLSP 2019 ASR challenges showed that our system achieved 4.85% WER on the VLSP 2018 data set and 15.09% WER on the VLSP 2019 data set.
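
The abstract does not say which combination technique is used. One common way to combine a large written-text language model with a small conversational one is linear interpolation, with the mixing weight tuned on held-out conversational text. The sketch below illustrates that idea with toy unigram models. Everything in it is an assumption made for illustration, including the function names, the toy sentences, and the choice of linear interpolation itself; it is not the authors' implementation.

```python
import math
from collections import Counter


def unigram_lm(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {word: (counts[word] + alpha) / total for word in vocab}


def interpolate(p_written, p_conv, lam):
    """Linear interpolation: lam * P_conv(w) + (1 - lam) * P_written(w)."""
    return {w: lam * p_conv[w] + (1.0 - lam) * p_written[w] for w in p_conv}


def perplexity(lm, tokens):
    """Perplexity of a unigram model on a token sequence."""
    log_prob = sum(math.log(lm[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))


# Toy corpora (made up): a "written" corpus, a "conversational" corpus,
# and held-out conversational text used to tune the weight.
written = "the system recognizes speech the system reports results".split()
conversational = "uh yeah the system is okay yeah".split()
held_out = "yeah the system is okay".split()
vocab = set(written) | set(conversational) | set(held_out)

p_written = unigram_lm(written, vocab)
p_conv = unigram_lm(conversational, vocab)

# Grid search over the interpolation weight, minimizing held-out perplexity.
candidates = [step / 10 for step in range(11)]
best_lam = min(
    candidates,
    key=lambda lam: perplexity(interpolate(p_written, p_conv, lam), held_out),
)
combined = interpolate(p_written, p_conv, best_lam)
```

Because the grid includes weights 0 and 1, the tuned mixture is never worse on the held-out text than either component model alone. In practice the same idea is applied to n-gram or neural language models rather than unigrams.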

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing