ASR-Generated Text for Language Model Pre-training Applied to Speech Tasks

Valentin Pelloin; Franck Dary; Nicolas Herve; Benoit Favre; Nathalie Camelin; Antoine Laurent; Laurent Besacier

arXiv:2207.01893 · cs.CL · July 6, 2022

TL;DR

This paper demonstrates that large-scale ASR-generated text from diverse speech data can be effectively used to pre-train spoken language models, improving performance on various speech-related tasks.

Contribution

It introduces FlauBERT-Oral, a family of spoken language models trained on 19GB of ASR-transcribed speech, and shows that they can improve on the original FlauBERT despite the noise in the training data.

Findings

01 FlauBERT-Oral can outperform the initial FlauBERT on downstream speech tasks

02 ASR-generated text is viable for spoken language modeling

03 Large-scale noisy data can enhance speech task performance

Abstract

We aim to improve spoken language modeling (LM) using a very large amount of automatically transcribed speech. We leverage the INA (French National Audiovisual Institute) collection and obtain 19GB of text after applying ASR to 350,000 hours of diverse TV shows. From this text, spoken language models are trained either by fine-tuning an existing LM (FlauBERT) or by training an LM from scratch. The new models (FlauBERT-Oral) are shared with the community and evaluated on three downstream tasks: spoken language understanding, classification of TV shows, and speech syntactic parsing. Results show that FlauBERT-Oral can be beneficial compared to its initial FlauBERT version, demonstrating that, despite its inherently noisy nature, ASR-generated text can be used to build spoken language models.

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems