A Comparative Analysis of Bilingual and Trilingual Wav2Vec Models for Automatic Speech Recognition in Multilingual Oral History Archives

Jan Lehečka; Josef V. Psutka; Luboš Šmídl; Pavel Ircing; Josef Psutka

arXiv:2407.17160·cs.CL·September 26, 2024

6 Models

TL;DR

This study compares monolingual and multilingual (bilingual and trilingual) Wav2Vec 2.0 models for speech recognition on a multilingual oral history archive, finds that monolingual models generally outperform multilingual ones, and releases the pre-trained models publicly.

Contribution

The paper offers a comparative analysis of Wav2Vec 2.0 models on a unique multilingual oral history archive and releases its pre-trained models to the research community.

Findings

01

Monolingual models outperform multilingual models on the oral history dataset.

02

Results are consistent across the public CommonVoice dataset.

03

The pre-trained models are publicly released for further research.

Abstract

In this paper, we compare monolingual Wav2Vec 2.0 models with various multilingual models to see whether we can improve speech recognition performance on a unique oral history archive containing many mixed-language sentences. Our main goal is to advance research on this unique dataset, which is an extremely valuable part of our cultural heritage. Our results suggest that monolingual speech recognition models are, in most cases, superior to multilingual models, even when processing the oral history archive full of mixed-language sentences from non-native speakers. We also performed the same experiments on the public CommonVoice dataset to verify our results. We contribute to the research community by releasing our pre-trained models to the public.
