TL;DR
MUSCAT is a new multilingual scientific conversation benchmark designed to evaluate ASR systems' ability to handle code-switching, mixed languages, and scientific vocabulary, highlighting current challenges in multilingual speech recognition.
Contribution
The paper introduces a novel benchmark dataset with an evaluation framework for multilingual, code-switched scientific conversations, addressing a gap in existing ASR evaluation resources.
Findings
State-of-the-art ASR systems still struggle with the dataset.
The benchmark enables consistent cross-language performance comparison.
Current models do not fully handle scientific vocabulary and code-switching.
Abstract
The goal of multilingual speech technology is to facilitate seamless communication between individuals speaking different languages, creating the experience that everyone is a multilingual speaker. To create this experience, speech technology needs to address several challenges: handling mixed multilingual input, specific vocabulary, and code-switching. However, there is currently no dataset benchmarking this situation. We propose a new benchmark to evaluate whether current Automatic Speech Recognition (ASR) systems are able to handle these challenges. The benchmark consists of bilingual discussions on scientific papers between multiple speakers, each conversing in a different language. We provide a standard evaluation framework that goes beyond Word Error Rate (WER), enabling consistent comparison of ASR performance across languages. Experimental results demonstrate that the…
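For readers unfamiliar with the baseline metric the abstract refers to, here is a minimal sketch of plain Word Error Rate: the word-level edit distance between a reference transcript and an ASR hypothesis, divided by the number of reference words. It is not the paper's evaluation framework, which the abstract says goes beyond WER. The function name `word_error_rate`, the whitespace tokenization, and the example sentences are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Illustrative only: tokenizes on whitespace and applies no text
    normalization (casing, punctuation), which real evaluations usually do.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical code-switched utterance (German with an English technical term).
reference = "das Modell nutzt self attention"
hypothesis = "das Modell nutzt selbst attention"
print(word_error_rate(reference, hypothesis))
```

In this made-up example a single substituted word out of five gives a WER of 0.2, even though the error falls exactly on the code-switched technical term. A metric that weights all words equally can hide this kind of error, which may be one reason an evaluation would look beyond WER.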
