Efficient Adaptation of Multilingual Models for Japanese ASR

Mark Bajo, Haruka Fukukawa, Ryuji Morita, and Yuma Ogasawara

arXiv:2412.10705·cs.CL·December 17, 2024

TL;DR

This paper shows that fine-tuning the multilingual Whisper-Tiny model on Japanese datasets, with either LoRA or end-to-end training, lowers its Japanese Character Error Rate (CER) from 32.7 to as low as 14.7, below the 20.2 of the larger Whisper-Base.

Contribution

It applies LoRA and end-to-end fine-tuning to adapt a multilingual model, Whisper-Tiny, for Japanese ASR, reaching a lower CER than the larger Whisper-Base.
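To illustrate why LoRA is the cheaper of the two adaptation routes, here is a minimal sketch that counts trainable parameters for a single weight matrix. LoRA freezes the original d_out x d_in weight and trains two low-rank factors of shapes d_out x r and r x d_in. The width of 384 matches Whisper-Tiny's published hidden size; the rank of 8 is a hypothetical value chosen for illustration and is not taken from the paper.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for LoRA factors B (d_out x r) and A (r x d_in)."""
    if rank <= 0:
        raise ValueError("rank must be positive")
    return rank * (d_out + d_in)


if __name__ == "__main__":
    d_model = 384  # Whisper-Tiny hidden width
    rank = 8       # hypothetical LoRA rank, not from the paper
    full = full_params(d_model, d_model)
    lora = lora_params(d_model, d_model, rank)
    print(f"full fine-tuning: {full} parameters per matrix")
    print(f"LoRA (r={rank}): {lora} parameters per matrix")
    print(f"ratio: {lora / full:.2%}")
```

Under these assumptions LoRA trains only a few percent of the parameters of each adapted matrix, which is consistent with the trade-off the paper reports: a smaller CER gain than end-to-end fine-tuning in exchange for a much lighter update.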

Findings

01

CER reduced from 32.7 to 20.8 with LoRA and to 14.7 with end-to-end fine-tuning

02

End-to-end fine-tuned Whisper-Tiny surpasses Whisper-Base's CER of 20.2; the LoRA variant (20.8) does not

03

Method retains model flexibility for language-specific tasks

Abstract

This study explores fine-tuning multilingual ASR (Automatic Speech Recognition) models, specifically OpenAI's Whisper-Tiny, to improve performance in Japanese. While multilingual models like Whisper offer versatility, they often lack precision in specific languages. Conversely, monolingual models like ReazonSpeech excel in language-specific tasks but are less adaptable. Using Japanese-specific datasets and Low-Rank Adaptation (LoRA) along with end-to-end (E2E) training, we fine-tuned Whisper-Tiny to bridge this gap. Our results show that fine-tuning reduced Whisper-Tiny's Character Error Rate (CER) from 32.7 to 20.8 with LoRA and to 14.7 with end-to-end fine-tuning, surpassing Whisper-Base's CER of 20.2. However, challenges with domain-specific terms remain, highlighting the need for specialized datasets. These findings demonstrate that fine-tuning multilingual models can achieve strong…
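For readers unfamiliar with the metric, the following is a minimal sketch of how a Character Error Rate can be computed: the character-level edit distance between reference and hypothesis, divided by the reference length. It assumes the reported figures are percentages and applies no text normalization; the paper's actual evaluation pipeline may differ. The function names and the Japanese example sentence are illustrative, not taken from the paper or its repository.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance over characters (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate as a percentage of the reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Relative improvement, in percent, when an error rate drops."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    reference = "今日は良い天気です"   # illustrative sentence, 9 characters
    hypothesis = "今日は良い元気です"  # one substituted character
    print(f"CER: {cer(reference, hypothesis):.1f}")
    # Figures quoted in the abstract: baseline 32.7, LoRA 20.8, end-to-end 14.7
    print(f"LoRA: {relative_reduction(32.7, 20.8):.1f}% relative reduction")
    print(f"E2E:  {relative_reduction(32.7, 14.7):.1f}% relative reduction")
```

A character-level metric suits Japanese because the written language has no whitespace word boundaries, so a word error rate would depend on the choice of tokenizer.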

Code & Models

Repositories

ryujimorita/tokyo_whisperers (Official)

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques