A Comparative Study of LLM-based ASR and Whisper in Low Resource and Code Switching Scenario

Zheshu Song; Ziyang Ma; Yifan Yang; Jianheng Zhuo; Xie Chen

arXiv:2412.00721·cs.AI·December 5, 2024

TL;DR

This paper compares LLM-based ASR with Whisper in low resource and Mandarin-English code switching scenarios. LLM-based ASR yields a relative gain of 12.8% over Whisper in low resource ASR, while Whisper performs better on Mandarin-English code switching ASR.

Contribution

The paper explores LLM-based ASR in low resource and Mandarin-English code switching settings, where the potential of LLMs remains underexplored, and compares its recognition performance against the Whisper model.

Findings

01

LLM-based ASR yields a relative gain of 12.8% over Whisper in low resource ASR.

02

Whisper performs better than LLM-based ASR in Mandarin-English code switching ASR.

03

The study highlights the potential of LLMs for low resource speech recognition.

Abstract

Large Language Models (LLMs) have showcased exceptional performance across diverse NLP tasks, and their integration with speech encoders is rapidly emerging as a dominant trend in the Automatic Speech Recognition (ASR) field. Previous works mainly concentrated on leveraging LLMs for speech recognition in English and Chinese. However, their potential for addressing speech recognition challenges in low resource settings remains underexplored. Hence, in this work, we aim to explore the capability of LLMs in low resource ASR and Mandarin-English code switching ASR. We also evaluate and compare the recognition performance of LLM-based ASR systems against the Whisper model. Extensive experiments demonstrate that LLM-based ASR yields a relative gain of 12.8% over the Whisper model in low resource ASR, while Whisper performs better in Mandarin-English code switching ASR. We hope that this study…
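
The "relative gain" quoted in the abstract can be read as a relative reduction in an error rate. The abstract does not state which metric or absolute numbers lie behind the 12.8% figure, so the sketch below only illustrates the arithmetic. The function name `relative_gain` and both input values are assumptions made for this example and are not results from the paper.

```python
def relative_gain(baseline_error: float, system_error: float) -> float:
    """Relative error reduction of a system over a baseline, in percent.

    Assumes both inputs are error rates on the same test set
    (lower is better), expressed in the same unit.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - system_error) / baseline_error


# Hypothetical error rates, chosen only to illustrate the arithmetic.
# They are NOT numbers reported in the paper.
baseline = 25.0   # assumed baseline error rate, in percent
candidate = 21.8  # assumed error rate of the compared system, in percent

print(f"{relative_gain(baseline, candidate):.1f}% relative gain")
# prints "12.8% relative gain"
```

Under this reading, a relative gain differs from an absolute one: in the hypothetical case above the absolute difference is 3.2 points, while the relative reduction is 12.8%.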
