Improving Code Switching with Supervised Fine Tuning and GELU Adapters

Linh Pham

arXiv:2506.00291 · cs.SD · August 4, 2025

TL;DR

This paper improves code-switching automatic speech recognition by fine-tuning Whisper with a GELU-based adapter on the encoder and a new tokenization scheme, the "Switching Tokenizers Method", reducing Mixed Error Rate (MER) to 9.4% on ASCEND, 6% on SEAME devman, and 9.7% on SEAME devsge.

Contribution

It introduces a new tokenization approach and adapter-based fine-tuning for Whisper, improving code-switching ASR performance over existing methods.

Findings

01. Reduced MER to 9.4% on the ASCEND dataset

02. Achieved 6% MER on SEAME devman

03. Outperformed state-of-the-art methods in code-switching ASR
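The findings above are reported as Mixed Error Rate. As a rough illustration of how such a metric is commonly computed for Mandarin-English speech, the sketch below counts each Chinese character and each English word as one token and divides the total edit distance by the number of reference tokens. The tokenization regex, the lowercasing, and the function names (`mixed_tokens`, `edit_distance`, `mixed_error_rate`) are assumptions made for this example; the paper's exact scoring script and text normalization are not described on this page.

```python
import re

# Assumption: one token per CJK character, one token per Latin-script word.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9']+")


def mixed_tokens(text):
    """Split a code-switched transcript into characters (Chinese) and words (English)."""
    return _TOKEN.findall(text.lower())


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (substitution, insertion, deletion)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def mixed_error_rate(references, hypotheses):
    """Total edit errors divided by total reference tokens over a corpus."""
    errors = total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens, hyp_tokens = mixed_tokens(ref), mixed_tokens(hyp)
        errors += edit_distance(ref_tokens, hyp_tokens)
        total += len(ref_tokens)
    return errors / total if total else 0.0
```

For instance, scoring the hypothesis "我想 order 一杯 tea" against the reference "我想 order 一杯 coffee" gives one substitution over six reference tokens.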

Abstract

Few code-switching datasets, labeled or unlabeled, exist today. As a result, code-switching ASR requires new methods that make use of the vast monolingual data and models already available. This paper uses OpenAI's open-source ASR model, Whisper, which has been pre-trained on 680K hours of audio to perform monolingual ASR tasks. In Part 1, this paper examines how exploiting Whisper's monolingual ability to tokenize training text individually, called the "Switching Tokenizers Method", improves transcription accuracy. In Part 2, we combine the Switching Tokenizers Method from Part 1 with a GELU-based adapter trained on the encoder. Together, these two methods reduce the total Mixed Error Rate (MER) to 9.4% on the ASCEND dataset, 6% on SEAME devman, and 9.7% on SEAME devsge, outperforming current SoTA methods.
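To make the "GELU based adapter" concrete, here is a minimal sketch of a generic bottleneck adapter with a residual connection: project the hidden vector down, apply GELU, project back up, and add the result to the input. It is written in plain Python rather than a deep learning framework so it runs anywhere. The class name `GeluAdapter`, the bottleneck width, the initialization (including the zero-initialized up-projection) and the placement inside the encoder are all assumptions for illustration; the abstract does not specify the paper's actual adapter design.

```python
import math
import random


def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


class GeluAdapter:
    """Hypothetical bottleneck adapter: h + W_up * GELU(W_down * h)."""

    def __init__(self, d_model, d_bottleneck, seed=0):
        rng = random.Random(seed)
        # Small random down-projection (assumed initialization).
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(d_model)]
                       for _ in range(d_bottleneck)]
        # Assumption: zero up-projection, so the adapter starts as the
        # identity and the frozen base model's output is initially unchanged.
        self.w_up = [[0.0] * d_bottleneck for _ in range(d_model)]

    def __call__(self, hidden):
        bottleneck = [gelu(a) for a in matvec(self.w_down, hidden)]
        delta = matvec(self.w_up, bottleneck)
        # Residual connection keeps the original representation intact.
        return [h + d for h, d in zip(hidden, delta)]
```

In adapter-style fine-tuning, typically only the adapter weights (`w_down` and `w_up` here) are trained while the pre-trained model's weights stay frozen, which keeps the number of trainable parameters small.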

Taxonomy

Topics: Neural Networks and Applications