Dialect Identification Using Resource-Efficient Fine-Tuning Approaches

Zirui Lin; Haris Gulzar; Monnika Roslianna Busto; Akiko Masaki; Takeharu Eda; Kazuhiro Nakadai

arXiv:2512.02074 · cs.CL · December 3, 2025

Open Access

TL;DR

This paper investigates resource-efficient fine-tuning for dialect identification from speech, showing that MEFT applied to the Whisper model substantially reduces GPU memory usage and speeds up training while maintaining accuracy.

Contribution

The paper applies Memory-Efficient Fine-Tuning (MEFT) methods, originally proposed for language processing, to speech models for dialect identification, achieving substantial resource savings over traditional fine-tuning.

Findings

01 GPU memory usage reduced by up to 73.25%
02 Training speed increased by a factor of 2.1
03 Accuracy comparable to traditional fine-tuning methods
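
To make the two headline figures concrete, the following minimal sketch converts them into the share of baseline resources that remains. It assumes the 2.1 factor refers to training throughput, so the same amount of training takes 1/2.1 of the baseline time. The function names `remaining_fraction` and `time_fraction` are hypothetical helpers introduced only for this illustration and do not come from the paper. The memory figure is an "up to" value, so both numbers are best cases that may not apply to the same configuration.

```python
def remaining_fraction(reduction_pct: float) -> float:
    """Fraction of the baseline still in use after a percentage reduction."""
    if not 0.0 <= reduction_pct <= 100.0:
        raise ValueError("reduction_pct must be between 0 and 100")
    return 1.0 - reduction_pct / 100.0


def time_fraction(speedup: float) -> float:
    """Fraction of the baseline time needed for the same work after a speedup.

    Assumes the speedup is a throughput ratio (new speed / old speed).
    """
    if speedup <= 0.0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


if __name__ == "__main__":
    # Figures taken from the findings list: 73.25% memory reduction, 2.1x speed.
    print(f"GPU memory remaining: {remaining_fraction(73.25):.2%} of baseline")
    print(f"Training time needed: {time_fraction(2.1):.1%} of baseline")
```

Read this way, the best-case configuration uses roughly a quarter of the baseline GPU memory, and a run at the reported speed finishes in a little under half the baseline time.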

Abstract

Dialect Identification (DI) is the task of recognizing different dialects of the same language from a speech signal. DI can help improve downstream speech-related tasks even when speakers have a strong dialect. However, fine-tuning a speech model for tasks like DI is expensive in terms of computation cost and memory requirements. Recent studies have explored fine-tuning pre-trained speech models for tasks like DI using Parameter-Efficient Fine-Tuning (PEFT) methods, which offer parameter efficiency but only limited improvement in memory efficiency and training speed. To address these challenges, we explore Memory-Efficient Fine-Tuning (MEFT) methods, originally proposed for language processing, and apply them to a general-purpose pre-trained speech model. We then comprehensively analyze the GPU memory usage and fine-tuning speed of various MEFT methods. As a case study, we…

Taxonomy

Topics: Speech Recognition and Synthesis · Authorship Attribution and Profiling · Phonetics and Phonology Research