SSVD-O: Parameter-Efficient Fine-Tuning with Structured SVD for Speech Recognition

Pu Wang; Shinji Watanabe; Hugo Van hamme

arXiv:2601.12600 · cs.SD · January 21, 2026

TL;DR

This paper introduces SSVD-O, a structured SVD-based parameter-efficient fine-tuning method for speech recognition that improves adaptation efficiency, balances learning and forgetting, and outperforms existing PEFT methods on domain-shifted ASR tasks.

Contribution

The paper presents SSVD-O, a novel structured SVD-guided fine-tuning approach that enhances speech model adaptation by balancing parameter allocation and reducing forgetting.

Findings

1. SSVD-O narrows the performance gap to full fine-tuning.
2. SSVD-O improves generalization on domain-shifted speech tasks.
3. SSVD-O mitigates catastrophic forgetting in speech model adaptation.

Abstract

Parameter-efficient fine-tuning (PEFT) is a scalable approach for adapting large speech foundation models to new domains. While methods such as LoRA and its state-of-the-art variants reduce adaptation costs, they typically allocate parameters uniformly across model subspaces, which limits their efficiency and scalability in speech applications. Building on our prior work, this paper introduces SSVD-Outer (SSVD-O), an extension of the structured SVD-guided (SSVD) fine-tuning method. SSVD-O combines inner transformations, associated with the input acoustic feature space, with outer transformations, associated with the output semantic feature space, to enable scalable and balanced adaptation. We conduct the first systematic analysis of parameter budget allocation across model subspaces in PEFT for automatic speech recognition (ASR), and investigate the trade-off between learning and forgetting under constrained…
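
The abstract names the ingredients of SSVD-O but not their exact parameterization. The following is a minimal NumPy sketch under stated assumptions, not the authors' method. It assumes that "inner" transformations rotate the leading right singular vectors of a frozen weight W (the input side, since y = W x), that "outer" transformations rotate the leading left singular vectors (the output side), and that singular values receive an additive shift. The Cayley-transform parameterization and the helper names `cayley`, `rotate_leading`, `adapt_weight`, `trainable_params` and `lora_params` are hypothetical. The sketch also counts trainable parameters against LoRA at equal rank, which makes the budget-allocation question concrete.

```python
import numpy as np


def cayley(a: np.ndarray) -> np.ndarray:
    """Map an unconstrained square matrix to an orthogonal one (Cayley transform).

    Only the skew-symmetric part S = (A - A^T) / 2 is used, so Q = (I - S)^{-1}(I + S)
    is orthogonal, and A = 0 gives Q = I (no change at initialization).
    """
    s = (a - a.T) / 2.0
    eye = np.eye(a.shape[0])
    return np.linalg.solve(eye - s, eye + s)


def rotate_leading(basis: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Rotate the first r columns of an orthonormal basis, r = a.shape[0]."""
    r = a.shape[0]
    out = basis.copy()
    out[:, :r] = basis[:, :r] @ cayley(a)
    return out


def adapt_weight(weight, a_in, a_out, delta_s):
    """Hypothetical SVD-guided update W' = U' diag(s + delta_s) V'^T of a frozen W (d_out x d_in)."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    v_new = rotate_leading(vt.T, a_in)  # assumed "inner": right singular vectors (input side)
    u_new = rotate_leading(u, a_out)    # assumed "outer": left singular vectors (output side)
    return u_new @ np.diag(s + delta_s) @ v_new.T


def trainable_params(r_in: int, r_out: int, k: int) -> int:
    """Free parameters of this sketch: two skew-symmetric blocks plus k singular-value shifts."""
    return r_in * (r_in - 1) // 2 + r_out * (r_out - 1) // 2 + k


def lora_params(r: int, d_in: int, d_out: int) -> int:
    """LoRA adds B (d_out x r) and A (r x d_in) for a single weight matrix."""
    return r * (d_in + d_out)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((8, 6))  # toy weight: d_out = 8, d_in = 6
    r, k = 3, 6                      # k = min(d_out, d_in)
    same = adapt_weight(w, np.zeros((r, r)), np.zeros((r, r)), np.zeros(k))
    print(np.allclose(same, w))  # True: zero parameters leave W unchanged
    rotated = adapt_weight(w, rng.standard_normal((r, r)),
                           rng.standard_normal((r, r)), np.zeros(k))
    # Rotations alone preserve the singular values of W.
    print(np.allclose(np.linalg.svd(rotated, compute_uv=False),
                      np.linalg.svd(w, compute_uv=False)))  # True
    print(trainable_params(8, 8, 1024), lora_params(8, 1024, 1024))  # 1080 16384
```

In this toy parameterization, the budget is split explicitly between input-side rotations, output-side rotations and singular-value shifts, rather than spread uniformly as in a LoRA factor pair. The counts printed above describe only the sketch. How SSVD-O actually allocates its budget across subspaces is the subject of the paper's analysis and is not reproduced here.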

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research