Low-rank Adaptation Method for Wav2vec2-based Fake Audio Detection
Chenglong Wang, Jiangyan Yi, Xiaohui Zhang, Jianhua Tao, Le Xu, and Ruibo Fu

TL;DR
This paper introduces a low-rank adaptation method for wav2vec2-based fake audio detection, significantly reducing training costs while maintaining performance by freezing pre-trained weights and injecting trainable matrices.
Contribution
The paper proposes applying low-rank adaptation (LoRA) to wav2vec2, enabling efficient fine-tuning with fewer parameters without sacrificing detection accuracy.
Findings
LoRA reduces the number of trainable parameters by a factor of 198 compared to full fine-tuning.
LoRA achieves similar performance to full fine-tuning on fake audio detection tasks.
The method significantly decreases training time and memory usage.
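As a rough check on the first finding, the arithmetic below combines the 317M parameter count quoted in the abstract with the reported reduction factor of 198. The resulting figure is derived here, not stated by the authors, so treat it as an approximation.

```python
# Back-of-the-envelope estimate; both inputs are taken from the abstract.
full_params = 317_000_000  # trainable parameters under full fine-tuning
reduction = 198            # reported reduction factor with LoRA

# Implied number of trainable parameters when only the LoRA matrices are updated.
lora_params = full_params / reduction
fraction = lora_params / full_params

print(f"approx. trainable parameters with LoRA: {lora_params / 1e6:.2f}M")  # 1.60M
print(f"share of the full model: {fraction:.2%}")                           # 0.51%
```

Under these numbers, roughly 1.6M parameters, or about half a percent of the model, would receive gradient updates.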
Abstract
Self-supervised speech models are a rapidly developing research topic in fake audio detection. Many pre-trained models can serve as feature extractors, learning richer and higher-level speech features. However, fine-tuning pre-trained models often entails excessively long training times and high memory consumption, and full fine-tuning is very expensive. To alleviate this problem, we apply low-rank adaptation (LoRA) to the wav2vec2 model, freezing the pre-trained model weights and injecting trainable rank-decomposition matrices into each layer of the transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared with fine-tuning with Adam on the wav2vec2 model containing 317M trainable parameters, LoRA achieves similar performance while reducing the number of trainable parameters by a factor of 198.
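The mechanism described in the abstract, a frozen weight plus a trainable low-rank update, can be illustrated with a minimal, dependency-free sketch. It follows the general LoRA formulation h = Wx + (alpha / r) · BAx; the class name `LoRALinear`, the toy dimensions, the rank, and the scaling value are all hypothetical choices for illustration, and this is not the authors' wav2vec2 implementation.

```python
import random


def matvec(M, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


class LoRALinear:
    """Hypothetical toy layer: frozen W (d_out x d_in) plus low-rank update B @ A."""

    def __init__(self, W, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen pre-trained weight; never updated during adaptation
        # A gets small random values and B starts at zero, so the adapted
        # layer initially reproduces the pre-trained layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)                    # frozen path: W x
        update = matvec(self.B, matvec(self.A, x))  # low-rank path: B (A x)
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        # Only A (rank x d_in) and B (d_out x rank) would be trained.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def frozen_params(self):
        return len(self.W) * len(self.W[0])


d = 8  # toy dimension, far smaller than a real transformer layer
W = [[float(i == j) for j in range(d)] for i in range(d)]  # toy identity weight
layer = LoRALinear(W, rank=2)
x = [float(i) for i in range(d)]
y = layer.forward(x)

print(layer.frozen_params(), layer.trainable_params())
```

For a d_out × d_in weight, the update adds r · (d_in + d_out) trainable values instead of d_in · d_out, which is where the parameter savings come from when r is much smaller than the layer width.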
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
Methods: Adam
