A Parameter-Efficient Multi-Scale Convolutional Adapter for Synthetic Speech Detection

Yassine El Kheir; Fabian Ritter-Guttierez; Arnab Das; Tim Polzehl; Sebastian Möller

arXiv:2510.24852 · cs.SD · October 30, 2025

TL;DR

This paper presents MultiConvAdapter, a parameter-efficient architecture that enhances synthetic speech detection by capturing multi-scale temporal artifacts with minimal additional parameters, outperforming existing fine-tuning methods.

Contribution

The paper introduces MultiConvAdapter, a novel multi-scale convolutional adapter that efficiently models temporal artifacts in speech, reducing computational costs while improving detection accuracy.

Findings

01. Achieves superior performance on five datasets.

02. Uses only 3.17 million trainable parameters.

03. Outperforms full fine-tuning and existing PEFT methods.
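
The parameter figure above can be put in context with a little arithmetic. The abstract states that the 3.17 M trainable parameters are about 1% of the SSL backbone; the sketch below simply inverts that ratio. The function names are hypothetical, and because "1%" is a rounded figure, the implied backbone size is only a rough estimate, not a number reported by the authors.

```python
# Back-of-the-envelope check of the reported parameter budget.
# ASSUMPTION: "1%" is treated as exact here, although the source rounds it.

ADAPTER_PARAMS = 3.17e6     # trainable parameters reported for MultiConvAdapter
REPORTED_FRACTION = 0.01    # share of the SSL backbone stated in the abstract


def implied_backbone_params(adapter_params: float, fraction: float) -> float:
    """Backbone size implied by an adapter budget and its share of the backbone."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return adapter_params / fraction


def adapter_share(adapter_params: float, backbone_params: float) -> float:
    """Adapter parameter count expressed as a fraction of the backbone size."""
    return adapter_params / backbone_params


backbone = implied_backbone_params(ADAPTER_PARAMS, REPORTED_FRACTION)
print(f"Implied backbone size: ~{backbone / 1e6:.0f}M parameters")
print(f"Adapter share: {adapter_share(ADAPTER_PARAMS, backbone):.2%}")
```

Under these assumptions the backbone lands in the low hundreds of millions of parameters, which is the scale at which full fine-tuning becomes expensive and an adapter of this size is a meaningful saving.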

Abstract

Recent synthetic speech detection models typically adapt a pre-trained SSL model via fine-tuning, which is computationally demanding. Parameter-Efficient Fine-Tuning (PEFT) offers an alternative. However, existing methods lack the specific inductive biases required to model the multi-scale temporal artifacts characteristic of spoofed audio. This paper introduces the Multi-Scale Convolutional Adapter (MultiConvAdapter), a parameter-efficient architecture designed to address this limitation. MultiConvAdapter integrates parallel convolutional modules within the SSL encoder, facilitating the simultaneous learning of discriminative features across multiple temporal resolutions, capturing both short-term artifacts and long-term distortions. With only $3.17$M trainable parameters ($1\%$ of the SSL backbone), MultiConvAdapter substantially reduces the computational burden of adaptation.…
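
To make the idea of "parallel convolutional modules" at "multiple temporal resolutions" concrete, here is a minimal, dependency-free sketch. It is not the authors' implementation: the class name `MultiScaleConvAdapter`, the kernel sizes (3, 7, 15), the use of depthwise convolutions, the averaging of branches, and the residual connection are all assumptions chosen for illustration, since this excerpt does not specify them. A real adapter would use a tensor library and sit inside the SSL encoder layers.

```python
import random


def depthwise_conv1d(x, kernels):
    """Zero-padded ('same') depthwise 1D cross-correlation over time.

    x:       list of T frames, each a list of D floats
    kernels: list of D kernels (one per channel), each of odd length k
    """
    T, D = len(x), len(x[0])
    k = len(kernels[0])
    half = k // 2
    out = [[0.0] * D for _ in range(T)]
    for t in range(T):
        for d in range(D):
            acc = 0.0
            for j in range(k):
                src = t + j - half
                if 0 <= src < T:  # positions outside the sequence count as zero
                    acc += kernels[d][j] * x[src][d]
            out[t][d] = acc
    return out


class MultiScaleConvAdapter:
    """Hypothetical adapter: parallel depthwise branches with different kernel sizes.

    Short kernels see a few frames (short-term artifacts); long kernels see a
    wider window (longer-term distortions). Branch outputs are averaged and
    added back to the input through a residual connection (an assumption).
    """

    def __init__(self, dim, kernel_sizes=(3, 7, 15), seed=0):
        rng = random.Random(seed)
        # One branch per kernel size; each branch holds one kernel per channel.
        self.branches = [
            [[rng.gauss(0.0, 0.02) for _ in range(k)] for _ in range(dim)]
            for k in kernel_sizes
        ]

    def num_parameters(self):
        return sum(len(kernel) for branch in self.branches for kernel in branch)

    def __call__(self, x):
        branch_outs = [depthwise_conv1d(x, b) for b in self.branches]
        n = len(branch_outs)
        return [
            [x[t][d] + sum(o[t][d] for o in branch_outs) / n
             for d in range(len(x[0]))]
            for t in range(len(x))
        ]


# Usage: 12 frames of 4-dimensional features with an impulse at frame 5.
adapter = MultiScaleConvAdapter(dim=4)
frames = [[float(t == 5)] * 4 for t in range(12)]
out = adapter(frames)
print(len(out), len(out[0]), adapter.num_parameters())
```

Depthwise branches keep the parameter count small, since each branch costs only `dim * kernel_size` weights; that is one plausible way to stay within a budget of a few million parameters, though the paper's actual design may differ.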
