Adaptive Federated Fine-Tuning of Self-Supervised Speech Representations

Xin Guo; Chunrui Zhao; Hong Jia; Ting Dang; Gongping Huang; Xianrui Zheng; Yan Gao

arXiv:2603.21888 · eess.AS · March 26, 2026

TL;DR

This paper introduces an adaptive federated fine-tuning framework for self-supervised speech models. It combines early exits with layer-wise aggregation to handle differences in client compute capacity and task requirements while reducing computation in privacy-preserving speech tasks.

Contribution

The paper proposes an adaptive fine-tuning method that uses early exits and depth-aware partial aggregation to reduce client-side computation and accommodate heterogeneous clients and tasks in federated SSL speech models.

Findings

01. Reduces edge computation overhead

02. Supports heterogeneous client hardware

03. Maintains competitive speech task performance

Abstract

Integrating Federated Learning (FL) with self-supervised learning (SSL) enables privacy-preserving fine-tuning for speech tasks. However, federated environments exhibit significant heterogeneity: clients differ in computational capacity, causing straggler effects under unified fine-tuning, while diverse downstream tasks require different representation depths, making full-model updates inefficient. To address these challenges, we propose an adaptive federated fine-tuning framework with early exits. Lightweight prediction heads are inserted at intermediate layers of the SSL backbone, allowing clients to terminate computation based on local constraints and task requirements. We further introduce a layer-wise, depth-aware partial aggregation strategy to better utilize representations from different network depths. Experiments show that the framework reduces edge overhead, supports…
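
The layer-wise, depth-aware partial aggregation described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that each client reports the depth at which it exited plus the parameters of the layers up to that depth, and that the server averages each layer only over the clients that trained it, weighted by local sample count. The function name `aggregate_by_depth`, the dictionary keys, and the sample-count weighting are hypothetical choices for illustration; the abstract does not specify the aggregation rule.

```python
from typing import Dict, List

Layer = List[float]  # one layer's parameters, flattened (illustrative)


def aggregate_by_depth(global_layers: List[Layer],
                       client_updates: List[Dict]) -> List[Layer]:
    """Average each layer over only the clients whose exit depth covers it.

    Assumption: each update is a dict with hypothetical keys
    'depth' (number of layers trained), 'num_samples', and 'layers'.
    """
    new_layers: List[Layer] = []
    for idx, old in enumerate(global_layers):
        # Clients that exited after layer idx contributed an update for it.
        contributors = [c for c in client_updates if c["depth"] > idx]
        if not contributors:
            # No client reached this layer; keep the global parameters.
            new_layers.append(list(old))
            continue
        total = sum(c["num_samples"] for c in contributors)
        merged = [0.0] * len(old)
        for c in contributors:
            weight = c["num_samples"] / total
            for j, value in enumerate(c["layers"][idx]):
                merged[j] += weight * value
        new_layers.append(merged)
    return new_layers


# Toy example: a shallow client (exits after 1 layer) and a deep client.
global_layers = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
client_updates = [
    {"depth": 1, "num_samples": 10, "layers": [[1.0, 1.0]]},
    {"depth": 3, "num_samples": 30,
     "layers": [[3.0, 3.0], [2.0, 2.0], [4.0, 4.0]]},
]
updated = aggregate_by_depth(global_layers, client_updates)
```

In this toy case, the first layer is averaged over both clients, while the deeper layers take only the deep client's values, so a constrained client never has to compute or transmit layers beyond its exit point.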

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Speech Recognition and Synthesis · Internet Traffic Analysis and Secure E-voting