Adaptive Federated Fine-Tuning of Self-Supervised Speech Representations
Xin Guo, Chunrui Zhao, Hong Jia, Ting Dang, Gongping Huang, Xianrui Zheng, Yan Gao

TL;DR
This paper introduces an adaptive federated fine-tuning framework with early exits and layer-wise aggregation for self-supervised speech models, addressing heterogeneity and efficiency in privacy-preserving speech tasks.
Contribution
It proposes an adaptive fine-tuning method that combines early exits with depth-aware, layer-wise aggregation, improving efficiency and the handling of client heterogeneity when self-supervised (SSL) speech models are fine-tuned under federated learning.
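One way a client might pick its exit depth is shown in the following minimal sketch. Its assumptions do not come from the paper: every backbone layer is assumed to cost the same amount of compute, exit heads are assumed to sit at fixed layer indices, and the names `choose_exit_layer`, `forward_with_exit`, `cost_per_layer` and `max_useful_depth` are hypothetical. The client takes the deepest exit that fits its budget, optionally capped by a task-driven depth limit, and then runs only the layers up to that exit.

```python
from typing import Callable, Dict, Optional, Sequence


def choose_exit_layer(
    exit_layers: Sequence[int],
    cost_per_layer: float,
    budget: float,
    max_useful_depth: Optional[int] = None,
) -> int:
    """Return the deepest exit layer a client can afford (hypothetical policy).

    Assumes every backbone layer costs `cost_per_layer` compute units, so
    reaching exit layer k costs k * cost_per_layer.
    """
    affordable = [
        k
        for k in sorted(exit_layers)
        if k * cost_per_layer <= budget
        and (max_useful_depth is None or k <= max_useful_depth)
    ]
    if not affordable:
        raise ValueError("budget is too small for the shallowest exit")
    return affordable[-1]


def forward_with_exit(
    x: float,
    layers: Sequence[Callable[[float], float]],
    heads: Dict[int, Callable[[float], float]],
    exit_layer: int,
) -> float:
    """Run only the first `exit_layer` layers, then apply that exit's head."""
    h = x
    for layer in layers[:exit_layer]:
        h = layer(h)  # layers beyond exit_layer are never computed
    return heads[exit_layer](h)


# Toy usage: 12 "layers" that each add 1, and exit heads at layers 4, 8 and 12.
layers = [lambda h: h + 1.0] * 12
heads = {k: (lambda h: 10.0 * h) for k in (4, 8, 12)}
exit_k = choose_exit_layer([4, 8, 12], cost_per_layer=1.0, budget=9.0)
print(exit_k, forward_with_exit(0.0, layers, heads, exit_k))  # 8 80.0
```

With exits at layers 4, 8 and 12 and unit cost per layer, a client with a budget of 9 stops at layer 8, and a task capped at depth 8 stops there even on a larger budget. A real deployment would more likely profile per-layer latency or memory on the device than assume uniform cost.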
Findings
Reduces edge computation overhead
Supports heterogeneous client hardware
Maintains competitive speech task performance
Abstract
Integrating Federated Learning (FL) with self-supervised learning (SSL) enables privacy-preserving fine-tuning for speech tasks. However, federated environments exhibit significant heterogeneity: clients differ in computational capacity, causing straggler effects under unified fine-tuning, while diverse downstream tasks require different representation depths, making full-model updates inefficient. To address these challenges, we propose an adaptive federated fine-tuning framework with early exits. Lightweight prediction heads are inserted at intermediate layers of the SSL backbone, allowing clients to terminate computation based on local constraints and task requirements. We further introduce a layer-wise, depth-aware partial aggregation strategy to better utilize representations from different network depths. Experiments show that the framework reduces edge overhead, supports…
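The layer-wise partial aggregation step can be illustrated with a toy server routine. This is a sketch, not the authors' algorithm: the abstract does not specify the depth-aware weighting, so the code falls back to plain per-layer federated averaging (weighted by sample count) over only the clients that actually trained each layer. Parameters are modeled as a hypothetical `dict` from layer index to a flat list of floats, and `aggregate_layerwise` is an illustrative name.

```python
from typing import Dict, List, Sequence, Tuple

# Toy parameter representation (an assumption): layer index -> flat weight list.
Params = Dict[int, List[float]]


def aggregate_layerwise(
    global_params: Params,
    client_updates: Sequence[Tuple[Params, int]],
) -> Params:
    """Average each layer over only the clients that trained it.

    `client_updates` holds (params, num_samples) pairs; a client that exited
    early simply omits the layers it never computed. Weighting is plain
    sample-count FedAvg, not a depth-aware scheme.
    """
    new_params: Params = {}
    for layer, old_weights in global_params.items():
        contributors = [(p[layer], n) for p, n in client_updates if layer in p]
        total = sum(n for _, n in contributors)
        if total == 0:
            # No client reached this depth: keep the previous global weights.
            new_params[layer] = list(old_weights)
            continue
        new_params[layer] = [
            sum(w[j] * n for w, n in contributors) / total
            for j in range(len(old_weights))
        ]
    return new_params


# Client A exited after layer 2 (100 samples); client B trained layers 1-3 (300 samples).
global_params = {1: [0.0, 0.0], 2: [0.0, 0.0], 3: [0.0, 0.0], 4: [0.0, 0.0]}
client_a = ({1: [1.0, 1.0], 2: [2.0, 2.0]}, 100)
client_b = ({1: [3.0, 3.0], 2: [4.0, 4.0], 3: [5.0, 5.0]}, 300)
print(aggregate_layerwise(global_params, [client_a, client_b]))
# {1: [2.5, 2.5], 2: [3.5, 3.5], 3: [5.0, 5.0], 4: [0.0, 0.0]}
```

Because shallow-exit clients never send deep layers, deeper layers are averaged over fewer clients. A depth-aware scheme would presumably adjust the weights to account for this imbalance, which the sketch deliberately leaves out.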
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Speech Recognition and Synthesis · Internet Traffic Analysis and Secure E-voting
