TOFA: Training-Free One-Shot Federated Adaptation for Vision-Language Models
Li Zhang, Zhongxuan Han, XiaoHua Feng, Jiaming Zhang, Yuyuan Li, Linbo Jiang, Jianan Lin, Chaochao Chen

TL;DR
TOFA introduces a training-free, one-shot federated adaptation method for vision-language models that leverages multimodal features and adaptive mechanisms to efficiently personalize models across diverse datasets without additional training.
Contribution
The paper proposes TOFA, a novel one-shot federated adaptation framework for VLMs that is training-free and effectively handles data heterogeneity using multimodal feature extraction and adaptive weighting.
Findings
Effective across 9 datasets in federated settings
Reduces communication costs with one-shot adaptation
Balances personalization and robustness successfully
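To make the "one-shot" and "training-free" ideas concrete, here is a minimal sketch of a generic single-round scheme. It is not TOFA's actual algorithm, which this summary does not spell out. It assumes each client already holds feature vectors from a frozen image encoder (random vectors stand in for them here), and the function names `client_summary`, `server_aggregate`, and `classify` are hypothetical. Each client uploads per-class feature sums and counts exactly once, and the server pools them into class prototypes without any gradient step.

```python
import numpy as np

def client_summary(features, labels, num_classes):
    """Runs on one client: per-class feature sums and counts (no training)."""
    sums = np.zeros((num_classes, features.shape[1]))
    counts = np.zeros(num_classes)
    np.add.at(sums, labels, features)   # accumulate features per class
    np.add.at(counts, labels, 1)        # count samples per class
    return sums, counts

def server_aggregate(summaries):
    """Runs once on the server: pool client summaries into class prototypes."""
    total_sums = sum(s for s, _ in summaries)
    total_counts = sum(c for _, c in summaries)
    # Guard against classes that no client observed.
    return total_sums / np.maximum(total_counts, 1)[:, None]

def classify(features, prototypes):
    """Assign each feature vector to the nearest prototype by cosine similarity."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    norms = np.linalg.norm(prototypes, axis=1, keepdims=True)
    p = prototypes / np.maximum(norms, 1e-12)
    return (f @ p.T).argmax(axis=1)

# Toy data (assumption): random vectors stand in for frozen-encoder features.
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 8)) * 5

def make_client(classes, n=20):
    labels = rng.choice(classes, size=n)
    feats = centers[labels] + rng.normal(size=(n, 8)) * 0.1
    return feats, labels

# Each client sees only a subset of classes (label heterogeneity).
clients = [make_client([0, 1]), make_client([1, 2]), make_client([0, 2])]
summaries = [client_summary(f, y, 3) for f, y in clients]  # the single upload
prototypes = server_aggregate(summaries)
predictions = classify(clients[0][0], prototypes)
```

Because sums and counts are additive, the pooled prototypes match the class means that would be computed on the combined data, so in this simplified setting a single round loses nothing relative to centralizing the features.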
Abstract
Efficient and lightweight adaptation of pre-trained Vision-Language Models (VLMs) to downstream tasks through collaborative interactions between local clients and a central server is a rapidly emerging research topic in federated learning. Existing adaptation algorithms are typically trained iteratively, which incurs significant communication costs and increases susceptibility to potential attacks. Motivated by one-shot federated training techniques that reduce client-server exchanges to a single round, developing a lightweight one-shot federated VLM adaptation method to alleviate these issues is particularly attractive. However, current one-shot approaches face certain challenges in adapting VLMs within federated settings: (1) insufficient exploitation of the rich multimodal information inherent in VLMs; (2) lack of specialized adaptation strategies to systematically handle the…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Face recognition and analysis
