TOFA: Training-Free One-Shot Federated Adaptation for Vision-Language Models

Li Zhang; Zhongxuan Han; XiaoHua Feng; Jiaming Zhang; Yuyuan Li; Linbo Jiang; Jianan Lin; Chaochao Chen

arXiv:2511.16423·cs.AI·November 21, 2025

TL;DR

TOFA introduces a training-free, one-shot federated adaptation method for vision-language models that leverages multimodal features and adaptive mechanisms to efficiently personalize models across diverse datasets without additional training.

Contribution

The paper proposes TOFA, a novel one-shot federated adaptation framework for VLMs that is training-free and effectively handles data heterogeneity using multimodal feature extraction and adaptive weighting.

Findings

01 Effective across 9 datasets in federated settings

02 Reduces communication costs with one-shot adaptation

03 Successfully balances personalization and robustness

Abstract

Efficient and lightweight adaptation of pre-trained Vision-Language Models (VLMs) to downstream tasks through collaborative interactions between local clients and a central server is a rapidly emerging research topic in federated learning. Existing adaptation algorithms are typically trained iteratively, which incurs significant communication costs and increases susceptibility to potential attacks. Motivated by one-shot federated training techniques, which reduce client-server exchanges to a single round, developing a lightweight one-shot federated VLM adaptation method to alleviate these issues is particularly attractive. However, current one-shot approaches face several challenges in adapting VLMs within federated settings: (1) insufficient exploitation of the rich multimodal information inherent in VLMs; (2) lack of specialized adaptation strategies to systematically handle the…
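The abstract describes the single-round, training-free setting only at a high level, so the following is a minimal sketch of a generic scheme of that kind, not TOFA's actual algorithm. It assumes each client already holds features from a frozen VLM image encoder and that the server holds text embeddings of class prompts. Each client uploads per-class feature sums exactly once, the server merges them into class prototypes, and prediction blends text similarity with prototype similarity. The function names (`client_summary`, `server_aggregate`, `predict`) are hypothetical, and the fixed mixing weight `alpha` stands in for TOFA's adaptive weighting, whose details are not given here.

```python
import math

Vector = list[float]


def normalize(v: Vector) -> Vector:
    """L2-normalize a vector so that dot products equal cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def client_summary(features: list[Vector], labels: list[str]) -> dict[str, Vector]:
    """Runs locally with no training: per-class sums of normalized features.
    The returned dict is the client's single upload to the server."""
    sums: dict[str, Vector] = {}
    for f, y in zip(features, labels):
        f = normalize(f)
        prev = sums.get(y, [0.0] * len(f))
        sums[y] = [s + x for s, x in zip(prev, f)]
    return sums


def server_aggregate(uploads: list[dict[str, Vector]]) -> dict[str, Vector]:
    """Merges all uploads in one pass; the normalized sum points in the same
    direction as the class mean, so it serves as a global prototype."""
    total: dict[str, Vector] = {}
    for upload in uploads:
        for y, vec in upload.items():
            prev = total.get(y, [0.0] * len(vec))
            total[y] = [a + b for a, b in zip(prev, vec)]
    return {y: normalize(v) for y, v in total.items()}


def predict(x: Vector, text_emb: dict[str, Vector],
            prototypes: dict[str, Vector], alpha: float = 0.5) -> str:
    """Blends zero-shot text similarity with visual-prototype similarity.
    alpha is a fixed, hypothetical weight; classes without a prototype
    receive only the text term."""
    x = normalize(x)
    scores = {}
    for c, t in text_emb.items():
        s = alpha * dot(x, normalize(t))
        if c in prototypes:
            s += (1 - alpha) * dot(x, prototypes[c])
        scores[c] = s
    return max(scores, key=scores.get)


# Toy usage with made-up 3-d "features" (illustrative only).
text_emb = {"cat": [1.0, 0.2, 0.0], "dog": [0.2, 1.0, 0.0]}
c1 = client_summary([[0.9, 0.1, 0.3], [1.0, 0.0, 0.2]], ["cat", "cat"])
c2 = client_summary([[0.1, 0.9, 0.4]], ["dog"])
protos = server_aggregate([c1, c2])
print(predict([0.8, 0.2, 0.3], text_emb, protos))  # cat
```

In this sketch, communication is a single upload per client, and the only computation on either side is summing and normalizing vectors, which is the property the abstract highlights. A real system would replace the constant `alpha` with a data-dependent weight and would work with high-dimensional encoder outputs.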

Videos

TOFA: Training-Free One-Shot Federated Adaptation for Vision-Language Models (Underline)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Face recognition and analysis