Replacing Parameters with Preferences: Federated Alignment of Heterogeneous Vision-Language Models
Shule Lu, Yujing Wang, Hainan Zhang, Xiaoshan Yang, Hongwei Zheng, Yongxin Tong, Changsheng Xu, Zhiming Zheng

TL;DR
This paper introduces MoR, a federated learning framework that aligns heterogeneous vision-language models using preference-based signals instead of parameter sharing, improving privacy and cross-client adaptability.
Contribution
MoR combines reward modeling and mixture-of-rewards mechanisms to enable federated alignment of diverse VLMs without sharing raw data or model parameters.
Findings
MoR outperforms existing federated alignment methods in generalization.
It effectively fuses heterogeneous supervision signals via learned routing.
Experiments show improved cross-client adaptability and privacy preservation.
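To make the idea of fusing supervision signals via learned routing concrete, here is a minimal sketch. It assumes the routing takes the form of a softmax over per-client logits and that the fused reward is the weighted sum of the client reward models' scores; the summary does not confirm either detail. The names `softmax`, `mixture_of_rewards`, `client_scores`, and `routing_logits` are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_rewards(client_scores, routing_logits):
    """Fuse per-client reward scores into one scalar.

    ASSUMPTION: routing is a softmax over logits and fusion is a
    weighted sum. This illustrates the general mixture idea only,
    not the routing mechanism actually used by MoR.
    """
    if len(client_scores) != len(routing_logits):
        raise ValueError("need one routing logit per client score")
    weights = softmax(routing_logits)
    return sum(w * s for w, s in zip(weights, client_scores))


# Two hypothetical client reward models score the same response.
fused = mixture_of_rewards(client_scores=[1.0, 3.0], routing_logits=[0.0, 0.0])
print(fused)
```

With equal logits, both reward models contribute equally; in a learned router the logits would be produced per input, so different clients' reward models can dominate for different prompts.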
Abstract
Vision-Language Models (VLMs) have broad potential in privacy-sensitive domains such as healthcare and finance, yet strict data-sharing constraints render centralized training infeasible. Federated Learning mitigates this issue by enabling decentralized training, but practical deployments face challenges due to client heterogeneity in computational resources, application requirements, and model architectures. Under extreme model and data heterogeneity, replacing parameter aggregation with preference-based collaboration offers a more suitable interface, as it eliminates the need for direct parameter or data exchange. Motivated by this, we propose MoR, a federated alignment framework that combines GRPO with Mixture-of-Rewards for heterogeneous VLMs. In MoR, each client locally trains a reward model from local preference annotations, capturing specific evaluation signals without exposing…
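The abstract names GRPO as the optimizer paired with the reward mixture. As general background, GRPO scores each sampled response relative to the other responses drawn for the same prompt, normalizing rewards by the group mean and standard deviation. The sketch below shows only that normalization step. It assumes the input rewards are already the fused scalar rewards, and the function name `group_relative_advantages` and the `eps` parameter are hypothetical; how MoR wires this step into its pipeline is not stated in the truncated abstract.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt.

    ASSUMPTION: `rewards` holds one scalar per sampled response, e.g. the
    output of a reward mixture. `eps` guards against division by zero
    when every response in the group receives the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


# Three hypothetical responses to one prompt, with their scalar rewards.
advantages = group_relative_advantages([1.0, 2.0, 3.0])
print(advantages)
```

Responses that score above the group mean receive positive advantages and those below receive negative ones, so no separate value network is needed to provide a baseline.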
