TL;DR
This paper introduces RCSR, a federated learning framework for cross-modal retrieval that handles missing modalities and client heterogeneity using semantic routing and adapter personalization.
Contribution
It proposes a novel federated approach combining semantic routing, prototype anchoring, and adapters on a frozen CLIP backbone for improved retrieval accuracy and personalization.
Findings
RCSR improves global retrieval accuracy on benchmarks.
It improves client-level retrieval performance, especially for clients with missing modalities.
The framework stabilizes training under heterogeneous client data.
Abstract
Federated cross-modal retrieval faces severe challenges from heterogeneous client data, particularly non-IID semantic distributions and missing modalities. Under such heterogeneity, a single global model is often insufficient to capture both shared cross-modal knowledge and client-specific characteristics. We propose RCSR, a personalization-friendly federated framework that integrates prototype anchoring, retrieval-centric semantic routing, and optional client-specific adapters. Built on a frozen CLIP backbone, RCSR leverages lightweight shared adapters for global knowledge transfer while supporting efficient local personalization. Prototype anchoring helps unimodal clients align with global cross-modal semantics, and a server-side semantic router adaptively assigns aggregation weights based on retrieval consistency to mitigate alignment drift during heterogeneous updates. Extensive…
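The abstract says the server-side semantic router assigns aggregation weights based on retrieval consistency, but the visible text does not say how consistency is measured or how scores become weights. The sketch below is one plausible reading, not the authors' method. It assumes that each client's adapter is scored by recall@1 on a small paired image-text probe set held by the server, that scores are turned into weights with a temperature softmax, and that adapters are plain dictionaries of NumPy arrays. The names `recall_at_1`, `routing_weights`, `aggregate_adapters`, and the `temperature` parameter are all hypothetical.

```python
import numpy as np


def recall_at_1(img_emb, txt_emb):
    """Fraction of images whose most similar text (cosine) is its paired text.

    Assumption: row i of img_emb is paired with row i of txt_emb.
    """
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    sims = img @ txt.T  # (n_images, n_texts) cosine similarities
    hits = np.argmax(sims, axis=1) == np.arange(len(img))
    return float(np.mean(hits))


def routing_weights(scores, temperature=0.1):
    """Map per-client consistency scores to aggregation weights (softmax).

    Assumption: a temperature softmax; a lower temperature concentrates
    weight on the clients with the highest scores.
    """
    s = np.asarray(scores, dtype=float) / temperature
    s = s - s.max()  # subtract the max for numerical stability
    w = np.exp(s)
    return w / w.sum()


def aggregate_adapters(client_adapters, weights):
    """Weighted average of client adapter parameters, key by key."""
    keys = client_adapters[0].keys()
    return {
        k: sum(w * c[k] for w, c in zip(weights, client_adapters))
        for k in keys
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three hypothetical clients, each uploading one adapter matrix.
    adapters = [{"proj": rng.normal(size=(4, 4))} for _ in range(3)]
    # Placeholder consistency scores; in practice these would come from
    # recall_at_1 on probe embeddings produced with each client's adapter.
    scores = [0.8, 0.5, 0.2]
    weights = routing_weights(scores)
    global_adapter = aggregate_adapters(adapters, weights)
    print(weights, global_adapter["proj"].shape)
```

Under these assumptions, clients whose updates keep image and text embeddings aligned on the probe set contribute more to the shared adapter, which is the intuition behind using such a router to limit alignment drift. Only the shared adapters are averaged here; the frozen CLIP backbone and any client-specific adapters stay out of the aggregation.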
