FedCVU: Federated Learning for Cross-View Video Understanding
Shenghan Zhang, Run Ling, Ke Cao, Ao Ma, Zhanjie Zhang

TL;DR
FedCVU is a federated learning framework for cross-view video understanding that handles view heterogeneity, reduces communication costs, and improves cross-view semantic alignment for privacy-preserving multi-camera video analysis.
Contribution
The paper presents FedCVU, a novel federated learning framework with view-specific normalization, contrastive alignment, and selective layer aggregation to address cross-view heterogeneity and communication challenges.
Findings
Outperforms state-of-the-art FL methods on action understanding and person re-identification tasks.
Improves unseen-view accuracy while maintaining seen-view performance.
Demonstrates robustness to domain heterogeneity and communication constraints.
Abstract
Federated learning (FL) has emerged as a promising paradigm for privacy-preserving multi-camera video understanding. However, applying FL to cross-view scenarios faces three major challenges: (i) heterogeneous viewpoints and backgrounds lead to highly non-IID client distributions and overfitting to view-specific patterns, (ii) local distribution biases cause misaligned representations that hinder consistent cross-view semantics, and (iii) large video architectures incur prohibitive communication overhead. To address these issues, we propose FedCVU, a federated framework with three components: VS-Norm, which preserves normalization parameters to handle view-specific statistics; CV-Align, a lightweight contrastive regularization module to improve cross-view representation alignment; and SLA, a selective layer aggregation strategy that reduces communication without sacrificing accuracy.…
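The abstract names the three components but does not give their formulations, so the following is only a rough sketch of the general ideas, not the authors' implementation. It assumes a FedAvg-style weighted average on the server, that normalization parameters can be recognized by name and are simply never uploaded (in the spirit of VS-Norm), that the "selected" layers are given as hypothetical name prefixes (in the spirit of SLA), and that an InfoNCE-style loss stands in for the contrastive regularizer (in the spirit of CV-Align). All function names, parameter names, and the temperature value are illustrative assumptions.

```python
import numpy as np


def is_norm_param(name):
    # Assumption: normalization parameters can be spotted by their name.
    return any(tag in name for tag in ("bn", "norm"))


def aggregate(client_states, client_sizes, shared_prefixes):
    """Size-weighted average of parameters that lie in a selected layer
    and are not normalization parameters. Everything else stays local."""
    total = float(sum(client_sizes))
    keys = [
        k for k in client_states[0]
        if k.startswith(tuple(shared_prefixes)) and not is_norm_param(k)
    ]
    return {
        k: sum(state[k] * (n / total)
               for state, n in zip(client_states, client_sizes))
        for k in keys
    }


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor embedding (a stand-in, not the
    paper's loss): pull the positive close, push the negatives away."""
    def unit(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    a, p, n = unit(anchor), unit(positive), unit(negatives)
    # Cosine similarities: index 0 is the positive, the rest are negatives.
    logits = np.concatenate(([a @ p], n @ a)) / temperature
    logits -= logits.max()  # numerical stability
    return float(-(logits[0] - np.log(np.exp(logits).sum())))


if __name__ == "__main__":
    # Two hypothetical camera clients with toy parameters.
    states = [
        {"backbone.conv.weight": np.array([1.0, 2.0]),
         "backbone.bn.weight": np.array([5.0]),
         "head.fc.weight": np.array([0.0])},
        {"backbone.conv.weight": np.array([3.0, 4.0]),
         "backbone.bn.weight": np.array([7.0]),
         "head.fc.weight": np.array([10.0])},
    ]
    merged = aggregate(states, client_sizes=[1, 3],
                       shared_prefixes=("backbone.",))
    print(sorted(merged))  # only the shared, non-normalization keys
```

In this sketch the communication saving comes from uploading only the keys returned by `aggregate`; each client keeps its own normalization parameters and unselected layers between rounds.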
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Privacy-Preserving Technologies in Data
