FedCVU: Federated Learning for Cross-View Video Understanding

Shenghan Zhang; Run Ling; Ke Cao; Ao Ma; Zhanjie Zhang

arXiv:2603.21647 · cs.CV · March 24, 2026

TL;DR

FedCVU introduces a federated learning framework for cross-view video understanding that effectively handles view heterogeneity, reduces communication costs, and improves cross-view semantic alignment, advancing privacy-preserving multi-camera video analysis.

Contribution

The paper presents FedCVU, a federated learning framework that combines view-specific normalization (VS-Norm), cross-view contrastive alignment (CV-Align), and selective layer aggregation (SLA) to address cross-view heterogeneity and communication cost.
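The contribution above pairs view-specific normalization with selective layer aggregation, and both are decisions about which parameters leave a client. The page does not give FedCVU's exact rules, so the following is a minimal sketch under stated assumptions. It uses a FedBN-style exclusion of normalization parameters, which stay local. It also uses a fixed, hypothetical list of "shared" layer prefixes standing in for SLA's selection criterion. The names `aggregate`, `is_norm_param`, `apply_update`, and `shared_prefixes` are illustrative, not the authors' API. Parameters are plain lists of floats rather than real tensors.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: parameter name -> flat list of floats,
# standing in for the real tensors of a video backbone.
State = Dict[str, List[float]]


def is_norm_param(name: str) -> bool:
    """Assumed naming rule: a dotted path component starting with 'norm' or
    'bn' marks a normalization parameter that stays on the client."""
    return any(part.startswith(("norm", "bn")) for part in name.split("."))


def aggregate(
    client_states: Sequence[State],
    client_weights: Sequence[float],
    shared_prefixes: Sequence[str],
) -> State:
    """Weighted average over clients of parameters that are not normalization
    parameters and belong to one of the selected (shared) layers."""
    if not client_states or len(client_states) != len(client_weights):
        raise ValueError("need one weight per client state")
    total = float(sum(client_weights))
    if total <= 0:
        raise ValueError("client weights must sum to a positive value")
    update: State = {}
    for name, first in client_states[0].items():
        if is_norm_param(name):
            continue  # normalization stays local (view-specific statistics)
        if not name.startswith(tuple(shared_prefixes)):
            continue  # layer not selected this round, so it is not transmitted
        acc = [0.0] * len(first)
        for state, weight in zip(client_states, client_weights):
            for i, value in enumerate(state[name]):
                acc[i] += (weight / total) * value
        update[name] = acc
    return update


def apply_update(local: State, update: State) -> State:
    """Return a copy of the local state with only the aggregated entries replaced."""
    merged = {name: list(values) for name, values in local.items()}
    for name, values in update.items():
        merged[name] = list(values)
    return merged


if __name__ == "__main__":
    clients = [
        {"stem.conv.weight": [1.0, 2.0], "stem.norm.weight": [1.0], "head.fc.weight": [0.0]},
        {"stem.conv.weight": [3.0, 4.0], "stem.norm.weight": [5.0], "head.fc.weight": [2.0]},
    ]
    update = aggregate(clients, [1.0, 1.0], shared_prefixes=["head."])
    print(update)  # {'head.fc.weight': [1.0]}
```

Anything missing from the returned update is never sent, which is where a communication saving would come from in this sketch. Each client also keeps its own normalization statistics after `apply_update`.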

Findings

1. Outperforms state-of-the-art FL methods on action understanding and person re-identification tasks.
2. Improves unseen-view accuracy while maintaining seen-view performance.
3. Demonstrates robustness to domain heterogeneity and communication constraints.

Abstract

Federated learning (FL) has emerged as a promising paradigm for privacy-preserving multi-camera video understanding. However, applying FL to cross-view scenarios faces three major challenges: (i) heterogeneous viewpoints and backgrounds lead to highly non-IID client distributions and overfitting to view-specific patterns, (ii) local distribution biases cause misaligned representations that hinder consistent cross-view semantics, and (iii) large video architectures incur prohibitive communication overhead. To address these issues, we propose FedCVU, a federated framework with three components: VS-Norm, which preserves normalization parameters to handle view-specific statistics; CV-Align, a lightweight contrastive regularization module to improve cross-view representation alignment; and SLA, a selective layer aggregation strategy that reduces communication without sacrificing accuracy.…
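The abstract describes CV-Align only as a lightweight contrastive regularizer and does not state its loss. As an assumption, the sketch below uses a standard InfoNCE loss. It assumes each anchor embedding has a paired positive, for instance the same action or identity seen from another view, or a server-provided reference embedding. How CV-Align actually builds its positives is not described here. The term is added to the task loss with a hypothetical `weight`. The functions `info_nce`, `regularized_loss`, and `l2_normalize` are illustrative names, not the authors' code.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def info_nce(
    anchors: Sequence[Sequence[float]],
    positives: Sequence[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """Mean InfoNCE loss: row i of `anchors` should match row i of `positives`;
    every other row of `positives` acts as an in-batch negative."""
    if not anchors or len(anchors) != len(positives):
        raise ValueError("anchors and positives must be non-empty and paired")
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        # Cosine similarities scaled by the temperature.
        logits = [sum(x * y for x, y in zip(anchor, pos)) / temperature for pos in p]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_denominator - logits[i]  # cross-entropy with target index i
    return total / len(a)


def regularized_loss(
    task_loss: float,
    anchors: Sequence[Sequence[float]],
    positives: Sequence[Sequence[float]],
    weight: float = 0.1,
    temperature: float = 0.1,
) -> float:
    """Task loss plus a weighted alignment term (the weight is an assumption)."""
    return task_loss + weight * info_nce(anchors, positives, temperature)
```

With well-aligned pairs the alignment term approaches zero, so the regularizer barely changes the task loss. Misaligned pairs push it up sharply. That is the pressure toward consistent cross-view representations that the abstract attributes to CV-Align.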

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Privacy-Preserving Technologies in Data