TL;DR
FedVideoMAE is a federated learning framework for privacy-preserving video violence detection on edge devices. It cuts communication by 28.3x relative to full-model federated updates and keeps raw videos on device, at some cost in accuracy when strong differential privacy is enabled.
Contribution
The paper introduces a parameter-efficient federated approach to on-device video moderation that combines self-supervised VideoMAE representations, LoRA-based adaptation, client-side DP-SGD, and server-side secure aggregation.
Findings
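On RWF-2000 with 40 clients, FedVideoMAE reaches 77.25% accuracy without privacy protection and 65-66% under strong differential privacy.
Updating only 5.5M parameters (about 3.5% of a 156M backbone) reduces communication by 28.3x compared to full-model federated updates.
The privacy-accuracy gap is consistent with an effective-SNR analysis tailored to small-data federated regimes.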
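As a rough sanity check on the communication figure, the sketch below recomputes the trainable fraction and the reduction factor from the rounded parameter counts quoted in the abstract (5.5M trainable, 156M backbone). It assumes that per-round upload size scales linearly with the number of parameters sent; the function names are illustrative and not taken from the paper.

```python
# Rounded parameter counts as quoted in the abstract.
BACKBONE_PARAMS = 156e6   # full backbone
TRAINABLE_PARAMS = 5.5e6  # parameters actually updated and uploaded


def trainable_fraction(trainable: float, total: float) -> float:
    """Share of the backbone that is trained and communicated."""
    return trainable / total


def communication_reduction(trainable: float, total: float) -> float:
    """Reduction factor versus sending the full model.

    Assumption: upload size is proportional to parameter count.
    """
    return total / trainable


fraction = trainable_fraction(TRAINABLE_PARAMS, BACKBONE_PARAMS)
reduction = communication_reduction(TRAINABLE_PARAMS, BACKBONE_PARAMS)

print(f"trainable fraction: {fraction:.1%}")
print(f"communication reduction: {reduction:.1f}x")
```

With these rounded inputs the ratio comes out near 28.4x, slightly above the reported 28.3x; the small difference is presumably due to rounding in the published parameter counts.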
Abstract
Short-form video moderation increasingly needs learning pipelines that protect user privacy without paying the full bandwidth and latency cost of cloud-centralized inference. We present FedVideoMAE, an on-device federated framework for video violence detection that combines self-supervised VideoMAE representations, LoRA-based parameter-efficient adaptation, client-side DP-SGD, and server-side secure aggregation. By updating only 5.5M parameters (about 3.5% of a 156M backbone), FedVideoMAE reduces communication by 28.3x relative to full-model federated updates while keeping raw videos on device throughout training. On RWF-2000 with 40 clients, the method reaches 77.25% accuracy without privacy protection and 65-66% under strong differential privacy. We further show that this privacy gap is consistent with an effective-SNR analysis tailored to the small-data, parameter-efficient federated…
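To make the two privacy mechanisms named in the abstract more concrete, here is a minimal toy sketch, not the paper's implementation. It applies a clip-and-noise step in the spirit of DP-SGD to a whole client update vector (real DP-SGD clips per-example gradients), and it imitates secure aggregation with pairwise additive masks that cancel in the server-side sum (real protocols use key agreement and finite-field arithmetic). All function names, the clip norm, and the noise multiplier are assumptions chosen for illustration.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale the vector so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def add_gaussian_noise(update, clip_norm, noise_multiplier, rng):
    """Add Gaussian noise with std = noise_multiplier * clip_norm."""
    sigma = noise_multiplier * clip_norm
    return [x + rng.gauss(0.0, sigma) for x in update]


def pairwise_masks(num_clients, dim, rng):
    """Each client pair (i, j) shares a random vector: i adds it, j subtracts it."""
    masks = [[0.0] * dim for _ in range(num_clients)]
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            shared = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for k in range(dim):
                masks[i][k] += shared[k]
                masks[j][k] -= shared[k]
    return masks


def secure_sum(updates, masks):
    """Server sees only masked updates; the masks cancel in the total."""
    masked = [[u + m for u, m in zip(upd, msk)] for upd, msk in zip(updates, masks)]
    return [sum(column) for column in zip(*masked)]


rng = random.Random(0)
raw_updates = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]  # hypothetical client updates
clipped = [clip_update(u, clip_norm=1.0) for u in raw_updates]
private = [add_gaussian_noise(u, 1.0, 0.5, rng) for u in clipped]
masks = pairwise_masks(len(private), dim=2, rng=rng)
aggregate = secure_sum(private, masks)
print(aggregate)
```

Because the masks sum to zero across clients, the aggregate equals the plain sum of the noised updates up to floating-point rounding, while no individual masked update reveals a client's contribution on its own.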
