From Pixels to Privacy: Temporally Consistent Video Anonymization via Token Pruning for Privacy Preserving Action Recognition
Nazia Aslam, Abhisek Ray, Joakim Bruslund Haurum, Lukas Esterle, Kamal Nasrollahi

TL;DR
This paper introduces a novel attention-based video anonymization method that selectively prunes privacy-sensitive spatiotemporal regions, maintaining action recognition accuracy while significantly reducing privacy risks.
Contribution
It proposes a new framework using Vision Transformers with dedicated tokens to disentangle utility and privacy features for effective video anonymization.
Findings
Maintains high action recognition accuracy on raw videos.
Reduces privacy leakage substantially compared to baseline methods.
Uses attention distribution comparison for privacy-utility trade-off.
Abstract
Recent advances in large-scale video models have significantly improved video understanding across domains such as surveillance, healthcare, and entertainment. However, these models also amplify privacy risks by encoding sensitive attributes, including facial identity, race, and gender. While image anonymization has been extensively studied, video anonymization remains relatively underexplored, even though modern video models can leverage spatiotemporal motion patterns as biometric identifiers. To address this challenge, we propose a novel attention-driven spatiotemporal video anonymization framework based on systematic disentanglement of utility and privacy features. Our key insight is that attention mechanisms in Vision Transformers (ViTs) can be explicitly structured to separate action-relevant information from privacy-sensitive content. Building on this insight, we introduce two…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
