SVASTIN: Sparse Video Adversarial Attack via Spatio-Temporal Invertible Neural Networks
Yi Pan, Jun-Jie Huang, Zihan Chen, Wentao Zhao, Ziyue Wang

TL;DR
SVASTIN introduces a novel method for generating imperceptible adversarial videos by leveraging spatio-temporal invertible neural networks, improving attack effectiveness while maintaining video quality.
Contribution
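The paper presents SVASTIN, which combines a Guided Target Video Learning (GTVL) module with a Spatio-Temporal Invertible Neural Network (STIN) module that exchanges spatio-temporal feature information between a source video and a learned target feature tensor.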
Findings
Outperforms state-of-the-art methods in imperceptibility
Achieves higher fooling rates on UCF-101 and Kinetics-400
Maintains high video quality with minimal perturbations
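The fooling rate mentioned above can be made concrete with a small sketch. It assumes one common untargeted definition: among videos the classifier labels correctly before the attack, the fraction it mislabels afterwards. The paper may define the metric differently, and the function name `fooling_rate` and its arguments are illustrative, not taken from the authors' code.

```python
import numpy as np

def fooling_rate(true_labels, clean_preds, adv_preds):
    """Fraction of originally correct predictions that the attack flips.

    Assumed definition (untargeted attack); not confirmed by the paper.
    """
    true_labels = np.asarray(true_labels)
    clean_preds = np.asarray(clean_preds)
    adv_preds = np.asarray(adv_preds)

    # Only videos the model classified correctly before the attack are counted.
    correct = clean_preds == true_labels
    if not correct.any():
        return 0.0

    # A video is "fooled" if its adversarial prediction no longer matches the label.
    fooled = adv_preds[correct] != true_labels[correct]
    return float(fooled.mean())
```

Restricting the count to originally correct predictions keeps clean-data errors from inflating the score; a variant that divides by all videos is also seen in the literature.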
Abstract
Robust and imperceptible adversarial video attacks are challenging because of the spatial and temporal characteristics of videos. Existing video adversarial attack methods mainly take a gradient-based approach and generate adversarial videos with noticeable perturbations. In this paper, we propose a novel Sparse Adversarial Video Attack via Spatio-Temporal Invertible Neural Networks (SVASTIN) that generates adversarial videos by exchanging information in a spatio-temporal feature space. It consists of a Guided Target Video Learning (GTVL) module, which balances the perturbation budget against optimization speed, and a Spatio-Temporal Invertible Neural Network (STIN) module, which exchanges spatio-temporal feature-space information between a source video and the target feature tensor learned by the GTVL module. Extensive experiments on UCF-101 and Kinetics-400 demonstrate that our proposed SVASTIN can…
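To illustrate what "invertible" information exchange between two feature tensors can look like, here is a minimal sketch of a generic affine coupling block in NumPy. It is not the authors' STIN architecture: the class `CouplingBlock`, the helper `make_subnet`, and the fixed random sub-networks `phi`, `rho`, and `eta` are hypothetical stand-ins for learned modules, and the tensor shape (frames, positions, channels) is an assumption chosen for the example.

```python
import numpy as np

def make_subnet(rng, channels):
    """Hypothetical stand-in for a learned sub-network: fixed random linear map + tanh."""
    w = rng.normal(scale=0.1, size=(channels, channels))
    return lambda x: np.tanh(x @ w)

class CouplingBlock:
    """Generic affine coupling block over two feature branches (illustrative only)."""

    def __init__(self, channels, seed=0):
        rng = np.random.default_rng(seed)
        self.phi = make_subnet(rng, channels)  # target -> source update
        self.rho = make_subnet(rng, channels)  # log-scale applied to target
        self.eta = make_subnet(rng, channels)  # shift applied to target

    def forward(self, src, tgt):
        # The source branch absorbs information from the target branch.
        src_out = src + self.phi(tgt)
        # The target branch is rescaled and shifted using the updated source.
        tgt_out = tgt * np.exp(self.rho(src_out)) + self.eta(src_out)
        return src_out, tgt_out

    def inverse(self, src_out, tgt_out):
        # Undo the two steps in reverse order; no sub-network has to be inverted.
        tgt = (tgt_out - self.eta(src_out)) * np.exp(-self.rho(src_out))
        src = src_out - self.phi(tgt)
        return src, tgt

# Usage: features shaped (frames, positions, channels), an assumed layout.
rng = np.random.default_rng(1)
src = rng.normal(size=(4, 6, 8))
tgt = rng.normal(size=(4, 6, 8))
block = CouplingBlock(channels=8)
src_out, tgt_out = block.forward(src, tgt)
src_rec, tgt_rec = block.inverse(src_out, tgt_out)
```

Because each step only adds or scales by a quantity computed from the other branch, the block can be inverted exactly regardless of what the sub-networks compute, which is the property that makes this family of networks attractive for embedding information from one signal into another.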
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Digital Media Forensic Detection
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
