Reinforcement Learning Based Sparse Black-box Adversarial Attack on Video Recognition Models
Zeyuan Wang, Chaofeng Sha, Su Yang

TL;DR
This paper introduces a reinforcement learning-based method for efficient black-box adversarial attacks on video recognition models, focusing on key frame and region selection to reduce computational costs while maintaining attack effectiveness.
Contribution
It proposes a novel reinforcement learning framework for selecting key frames in video attacks, combined with saliency detection and gradient sign estimation to improve efficiency.
Findings
Effective attack success rate on real datasets
Significant reduction in computation time
Robustness in untargeted and targeted attack scenarios
Abstract
We explore black-box adversarial attacks on video recognition models. Attacks are performed only on selected key regions and key frames to reduce the high computation cost of searching for adversarial perturbations on a video, which stems from its high dimensionality. To select key frames, one way is to use heuristic algorithms to evaluate the importance of each frame and choose the essential ones. However, this approach is time-inefficient because of the sorting and searching involved. To speed up the attack process, we propose a reinforcement learning based frame selection strategy. Specifically, the agent explores the difference between the original class and the target class of videos to make selection decisions. It receives rewards from threat models which indicate the quality of the decisions. In addition, we use saliency detection to select key regions and only estimate the sign of the gradient instead of the…
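To make the idea of a sparse, sign-only perturbation concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes the key frames and the saliency mask are already given (in the paper they would come from the reinforcement learning agent and a saliency detector), and it replaces the paper's gradient sign estimator with a simple two-query random sign search. The names `build_mask`, `sign_step`, and the linear `loss_fn` standing in for the black-box model are hypothetical and used only for illustration.

```python
import numpy as np

def build_mask(shape, key_frames, region_mask):
    """Return a mask that is 1 only inside the key region of each key frame.

    shape: (T, H, W, C) video shape; region_mask: (H, W) array of 0/1.
    """
    mask = np.zeros(shape, dtype=np.float32)
    mask[key_frames] = region_mask[..., None]  # broadcast over channels
    return mask

def sign_step(video, loss_fn, mask, eps, rng):
    """One black-box step: try a random +/-1 direction and its negation,
    restricted to the mask, and keep whichever scores the higher loss."""
    direction = rng.choice([-1.0, 1.0], size=video.shape).astype(np.float32) * mask
    plus = np.clip(video + eps * direction, 0.0, 1.0)
    minus = np.clip(video - eps * direction, 0.0, 1.0)
    return plus if loss_fn(plus) >= loss_fn(minus) else minus

# Toy setup: an 8-frame 4x4 RGB "video" and a linear stand-in for the
# attacker's loss (assumption: a higher loss means a stronger attack).
rng = np.random.default_rng(0)
video = np.full((8, 4, 4, 3), 0.5, dtype=np.float32)
weights = rng.standard_normal(video.shape).astype(np.float32)

def loss_fn(v):
    return float(np.sum(weights * v))

key_frames = [1, 5]                              # assumed output of a frame selector
region_mask = np.zeros((4, 4), dtype=np.float32)
region_mask[1:3, 1:3] = 1.0                      # assumed salient region

mask = build_mask(video.shape, key_frames, region_mask)
adv = sign_step(video, loss_fn, mask, eps=0.05, rng=rng)
```

Each step costs two model queries, and only the 2 frames × 4 pixels × 3 channels selected by the mask are ever modified, which is where the reduction in search dimensionality comes from.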
