NICGSlowDown: Evaluating the Efficiency Robustness of Neural Image Caption Generation Models
Simin Chen, Zihe Song, Mirazul Haque, Cong Liu, Wei Yang

TL;DR
This paper introduces NICGSlowDown, an attack method that subtly alters images to significantly increase the computational latency of neural image caption generation models, highlighting a new efficiency vulnerability.
Contribution
The paper presents NICGSlowDown, a novel attack approach to evaluate and demonstrate the efficiency robustness issues in NICG models under minimal input perturbations.
Findings
NICGSlowDown can increase model latency by up to 483.86%.
The perturbations are imperceptible to humans.
The results highlight the importance of efficiency robustness in NICG models.
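To put the headline number in perspective, the short sketch below converts a percentage increase into a slowdown factor. It assumes the 483.86% figure is a relative increase over the unperturbed latency; the function name and the 100 ms baseline are hypothetical and chosen only for illustration.

```python
def latency_after_increase(baseline_ms: float, increase_pct: float) -> float:
    """Return the latency after a relative increase given in percent."""
    return baseline_ms * (1.0 + increase_pct / 100.0)


# Slowdown factor implied by a 483.86% increase (baseline normalised to 1.0).
slowdown_factor = latency_after_increase(1.0, 483.86)

# Hypothetical 100 ms baseline, used only to make the arithmetic concrete.
attacked_ms = latency_after_increase(100.0, 483.86)

print(f"slowdown factor: {slowdown_factor:.4f}x")
print(f"100 ms baseline becomes {attacked_ms:.2f} ms")
```

Under that reading, the worst case reported corresponds to a model running at a little under six times its original latency.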
Abstract
Neural image caption generation (NICG) models have received massive attention from the research community due to their excellent performance in visual understanding. Existing work focuses on improving NICG model accuracy, while efficiency is less explored. However, many real-world applications require real-time feedback, which relies heavily on the efficiency of NICG models. Recent research observed that the efficiency of NICG models could vary for different inputs. This observation opens a new attack surface for NICG models, i.e., an adversary might be able to slightly change inputs to cause the NICG models to consume more computational resources. To further understand such efficiency-oriented threats, we propose a new attack approach, NICGSlowDown, to evaluate the efficiency robustness of NICG models. Our experimental results show that NICGSlowDown can generate images with…
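One way to see why the cost of a caption model can depend on its input is to look at a token-by-token decoding loop. The sketch below is a toy illustration, not the authors' method: it assumes that latency grows with the number of decoding steps and that decoding stops at an end-of-sequence token or a length cap. The names `EOS`, `greedy_decode`, and `make_toy_model` are hypothetical, and the "model" is a stand-in function rather than a real captioning network.

```python
from typing import Callable, List

EOS = 0  # hypothetical end-of-sequence token id


def greedy_decode(next_token: Callable[[List[int]], int],
                  max_len: int = 20) -> List[int]:
    """Generate tokens one at a time until EOS is produced or max_len is hit."""
    tokens: List[int] = []
    for _ in range(max_len):
        tok = next_token(tokens)
        tokens.append(tok)
        if tok == EOS:
            break
    return tokens


def make_toy_model(eos_at: int) -> Callable[[List[int]], int]:
    """Hypothetical stand-in for a decoder that emits EOS at step `eos_at`."""
    def next_token(prefix: List[int]) -> int:
        # Emit a dummy word token (1) until the chosen step, then EOS.
        return EOS if len(prefix) + 1 >= eos_at else 1
    return next_token


# An input whose caption ends early versus one where EOS is pushed
# beyond the length cap, so the loop runs for all max_len steps.
benign_steps = len(greedy_decode(make_toy_model(eos_at=5)))
delayed_steps = len(greedy_decode(make_toy_model(eos_at=50)))

print(benign_steps, delayed_steps)
```

In this toy setting, the second input costs four times as many decoder calls as the first, even though nothing about the decoder itself has changed. Any input change that postpones the stopping condition would have a similar effect on compute.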
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Human Pose and Action Recognition
