Follow My Eyes: Backdoor Attacks on VLM-based Scanpath Prediction
Diana Romero, Mutahar Ali, Momin Ahmad Khan, Habiba Farrukh, Fatima Anwar, Salma Elmalaki

TL;DR
This paper studies backdoor attacks on vision-language model (VLM)-based scanpath prediction. It shows that attacks conditioned on the input scene produce diverse, plausible scanpaths that evade cluster-based detection and remain effective after quantization and on smartphones, which poses a security risk for gaze-driven systems.
Contribution
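It introduces two variable-output backdoor attacks on VLM-based scanpath prediction: an input-aware spatial attack that redirects fixations toward an attacker-chosen target object, and a duration attack that inflates fixation durations. Both evade cluster-based detection and remain practical on edge devices.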
Findings
Naive fixed-path attacks are effective but detectable, because their outputs cluster in the continuous output space.
Proposed input-aware and duration attacks produce diverse, plausible scanpaths.
Attacks remain effective after quantization and on various smartphones.
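The clustering weakness in the first finding can be illustrated with a toy dispersion check. The sketch below is not the paper's detector or attack. It uses synthetic data, and every name in it (`mean_pairwise_distance`, `toward_target`, the noise level, the number of fixations) is an assumption made for illustration. It only shows why outputs that repeat one fixed path sit much closer together than outputs that depend on the scene.

```python
import math
import random


def mean_pairwise_distance(scanpaths):
    """Average Euclidean distance over all pairs of scanpaths.

    Each scanpath is a list of (x, y) fixations; all have the same length.
    """
    total, pairs = 0.0, 0
    for i in range(len(scanpaths)):
        for j in range(i + 1, len(scanpaths)):
            squared = sum(
                (ax - bx) ** 2 + (ay - by) ** 2
                for (ax, ay), (bx, by) in zip(scanpaths[i], scanpaths[j])
            )
            total += math.sqrt(squared)
            pairs += 1
    return total / pairs


rng = random.Random(0)   # seeded so the run is repeatable
N_SCENES, N_FIX = 50, 6  # hypothetical sizes, normalized image coordinates


def random_point():
    return (rng.uniform(0, 1), rng.uniform(0, 1))


def toward_target(start, target):
    """Straight-line path from start to target (stand-in for a scene-dependent output)."""
    return [
        (
            start[0] + (target[0] - start[0]) * t / (N_FIX - 1),
            start[1] + (target[1] - start[1]) * t / (N_FIX - 1),
        )
        for t in range(N_FIX)
    ]


# Assumed fixed-path backdoor: every triggered input yields the same path plus small noise.
fixed_path = [random_point() for _ in range(N_FIX)]
fixed_outputs = [
    [(x + rng.gauss(0, 0.005), y + rng.gauss(0, 0.005)) for x, y in fixed_path]
    for _ in range(N_SCENES)
]

# Assumed input-aware backdoor: the target location differs per scene, so the paths differ too.
input_aware_outputs = [
    toward_target(random_point(), random_point()) for _ in range(N_SCENES)
]

fixed_spread = mean_pairwise_distance(fixed_outputs)
aware_spread = mean_pairwise_distance(input_aware_outputs)
print(f"fixed-path spread:  {fixed_spread:.4f}")
print(f"input-aware spread: {aware_spread:.4f}")
```

A detector that flags unusually tight groups of predictions would separate the first set easily and the second set far less so, which is the intuition behind conditioning the malicious output on the input scene.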
Abstract
Scanpath prediction models forecast the sequence and timing of human fixations during visual search, driving foveated rendering and attention-based interaction in mobile systems where their integrity is a first-class security concern. We present the first study of backdoor attacks against VLM-based scanpath prediction, evaluated on GazeFormer and COCO-Search18. We show that naive fixed-path attacks, while effective, create detectable clustering in the continuous output space. To overcome this, we design two variable-output attacks: an input-aware spatial attack that redirects predicted fixations toward an attacker-chosen target object, and a scanpath duration attack that inflates fixation durations to delay visual search completion. Both attacks condition their output on the input scene, producing diverse and plausible scanpaths that evade cluster-based detection. We evaluate across…
