Follow My Eyes: Backdoor Attacks on VLM-based Scanpath Prediction

Diana Romero; Mutahar Ali; Momin Ahmad Khan; Habiba Farrukh; Fatima Anwar; Salma Elmalaki

arXiv:2604.08766·cs.CR·April 13, 2026

TL;DR

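This paper presents the first study of backdoor attacks on vision-language model (VLM)-based scanpath prediction. It shows that attacks conditioned on the input scene produce diverse, plausible scanpaths that evade cluster-based detection and remain effective after quantization and on smartphones, which poses a security risk for gaze-driven systems.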

Contribution

The paper introduces two variable-output backdoor attacks on VLM-based scanpath prediction, an input-aware spatial attack and a scanpath duration attack, that evade cluster-based detection and remain practical on edge devices.

Findings

01 Naive fixed-path attacks are effective but detectable, because their outputs cluster in the continuous output space.

02 The proposed input-aware spatial attack and scanpath duration attack produce diverse, plausible scanpaths.

03 Attacks remain effective after quantization and on various smartphones.

Abstract

Scanpath prediction models forecast the sequence and timing of human fixations during visual search, driving foveated rendering and attention-based interaction in mobile systems where their integrity is a first-class security concern. We present the first study of backdoor attacks against VLM-based scanpath prediction, evaluated on GazeFormer and COCO-Search18. We show that naive fixed-path attacks, while effective, create detectable clustering in the continuous output space. To overcome this, we design two variable-output attacks: an input-aware spatial attack that redirects predicted fixations toward an attacker-chosen target object, and a scanpath duration attack that inflates fixation durations to delay visual search completion. Both attacks condition their output on the input scene, producing diverse and plausible scanpaths that evade cluster-based detection. We evaluate across…
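The clustering argument in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's method, its detector, or GazeFormer: scanpaths are random points in a unit square, and every function name, the noise level, and the "walk toward a target" rule are assumptions made for illustration only. It compares the mean pairwise distance between triggered outputs for a fixed-path attack and for an attack whose output depends on the input scene.

```python
import math
import random


def path_distance(a, b):
    """Mean Euclidean distance between corresponding fixations of two equal-length paths."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def mean_pairwise_distance(paths):
    """Average path_distance over all unordered pairs; a crude spread measure."""
    total, count = 0.0, 0
    for i in range(len(paths)):
        for j in range(i + 1, len(paths)):
            total += path_distance(paths[i], paths[j])
            count += 1
    return total / count


def random_path(rng, n_fix=6):
    """Stand-in for a clean prediction: n_fix fixations in the unit square (assumption)."""
    return [(rng.random(), rng.random()) for _ in range(n_fix)]


def fixed_path_attack(rng, fixed, noise=0.005):
    """Toy fixed-path backdoor: every triggered input yields nearly the same path."""
    return [(x + rng.gauss(0, noise), y + rng.gauss(0, noise)) for x, y in fixed]


def input_aware_attack(clean_path, target):
    """Toy input-aware backdoor: start where the clean path starts and
    move linearly toward a scene-dependent target location (hypothetical rule)."""
    n = len(clean_path)
    x0, y0 = clean_path[0]
    tx, ty = target
    return [(x0 + (tx - x0) * k / (n - 1), y0 + (ty - y0) * k / (n - 1))
            for k in range(n)]


rng = random.Random(0)
clean = [random_path(rng) for _ in range(50)]

# Fixed-path attack: one attacker-chosen path reused for all triggered inputs.
fixed = random_path(rng)
fixed_out = [fixed_path_attack(rng, fixed) for _ in range(50)]

# Input-aware attack: the target location differs per scene, so outputs differ too.
aware_out = [input_aware_attack(p, (rng.random(), rng.random())) for p in clean]

spread_clean = mean_pairwise_distance(clean)
spread_fixed = mean_pairwise_distance(fixed_out)
spread_aware = mean_pairwise_distance(aware_out)

print(f"clean spread:       {spread_clean:.3f}")
print(f"fixed-path spread:  {spread_fixed:.3f}")
print(f"input-aware spread: {spread_aware:.3f}")
```

In this toy setting the fixed-path outputs collapse into a tight cluster, while the scene-conditioned outputs have a spread comparable to the clean ones. That is the intuition behind why a cluster-based check could flag the former and miss the latter.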
