PRE-MAP: Personalized Reinforced Eye-tracking Multimodal LLM for High-Resolution Multi-Attribute Point Prediction
Hanbing Wu, Ping Jiang, Anyang Su, Chenxu Zhao, Tianyu Fu, Minghui Wu, Beiping Tan, Huiying Li

TL;DR
This paper introduces PRE-MAP, a personalized eye-tracking model leveraging reinforcement learning and multimodal large language models to predict high-resolution, multi-attribute visual points, addressing limitations of existing saliency models and incorporating subjective cognitive diversity.
Contribution
The paper presents a novel personalized saliency prediction model, PRE-MAP, utilizing reinforcement learning and multimodal LLMs, along with a large-scale gaze dataset SPA-ADV capturing diverse viewer behaviors.
Findings
PRE-MAP outperforms existing models on multiple benchmarks.
Incorporating subjective profiles improves prediction accuracy.
The dataset SPA-ADV enables better understanding of individual attention patterns.
Abstract
Visual selective attention, driven by individual preferences, regulates human prioritization of visual stimuli by bridging subjective cognitive mechanisms with objective visual elements, thereby steering the semantic interpretation and hierarchical processing of dynamic visual scenes. However, existing models and datasets predominantly neglect the influence of subjective cognitive diversity on fixation behavior. Conventional saliency prediction models, typically employing segmentation approaches, rely on low-resolution imagery to generate saliency heatmaps that are subsequently upscaled to native resolution, which limits their capacity to capture personalized attention patterns. Furthermore, MLLMs are constrained by factors such as hallucinations, making it very costly to strictly adhere to the expected format in tasks involving multiple point predictions, and achieving precise point…
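The abstract's point about low-resolution heatmaps being upscaled to native resolution can be illustrated with a small calculation. The sketch below is not PRE-MAP's method and is not taken from the paper; the helper names and the 64×36 and 1920×1080 sizes are hypothetical, chosen only to show how much positional precision a coarse grid can lose when a fixation point is mapped down and back up.

```python
import math

def downscale_point(native_xy, low_size, native_size):
    """Return the low-resolution grid cell that contains a native-resolution point."""
    (nx, ny), (lw, lh), (nw, nh) = native_xy, low_size, native_size
    # Clamp so points on the right/bottom edge stay inside the grid.
    return (min(int(nx * lw / nw), lw - 1), min(int(ny * lh / nh), lh - 1))

def upscale_point(low_xy, low_size, native_size):
    """Map the center of a low-resolution cell back to native pixel coordinates."""
    (lx, ly), (lw, lh), (nw, nh) = low_xy, low_size, native_size
    return ((lx + 0.5) * nw / lw, (ly + 0.5) * nh / lh)

def max_quantization_error(low_size, native_size):
    """Worst-case distance (in native pixels) between a point and its cell center."""
    (lw, lh), (nw, nh) = low_size, native_size
    return math.hypot(nw / lw / 2, nh / lh / 2)

# Hypothetical sizes, for illustration only.
LOW, NATIVE = (64, 36), (1920, 1080)

fixation = (1000, 500)                      # a native-resolution gaze point
cell = downscale_point(fixation, LOW, NATIVE)
recovered = upscale_point(cell, LOW, NATIVE)
error = math.hypot(recovered[0] - fixation[0], recovered[1] - fixation[1])

print(cell, recovered, round(error, 2))
print(round(max_quantization_error(LOW, NATIVE), 2))
```

Under these assumed sizes each low-resolution cell covers 30×30 native pixels, so any point recovered from the coarse grid can be off by up to roughly 21 pixels regardless of how good the heatmap is. Predicting coordinates directly at native resolution avoids this particular source of error.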
Taxonomy
Topics: Visual Attention and Saliency Detection · Gaze Tracking and Assistive Technology · Mind wandering and attention
