PRE-MAP: Personalized Reinforced Eye-tracking Multimodal LLM for High-Resolution Multi-Attribute Point Prediction

Hanbing Wu; Ping Jiang; Anyang Su; Chenxu Zhao; Tianyu Fu; Minghui Wu; Beiping Tan; Huiying Li

arXiv:2507.19213 · cs.CV · July 28, 2025

Open Access

TL;DR

This paper introduces PRE-MAP, a personalized eye-tracking model leveraging reinforcement learning and multimodal large language models to predict high-resolution, multi-attribute visual points, addressing limitations of existing saliency models and incorporating subjective cognitive diversity.

Contribution

The paper presents a novel personalized saliency prediction model, PRE-MAP, utilizing reinforcement learning and multimodal LLMs, along with a large-scale gaze dataset SPA-ADV capturing diverse viewer behaviors.

Findings

01 PRE-MAP outperforms existing models on multiple benchmarks.

02 Incorporating subjective profiles improves prediction accuracy.

03 The dataset SPA-ADV enables better understanding of individual attention patterns.

Abstract

Visual selective attention, driven by individual preferences, regulates human prioritization of visual stimuli by bridging subjective cognitive mechanisms with objective visual elements, thereby steering the semantic interpretation and hierarchical processing of dynamic visual scenes. However, existing models and datasets predominantly neglect the influence of subjective cognitive diversity on fixation behavior. Conventional saliency prediction models, typically employing segmentation approaches, rely on low-resolution imagery to generate saliency heatmaps that are subsequently upscaled to native resolutions, which limits their capacity to capture personalized attention patterns. Furthermore, MLLMs are constrained by factors such as hallucinations, making it very costly to strictly adhere to the expected format in tasks involving multiple point predictions, and achieving precise point…

Taxonomy

Topics: Visual Attention and Saliency Detection · Gaze Tracking and Assistive Technology · Mind wandering and attention