Cross-Paradigm Evaluation of Gaze-Based Semantic Object Identification for Intelligent Vehicles

Penghao Deng; Jidong J. Yang; Jiachen Bian

arXiv:2602.01452 · cs.CV · February 3, 2026


Open Access

TL;DR

This study evaluates three vision-based approaches for identifying the objects a driver is looking at in front-view road scenes. It highlights the strengths of large vision-language models in safety-critical scenarios and discusses their trade-offs against traditional detectors.

Contribution

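The paper compares three vision-based approaches to gaze-based semantic object identification: direct object detection, segmentation-assisted classification, and query-based VLMs. It finds that direct detection with YOLOv13 and the large VLM Qwen2.5-VL-32b perform best, and it offers insights for future driver monitoring systems.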

Findings

01

Large VLMs outperform traditional detectors in identifying small, safety-critical objects.

02

YOLOv13 and Qwen2.5-VL-32b achieve Macro F1-Scores over 0.84.

03

The study identifies a trade-off between the real-time efficiency of traditional detectors and the contextual robustness of large VLMs.
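
The Macro F1-Score cited in finding 02 is the unweighted mean of per-class F1 scores, so rare classes count as much as common ones. The sketch below shows the standard computation in plain Python. The function name `macro_f1` and the example labels are illustrative assumptions; the paper's own class set and evaluation code are not given on this page.

```python
from collections import Counter

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list.

    Illustrative helper only; the label names used with it are hypothetical.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    classes = sorted(set(y_true) | set(y_pred))
    if not classes:
        return 0.0

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never matches
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical gaze-target labels for four frames
truth = ["car", "car", "pedestrian", "sign"]
preds = ["car", "pedestrian", "pedestrian", "sign"]
score = macro_f1(truth, preds)
```

In this toy example, "car" and "pedestrian" each score 2/3 and "sign" scores 1, so the macro average is 7/9. Because every class carries equal weight, a model that misses a rare safety-critical class is penalized as heavily as one that misses a frequent class.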

Abstract

Understanding where drivers direct their visual attention, as characterized by gaze behavior, is critical for developing next-generation advanced driver-assistance systems and improving road safety. This paper frames the problem as a semantic identification task on road scenes captured by a vehicle's front-view camera. Specifically, the collocation of gaze points with object semantics is investigated using three distinct vision-based approaches: direct object detection (YOLOv13), segmentation-assisted classification (SAM2 paired with EfficientNetV2 versus YOLOv13), and query-based Vision-Language Models (VLMs; Qwen2.5-VL-7b versus Qwen2.5-VL-32b). The results demonstrate that direct object detection (YOLOv13) and Qwen2.5-VL-32b significantly outperform the other approaches, achieving Macro F1-Scores over 0.84. The large VLM (Qwen2.5-VL-32b), in particular,…
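
To make the idea of collocating a gaze point with object semantics concrete, the following is a minimal sketch of the direct-detection case: given bounding boxes from some detector, return the label of the box that contains the gaze point. The `Detection` type, the `gaze_target` function, and the smallest-box tie-break are all assumptions made for illustration; the page does not describe the paper's actual matching rule, and the boxes here are hand-written rather than YOLOv13 output.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Detection:
    """Hypothetical detector output: a label, pixel box corners, and a confidence."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float
    score: float

    def contains(self, x: float, y: float) -> bool:
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)

def gaze_target(gaze_x: float, gaze_y: float,
                detections: List[Detection]) -> Optional[Detection]:
    """Return the detection the gaze point falls on, or None if it hits no box.

    Assumed tie-break: when boxes overlap, prefer the smallest box (then the
    higher confidence), on the reasoning that a small object inside a large
    one is the more specific target.
    """
    hits = [d for d in detections if d.contains(gaze_x, gaze_y)]
    if not hits:
        return None
    return min(hits, key=lambda d: (d.area(), -d.score))

# Hand-written example boxes in image pixel coordinates
boxes = [
    Detection("car", 0, 0, 100, 100, 0.90),
    Detection("traffic light", 40, 40, 50, 60, 0.70),
]
on_light = gaze_target(45, 50, boxes)
on_car = gaze_target(10, 10, boxes)
off_object = gaze_target(200, 200, boxes)
```

A gaze point at (45, 50) lies inside both boxes and resolves to the traffic light under the assumed tie-break, while (200, 200) lies in no box and yields `None`. A pure box lookup like this has no way to label gaze that falls on undetected regions, which is one place where segmentation-assisted or query-based approaches could behave differently.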


Taxonomy

Topics: Gaze Tracking and Assistive Technology · Visual Attention and Saliency Detection · Advanced Neural Network Applications