Cross-Paradigm Evaluation of Gaze-Based Semantic Object Identification for Intelligent Vehicles
Penghao Deng, Jidong J. Yang, Jiachen Bian

TL;DR
This study evaluates different vision-based methods for identifying driver gaze targets in road scenes, highlighting the strengths of large vision-language models in safety-critical scenarios and discussing the trade-offs with traditional detectors.
Contribution
The paper compares three vision-based approaches to gaze-based semantic object identification (direct object detection, segmentation-assisted classification, and query-based VLMs), revealing the superior performance of large VLMs and providing insights for future driver monitoring systems.
Findings
Large VLMs outperform traditional detectors in identifying small, safety-critical objects.
YOLOv13 and Qwen2.5-VL-32b achieve Macro F1-Scores over 0.84.
A trade-off exists between the real-time efficiency of traditional detectors and the contextual robustness of large VLMs.
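For readers unfamiliar with the metric, the Macro F1-Score cited above is the unweighted mean of per-class F1 scores, so rare classes count as much as common ones. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `macro_f1` and the example labels are hypothetical.

```python
from collections import Counter

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    if not classes:
        return 0.0
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical gaze-target labels, for illustration only
y_true = ["car", "car", "pedestrian", "sign"]
y_pred = ["car", "pedestrian", "pedestrian", "sign"]
print(round(macro_f1(y_true, y_pred), 4))
```

Because each class contributes equally, a model that misses small or infrequent targets is penalized even when its overall accuracy is high.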
Abstract
Understanding where drivers direct their visual attention during driving, as characterized by gaze behavior, is critical for developing next-generation advanced driver-assistance systems and improving road safety. This paper tackles this challenge as a semantic identification task from the road scenes captured by a vehicle's front-view camera. Specifically, the collocation of gaze points with object semantics is investigated using three distinct vision-based approaches: direct object detection (YOLOv13), segmentation-assisted classification (SAM2 paired with EfficientNetV2 versus YOLOv13), and query-based Vision-Language Models, VLMs (Qwen2.5-VL-7b versus Qwen2.5-VL-32b). The results demonstrate that the direct object detection (YOLOv13) and Qwen2.5-VL-32b significantly outperform other approaches, achieving Macro F1-Scores over 0.84. The large VLM (Qwen2.5-VL-32b), in particular,…
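To make the idea of collocating a gaze point with object semantics concrete, here is a minimal sketch of the direct-detection paradigm: given detector output for a frame, return the label of the object under the gaze point. The detection tuple layout, the function name `object_at_gaze`, and the smallest-box tie-break are all assumptions made for illustration; the abstract does not specify how the paper resolves overlapping boxes.

```python
from typing import List, Optional, Tuple

# Hypothetical detection format: (label, x1, y1, x2, y2) in pixel coordinates
Detection = Tuple[str, float, float, float, float]

def object_at_gaze(gaze_xy: Tuple[float, float],
                   detections: List[Detection]) -> Optional[str]:
    """Return the label of the detected object containing the gaze point."""
    gx, gy = gaze_xy
    hits = [d for d in detections
            if d[1] <= gx <= d[3] and d[2] <= gy <= d[4]]
    if not hits:
        return None  # gaze fell on background or an undetected object
    # Assumption: when boxes overlap, prefer the smallest one, since a small
    # object in front of a large one is the more specific gaze target.
    best = min(hits, key=lambda d: (d[3] - d[1]) * (d[4] - d[2]))
    return best[0]

# Hypothetical detections for one front-view frame
frame_detections = [
    ("car", 100, 100, 400, 300),
    ("traffic light", 200, 120, 230, 180),
]
print(object_at_gaze((210, 150), frame_detections))
```

A box-only lookup like this returns nothing when the detector misses the object, which is one place where segmentation-assisted or query-based approaches can differ.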
Taxonomy
Topics: Gaze Tracking and Assistive Technology · Visual Attention and Saliency Detection · Advanced Neural Network Applications
