SGAP-Gaze: Scene Grid Attention Based Point-of-Gaze Estimation Network for Driver Gaze

Pavan Kumar Sharma; Pranamesh Chakraborty

arXiv:2604.19888·cs.CV·April 23, 2026

TL;DR

This paper introduces SGAP-Gaze, a scene-aware attention network for driver gaze estimation that combines driver facial features with traffic-scene context. It reports improved accuracy on both the new UD-FSG dataset and the existing LBW dataset.

Contribution

The paper presents a new dataset, UD-FSG, and a Transformer-based attention model that explicitly incorporates scene context into driver gaze estimation and outperforms existing methods.

Findings

01

SGAP-Gaze reduces mean pixel error by 23.5% compared to state-of-the-art models.

02

The model achieves a mean pixel error of 104.73 on the UD-FSG dataset and 63.48 on the LBW dataset.

03

Incorporating scene context improves gaze estimation accuracy across all spatial regions.
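
The findings above are stated in terms of mean pixel error and a relative reduction. As a minimal sketch, assuming mean pixel error means the average Euclidean distance (in pixels) between predicted and ground-truth Point-of-Gaze coordinates, the two quantities could be computed as follows. The function names and the toy coordinates are hypothetical and are not taken from the paper.

```python
import math


def mean_pixel_error(predicted, ground_truth):
    """Average Euclidean distance, in pixels, between paired 2D points.

    Assumption: this is how "mean pixel error" is defined; the summary
    above does not spell out the formula.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("need two non-empty point lists of equal length")
    distances = [
        math.hypot(px - gx, py - gy)
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
    ]
    return sum(distances) / len(distances)


def relative_reduction(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Toy values for illustration only (not data from the paper).
predicted = [(3.0, 4.0), (10.0, 10.0)]
ground_truth = [(0.0, 0.0), (10.0, 16.0)]
error = mean_pixel_error(predicted, ground_truth)  # distances are 5 and 6
```

Under this definition, a lower value means the predicted gaze point lands closer to the labelled one on the scene image, and the relative reduction compares a model against a baseline evaluated on the same data.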

Abstract

Driver gaze estimation is essential for understanding the driver's situational awareness of surrounding traffic. Existing gaze estimation models use driver facial information to predict the Point-of-Gaze (PoG) or the 3D gaze direction vector. We propose a benchmark dataset, Urban Driving-Face Scene Gaze (UD-FSG), comprising synchronized driver-face and traffic-scene images. Alongside the face images, the scene images provide cues about surrounding traffic that can help improve gaze estimation. We propose SGAP-Gaze, a Scene-Grid Attention based Point-of-Gaze estimation network that explicitly incorporates the scene images into gaze estimation modelling and is trained and tested on our UD-FSG dataset. The gaze estimation network integrates driver face, eye, iris, and scene contextual information. First, the extracted features from facial modalities are fused to form a gaze…
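
The abstract is cut off before it describes how the fused facial features interact with the scene, so the actual SGAP-Gaze architecture is not shown here. Purely as an illustration of the general idea of attending over a scene grid, the sketch below applies generic scaled dot-product attention from a single query vector to a list of grid-cell feature vectors. Everything in it (the function names, the 2x2 grid, the two-dimensional features) is a hypothetical assumption and should not be read as the authors' method.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def grid_attention(query, cell_features):
    """Generic scaled dot-product attention of one query over grid cells.

    query:         feature vector (assumed here to come from the face branch)
    cell_features: one feature vector per scene grid cell, same length
    Returns (weights, context): one weight per cell, and the weighted
    sum of the cell features.
    """
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, cell)) / math.sqrt(dim)
        for cell in cell_features
    ]
    weights = softmax(scores)
    context = [
        sum(w * cell[i] for w, cell in zip(weights, cell_features))
        for i in range(dim)
    ]
    return weights, context


# Toy 2x2 scene grid flattened to four cells (illustrative values only).
query = [1.0, 0.0]
cells = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
weights, context = grid_attention(query, cells)
```

In this toy setup the first cell is most aligned with the query, so it receives the largest weight. A full model would learn the query, key, and value projections rather than using raw features directly.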
