LISA: Language-guided Interference-aware Spatial-Frequency Attention for Driver Gaze Estimation

Jun Ma; Zhenye Yang; Ruichen Zhou; Pei Zhang; Huan Li; Jinpeng Chen

arXiv:2605.17287·cs.CV·May 19, 2026

TL;DR

LISA combines frequency-domain priors with vision-language knowledge to make driver gaze estimation more robust to sudden lighting changes and sensor noise.

Contribution

LISA introduces a dual-domain (spatial-frequency) fusion mechanism and a training-time disentanglement strategy that separates gaze features from appearance interference, improving both accuracy and robustness.

Findings

01 Achieves state-of-the-art performance on two benchmarks.

02 Significantly improves robustness against occlusions and lighting variations.

03 Effectively separates gaze features from appearance interference.

Abstract

Driver gaze estimation serves as a fundamental metric for evaluating driver attentiveness in modern monitoring systems. Beyond being vulnerable to sudden lighting changes and sensor noise, spatial-domain models struggle to disentangle authentic gaze cues from irrelevant visual attributes. In this paper, we propose LISA, a Language-guided Interference-aware Spatial-Frequency Attention framework that combines frequency-domain priors with vision-language knowledge. Observing that the amplitude spectrum remains relatively stable even under spatial perturbations, we design a dual-domain fusion mechanism. It integrates stable low-frequency semantics into high-frequency details, employing spatial attention to precisely target ocular regions. To reduce semantic ambiguity, we also introduce a training-time disentanglement strategy. Using a frozen CLIP encoder…
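The abstract's observation about amplitude-spectrum stability can be illustrated with a small NumPy sketch. This is not the authors' implementation: the helper names `amplitude_phase` and `split_low_high`, the random test image, the circular shift used as a stand-in for a "spatial perturbation", and the low-pass `radius` are all assumptions chosen for illustration. It relies only on a standard property of the discrete Fourier transform: a circular translation changes the phase spectrum but leaves the amplitude spectrum unchanged.

```python
import numpy as np


def amplitude_phase(img):
    """Return the amplitude and phase spectra of a 2D image (hypothetical helper)."""
    spec = np.fft.fft2(img)
    return np.abs(spec), np.angle(spec)


def split_low_high(img, radius):
    """Split an image into low- and high-frequency parts with a circular
    low-pass mask of the given radius (hypothetical helper, assumed design)."""
    spec = np.fft.fftshift(np.fft.fft2(img))  # move the DC term to the center
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    low = np.fft.ifft2(np.fft.ifftshift(spec * mask)).real
    high = img - low  # the residual carries the high-frequency detail
    return low, high


# Stand-in for a grayscale face crop; real data would come from a gaze dataset.
rng = np.random.default_rng(0)
img = rng.random((32, 32))

# A circular shift is the simplest spatial perturbation to reason about.
shifted = np.roll(img, shift=(3, 5), axis=(0, 1))

amp, _ = amplitude_phase(img)
amp_shifted, _ = amplitude_phase(shifted)

# Largest elementwise difference between the two amplitude spectra.
amp_gap = float(np.max(np.abs(amp - amp_shifted)))

low, high = split_low_high(img, radius=4)
```

Here `amp_gap` stays at floating-point rounding level, while `low` and `high` sum back to the input by construction. Real perturbations such as illumination shifts or sensor noise are not pure translations, so the amplitude spectrum would be only approximately stable in practice, which matches the abstract's "relatively stable" wording.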

Code & Models

Repositories

Mason-bupt/LISA (GitHub)
