FSDAM: Few-Shot Driving Attention Modeling via Vision-Language Coupling

Kaiser Hamid; Can Cui; Khandakar Ashrafi Akbar; Ziran Wang; Nade Liang

arXiv:2511.12708 · cs.CV · March 16, 2026

Open Access

TL;DR

FSDAM is a few-shot learning framework that predicts driver attention and generates structured explanations by decomposing attention into reasoning components, using minimal annotated data to enhance interpretability and generalization in autonomous driving.

Contribution

The paper introduces FSDAM, a novel dual-pathway architecture for joint attention prediction and explanation generation with minimal supervision, addressing data scarcity and task interference.

Findings

01

Achieves competitive gaze prediction performance with only 90 annotated examples.

02

Generates coherent, context-aware explanations for attention shifts.

03

Demonstrates strong zero-shot generalization across multiple benchmarks.
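
The page does not name the metrics behind "competitive gaze prediction performance" or the zero-shot results. Driver-attention benchmarks commonly compare a predicted saliency map with a ground-truth gaze map using KL divergence (lower is better) and the linear correlation coefficient, CC (higher is better). The sketch below assumes those two metrics and assumes both maps are flattened to equal-length lists of non-negative floats. The epsilon placement follows a common saliency-benchmark convention and is an assumption. This is illustrative code, not FSDAM's evaluation code.

```python
import math
from typing import List, Sequence


def _to_distribution(saliency: Sequence[float], eps: float) -> List[float]:
    """Normalize a non-negative saliency map so it sums to (almost) 1."""
    total = sum(saliency) + eps
    return [v / total for v in saliency]


def kl_divergence(pred: Sequence[float], gt: Sequence[float], eps: float = 1e-7) -> float:
    """KL(gt || pred): how poorly the prediction explains the ground-truth gaze map."""
    if len(pred) != len(gt):
        raise ValueError("maps must have the same number of pixels")
    p = _to_distribution(pred, eps)
    q = _to_distribution(gt, eps)
    # eps keeps log() finite where the prediction assigns zero mass.
    return sum(qi * math.log(eps + qi / (pi + eps)) for pi, qi in zip(p, q))


def correlation_coefficient(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Pearson correlation between the two maps, treated as flat vectors."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("maps must be non-empty and the same size")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(gt) / n
    cov = sum((a - mean_p) * (b - mean_g) for a, b in zip(pred, gt))
    std_p = math.sqrt(sum((a - mean_p) ** 2 for a in pred))
    std_g = math.sqrt(sum((b - mean_g) ** 2 for b in gt))
    if std_p == 0 or std_g == 0:
        raise ValueError("CC is undefined for a constant map")
    return cov / (std_p * std_g)
```

For a prediction identical to the ground truth, KL is approximately 0 and CC is 1. A prediction that puts all its mass in the wrong place yields a large KL and a low or negative CC.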

Abstract

Understanding not only where drivers look but also why their attention shifts is essential for interpretable human-AI collaboration in autonomous driving. Driver attention is not purely perceptual but semantically structured. Thus, attention shifts can be learned through minimal semantic supervision rather than dense large-scale annotation. We present FSDAM (Few-Shot Driver Attention Modeling), a framework that achieves joint spatial attention prediction and structured explanation generation using 90 annotated examples. Our key insight is to decompose attention into an explicit reasoning representation, including scene context, current focus, anticipated next focus, and causal explanation, and to learn next-focus anticipation through minimal-pair supervision. To address the task conflict and large sample requirements of existing models,…
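
The abstract names four components of the reasoning representation but does not give a concrete schema or output format. The sketch below assumes one: a small immutable record with one field per component, plus a parser for a hypothetical line-based text output ("Scene: …", "Current focus: …", "Next focus: …", "Why: …"). The class name `AttentionReasoning`, the field names, the labels, and `parse_reasoning` are all illustrative assumptions, not FSDAM's actual interface.

```python
from dataclasses import dataclass, fields
from typing import Dict


@dataclass(frozen=True)
class AttentionReasoning:
    """Hypothetical record for the four reasoning components named in the abstract."""
    scene_context: str   # overall driving situation
    current_focus: str   # where attention is now
    next_focus: str      # anticipated next focus
    explanation: str     # causal explanation for the shift


# Hypothetical text labels mapped to record fields (an assumed output format).
_LABELS = {
    "Scene": "scene_context",
    "Current focus": "current_focus",
    "Next focus": "next_focus",
    "Why": "explanation",
}


def parse_reasoning(text: str) -> AttentionReasoning:
    """Parse 'Label: value' lines into an AttentionReasoning; unknown lines are ignored."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")  # split on the first colon only
        field_name = _LABELS.get(label.strip())
        if sep and field_name:
            values[field_name] = value.strip()
    missing = [f.name for f in fields(AttentionReasoning) if f.name not in values]
    if missing:
        raise ValueError(f"missing reasoning fields: {missing}")
    return AttentionReasoning(**values)
```

A made-up usage example (not data from the paper):

```python
r = parse_reasoning(
    "Scene: urban intersection, light traffic\n"
    "Current focus: traffic light ahead\n"
    "Next focus: pedestrian at the left crosswalk\n"
    "Why: the light is changing and the pedestrian may still be crossing"
)
print(r.next_focus)  # pedestrian at the left crosswalk
```

Keeping the components as separate fields, rather than one free-form string, makes it straightforward to supervise or evaluate next-focus anticipation on its own.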

Taxonomy

Topics: Multimodal Machine Learning Applications · Gaze Tracking and Assistive Technology · Visual Attention and Saliency Detection