Periphery-Fovea Multi-Resolution Driving Model guided by Human Attention
Ye Xia, Jinkyu Kim, John Canny, Karl Zipser, and David Whitney

TL;DR
This paper introduces a driving model, guided by human attention, that combines low-resolution peripheral input with high-resolution foveal input to improve vehicle speed prediction, especially in pedestrian-involved critical situations.
Contribution
It presents a novel periphery-fovea multi-resolution model whose fovea selection is trained with supervision from driver gaze, improving driving accuracy and critical-situation performance over a uni-resolution model.
Findings
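Adding high-resolution input from predicted human driver gaze locations significantly improves driving accuracy.
The performance gain is significantly higher in pedestrian-involved critical situations.
The model outperforms a uni-resolution periphery-only model that uses the same number of floating-point operations.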
Abstract
Inspired by human vision, we propose a new periphery-fovea multi-resolution driving model that predicts vehicle speed from dash camera videos. The peripheral vision module of the model processes the full video frames in low resolution. Its foveal vision module selects sub-regions and uses high-resolution input from those regions to improve its driving performance. We train the fovea selection module with supervision from driver gaze. We show that adding high-resolution input from predicted human driver gaze locations significantly improves the driving accuracy of the model. Our periphery-fovea multi-resolution model outperforms a uni-resolution periphery-only model that has the same number of floating-point operations. More importantly, we demonstrate that our driving model achieves a significantly higher performance gain in pedestrian-involved critical situations than in other…
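The two input streams described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the average-pooling downsampler, the argmax fovea selection, and the default sizes are all assumptions made for illustration, and frames are plain 2D lists of grayscale values rather than real video tensors. The sketch only shows the idea of pairing a low-resolution view of the whole frame with a high-resolution crop centered on a predicted gaze peak.

```python
def downsample(frame, factor):
    """Average-pool a 2D grayscale frame (list of rows) by an integer factor."""
    height, width = len(frame), len(frame[0])
    pooled = []
    for top in range(0, height - factor + 1, factor):
        row = []
        for left in range(0, width - factor + 1, factor):
            block = [frame[top + dy][left + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        pooled.append(row)
    return pooled


def select_fovea_center(gaze_map, factor):
    """Return the full-resolution (row, col) of the gaze map's peak.

    Assumption: gaze_map lives on the low-resolution grid, so each cell
    corresponds to a factor x factor block of the original frame.
    """
    _, best_row, best_col = max(
        (value, r, c)
        for r, row in enumerate(gaze_map)
        for c, value in enumerate(row)
    )
    # Map the low-resolution cell to the center of its full-resolution block.
    return best_row * factor + factor // 2, best_col * factor + factor // 2


def crop_fovea(frame, center, size):
    """Crop a size x size full-resolution patch, clamped to the frame borders."""
    height, width = len(frame), len(frame[0])
    top = min(max(center[0] - size // 2, 0), height - size)
    left = min(max(center[1] - size // 2, 0), width - size)
    return [row[left:left + size] for row in frame[top:top + size]]


def periphery_fovea_inputs(frame, gaze_map, factor=4, fovea_size=4):
    """Build the low-resolution periphery and the high-resolution fovea crop."""
    periphery = downsample(frame, factor)
    center = select_fovea_center(gaze_map, factor)
    fovea = crop_fovea(frame, center, fovea_size)
    return periphery, fovea
```

In a full model each of the two outputs would feed its own feature extractor before the speed predictor; here they are simply returned so the shapes and contents can be inspected.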
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Autonomous Vehicle Technology and Safety
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
