Learning Navigation by Visual Localization and Trajectory Prediction
Iulia Paraicu, Marius Leordeanu

TL;DR
This paper introduces a self-driving system that predicts the vehicle's location and future trajectory using only raw video and the intended destination, enabling navigation without GPS during operation.
Contribution
It presents a novel approach to real-time vehicle localization and trajectory prediction from visual input and the destination alone, with GPS data used only at training time.
Findings
Outperforms existing methods in visual localization and steering accuracy.
Accurately predicts trajectories up to seven seconds ahead.
Enables GPS-free navigation between known locations.
Abstract
When driving, people make decisions based on current traffic as well as their desired route. They have a mental map of known routes and are often able to navigate without needing directions. Current self-driving models improve their performance when using additional GPS information. Here we aim to push forward self-driving research and perform route planning even in the absence of GPS. Our system learns to predict, in real time, the vehicle's current location and future trajectory, as a function of time, on a known map, given only the raw video stream and the intended destination. The GPS signal is available only at training time, with training data annotation being fully automatic. Unlike other published models, we predict the vehicle's trajectory for up to seven seconds ahead, from which complete steering, speed and acceleration information can be derived for the entire time span.…
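The abstract notes that steering, speed and acceleration can be derived from a trajectory given as a function of time. The sketch below illustrates that idea with plain finite differences; it is not the authors' method. It assumes planar positions in metres sampled at a fixed interval `dt`, and the function name `derive_motion` and its output keys are hypothetical, chosen only for this example. The yaw rate (rate of change of heading) stands in for steering here.

```python
import math


def derive_motion(points, dt):
    """Estimate speed, heading, acceleration and yaw rate from a path.

    points: list of (x, y) positions in metres, one every dt seconds
            (an assumed representation, not taken from the paper).
    Returns a dict of lists. "speed" and "heading" have len(points) - 1
    entries; "acceleration" and "yaw_rate" have len(points) - 2 entries.
    """
    if len(points) < 3:
        raise ValueError("need at least 3 trajectory points")
    if dt <= 0:
        raise ValueError("dt must be positive")

    speeds, headings = [], []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        speeds.append(math.hypot(dx, dy) / dt)   # metres per second
        headings.append(math.atan2(dy, dx))      # radians, 0 = +x axis

    # Acceleration: change in speed between consecutive segments.
    accelerations = [(v1 - v0) / dt for v0, v1 in zip(speeds, speeds[1:])]

    # Yaw rate: change in heading, wrapped to [-pi, pi) so that a turn
    # across the +/-pi boundary is not mistaken for a near-full rotation.
    yaw_rates = []
    for h0, h1 in zip(headings, headings[1:]):
        dh = (h1 - h0 + math.pi) % (2 * math.pi) - math.pi
        yaw_rates.append(dh / dt)

    return {
        "speed": speeds,
        "heading": headings,
        "acceleration": accelerations,
        "yaw_rate": yaw_rates,
    }


# Straight path at constant speed: 5 m every 0.5 s.
straight = derive_motion([(0.0, 0.0), (5.0, 0.0), (10.0, 0.0), (15.0, 0.0)], 0.5)
print(straight["speed"], straight["acceleration"], straight["yaw_rate"])
```

On real predictions, finite differences amplify noise, so some smoothing of the trajectory before differentiating would likely be needed; that step is omitted here to keep the sketch short.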
Taxonomy
Topics: Video Surveillance and Tracking Methods · Autonomous Vehicle Technology and Safety · Human Mobility and Location-Based Analysis
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
