Multi-modal Scene-compliant User Intention Estimation in Navigation
Kavindie Katuwandeniya, Stefan H. Kiss, Lei Shi, and Jaime Valls Miro

TL;DR
This paper introduces a multi-modal, data-driven framework that combines a conditional GAN with visual traversability cues to predict the trajectories a user intends to follow when operating a mobile vehicle. The predictions can feed a shared-control strategy or act as a safety layer.
Contribution
A multi-modal intention estimation model that combines a conditional GAN with LSTM cells and CNN-based visual segmentation, improving trajectory prediction accuracy over existing methods.
Findings
Significant reduction in trajectory prediction error.
Effective use of small, unannotated datasets.
Framework applicable to real-world navigation scenarios.
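The summary does not say which error metric the findings refer to. As a hedged illustration only, the sketch below computes two metrics commonly used for trajectory prediction, average displacement error (ADE) and final displacement error (FDE). The function name and the sample points are hypothetical and are not taken from the paper.

```python
import math


def displacement_errors(predicted, ground_truth):
    """Return (ADE, FDE) for two equal-length lists of (x, y) points.

    ADE is the mean Euclidean distance over all time steps.
    FDE is the Euclidean distance at the last time step.
    Assumption: these are illustrative metrics, not confirmed by the source.
    """
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    dists = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(dists) / len(dists), dists[-1]


# Hypothetical data: the prediction goes straight while the user veers off.
predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
ground_truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(predicted, ground_truth)
print(f"ADE={ade:.2f}, FDE={fde:.2f}")
```

When a model produces a set of future trajectories rather than one, a common convention is to report the error of the best sample in the set; whether the paper does so is not stated here.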
Abstract
A multi-modal framework to generate user intention distributions when operating a mobile vehicle is proposed in this work. The model learns from past observed trajectories and leverages traversability information derived from the visual surroundings to produce a set of future trajectories, suitable to be directly embedded into a perception-action shared control strategy on a mobile agent, or as a safety layer to supervise the prudent operation of the vehicle. We base our solution on a conditional Generative Adversarial Network with Long Short-Term Memory cells to capture trajectory distributions conditioned on past trajectories, further fused with traversability probabilities derived from visual segmentation with a Convolutional Neural Network. The proposed data-driven framework results in a significant reduction in error of the predicted trajectories (versus the ground truth) from…
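To make the idea of combining sampled trajectories with traversability probabilities concrete, here is a minimal sketch. It is not the paper's method: the abstract describes fusion inside the learned model, whereas this example simply re-weights already-sampled candidates by the mean traversability of the grid cells they cross. The grid, the candidate trajectories, and all function names are hypothetical.

```python
def traversability_score(trajectory, grid, cell_size=1.0):
    """Mean traversability probability of the cells a trajectory visits.

    `grid[row][col]` holds a probability in [0, 1]; x maps to columns and
    y to rows. Points outside the grid count as 0 (untraversable).
    Assumption: a simple post-hoc score, not the fusion used in the paper.
    """
    total = 0.0
    for x, y in trajectory:
        col, row = int(x // cell_size), int(y // cell_size)
        if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
            total += grid[row][col]
    return total / len(trajectory)


def rank_trajectories(candidates, grid, cell_size=1.0):
    """Return (trajectory, weight) pairs sorted by descending weight.

    Weights are the scores normalised to sum to 1; if every score is 0,
    the weights fall back to a uniform distribution.
    """
    scores = [traversability_score(t, grid, cell_size) for t in candidates]
    total = sum(scores)
    if total == 0:
        weights = [1.0 / len(candidates)] * len(candidates)
    else:
        weights = [s / total for s in scores]
    return sorted(zip(candidates, weights), key=lambda cw: cw[1], reverse=True)


# Hypothetical 3x3 traversability map with a low-probability cell in the centre.
grid = [
    [0.9, 0.9, 0.9],
    [0.9, 0.1, 0.9],
    [0.9, 0.9, 0.9],
]
through_obstacle = [(0.5, 0.5), (1.5, 1.5), (2.5, 2.5)]
around_obstacle = [(0.5, 0.5), (0.5, 1.5), (0.5, 2.5)]

ranked = rank_trajectories([through_obstacle, around_obstacle], grid)
for trajectory, weight in ranked:
    print(f"{weight:.3f}  {trajectory}")
```

In this toy setup the candidate that avoids the low-traversability cell receives the larger weight, which is the kind of scene-compliant preference the abstract attributes to the visual branch.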
Taxonomy
Methods: Entropy Regularization · Proximal Policy Optimization · CARLA: An Open Urban Driving Simulator
