VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions
Seokha Moon, Hyun Woo, Hongbeen Park, Haeji Jung, Reza Mahjourian, Hyung-gun Chi, Hyerin Lim, Sangpil Kim, Jinkyu Kim

TL;DR
VisionTrap introduces a real-time trajectory prediction method for autonomous vehicles that leverages visual cues and textual descriptions, enhancing accuracy over traditional track-based models.
Contribution
VisionTrap is presented as the first method to incorporate both surround-view visual inputs and textual supervision, generated by a VLM and refined by an LLM, into trajectory prediction, improving performance and interpretability.
Findings
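Visual inputs from surround-view cameras improve prediction accuracy over track-based inputs alone.
Textual descriptions, used as supervision during training, guide what the model learns from the input data.
Achieves a latency of 53 ms, making it feasible for real-time processing.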
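To put the 53 ms figure in context, the short sketch below converts a per-frame latency into a maximum sequential update rate and checks it against a sensor period. Only the 53 ms latency comes from the summary; the 10 Hz sensor rate and both function names are illustrative assumptions.

```python
def max_update_rate_hz(latency_ms: float) -> float:
    """Highest sequential inference rate, in Hz, for a given per-frame latency."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms


def fits_budget(latency_ms: float, sensor_rate_hz: float) -> bool:
    """True if one inference finishes within one sensor period."""
    return latency_ms <= 1000.0 / sensor_rate_hz


REPORTED_LATENCY_MS = 53.0      # latency stated in the summary
ASSUMED_SENSOR_RATE_HZ = 10.0   # assumption for illustration, not from the source

rate = max_update_rate_hz(REPORTED_LATENCY_MS)
print(f"{rate:.1f} Hz")  # prints "18.9 Hz"
print(fits_budget(REPORTED_LATENCY_MS, ASSUMED_SENSOR_RATE_HZ))  # prints "True"
```

Under these assumptions, a 53 ms inference fits inside a 100 ms sensor period with margin, but would not keep up with a 20 Hz stream (50 ms period) if frames are processed strictly one at a time.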
Abstract
Predicting future trajectories for other road agents is an essential task for autonomous vehicles. Established trajectory prediction methods primarily use agent tracks generated by a detection and tracking system and an HD map as inputs. In this work, we propose a novel method that also incorporates visual input from surround-view cameras, allowing the model to utilize visual cues such as human gazes and gestures, road conditions, vehicle turn signals, etc., which are typically hidden from the model in prior methods. Furthermore, we use textual descriptions generated by a Vision-Language Model (VLM) and refined by a Large Language Model (LLM) as supervision during training to guide the model on what to learn from the input data. Despite using these extra inputs, our method achieves a latency of 53 ms, making it feasible for real-time processing, which is significantly faster than that of…
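The abstract says textual descriptions act as supervision during training but does not state the loss here. One common way to realize this kind of supervision is a contrastive (InfoNCE-style) objective that pulls each agent's learned embedding toward the embedding of its own description and away from the others. The sketch below illustrates that general idea only; the function names, the temperature value, and the choice of InfoNCE itself are assumptions, not details confirmed by this summary.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def text_alignment_loss(agent_embs, text_embs, temperature=0.1):
    """Mean InfoNCE loss where agent i should match text i (hypothetical helper).

    agent_embs and text_embs are lists of vectors of the same length;
    the temperature is an illustrative assumption.
    """
    total = 0.0
    for i, agent in enumerate(agent_embs):
        logits = [cosine(agent, text) / temperature for text in text_embs]
        # Numerically stable log-sum-exp over all candidate texts.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(agent_embs)


# Toy check: matched pairs should score a lower loss than mismatched pairs.
agents = [[1.0, 0.0], [0.0, 1.0]]
matched_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
print(text_alignment_loss(agents, matched_texts) < text_alignment_loss(agents, swapped_texts))
```

Because such a loss would apply only during training, the text branch could be dropped at inference time, which is consistent with the abstract describing the descriptions as training-time supervision.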
Taxonomy
Topics: Traffic Prediction and Management Techniques · Data Management and Algorithms · Autonomous Vehicle Technology and Safety
