VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions

Seokha Moon; Hyun Woo; Hongbeen Park; Haeji Jung; Reza Mahjourian; Hyung-gun Chi; Hyerin Lim; Sangpil Kim; Jinkyu Kim

arXiv:2407.12345·cs.CV·July 18, 2024·1 cites

Open Access 1 Models

TL;DR

VisionTrap introduces a real-time trajectory prediction method for autonomous vehicles that leverages visual cues and textual descriptions, enhancing accuracy over traditional track-based models.

Contribution

VisionTrap is presented as the first method to incorporate surround-view visual inputs and textual supervision from a VLM and an LLM into trajectory prediction, improving both performance and interpretability.

Findings

01

Visual inputs improve prediction accuracy.

02

Textual descriptions guide the model effectively.

03

Achieves a latency of 53 ms, which makes it feasible for real-time use.
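To put the latency figure in context, the short calculation below converts 53 ms per prediction into a maximum update rate and checks it against a per-frame time budget. Only the 53 ms value comes from the listing; the 10 Hz and 20 Hz sensor rates, and the function names, are hypothetical values chosen for illustration.

```python
# Latency reported in the listing (milliseconds per prediction).
LATENCY_MS = 53.0


def max_throughput_hz(latency_ms: float) -> float:
    """Upper bound on sequential predictions per second at this latency."""
    return 1000.0 / latency_ms


def fits_frame_budget(latency_ms: float, sensor_rate_hz: float) -> bool:
    """True if one prediction finishes within one sensor frame period."""
    frame_period_ms = 1000.0 / sensor_rate_hz
    return latency_ms <= frame_period_ms


if __name__ == "__main__":
    # Roughly 18.9 predictions per second.
    print(f"max throughput: {max_throughput_hz(LATENCY_MS):.1f} Hz")
    # Assumed sensor rates, not taken from the paper.
    for rate_hz in (10.0, 20.0):
        print(f"{rate_hz:.0f} Hz budget met: {fits_frame_budget(LATENCY_MS, rate_hz)}")
```

Under these assumptions, 53 ms fits inside a 100 ms frame period (10 Hz) but not inside a 50 ms one (20 Hz), so whether the method counts as real-time depends on the update rate the surrounding system requires.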

Abstract

Predicting future trajectories for other road agents is an essential task for autonomous vehicles. Established trajectory prediction methods primarily use agent tracks generated by a detection and tracking system and HD maps as inputs. In this work, we propose a novel method that also incorporates visual input from surround-view cameras, allowing the model to utilize visual cues such as human gazes and gestures, road conditions, vehicle turn signals, etc., which are typically hidden from the model in prior methods. Furthermore, we use textual descriptions generated by a Vision-Language Model (VLM) and refined by a Large Language Model (LLM) as supervision during training to guide the model on what to learn from the input data. Despite using these extra inputs, our method achieves a latency of 53 ms, making it feasible for real-time processing, which is significantly faster than that of…
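The abstract says textual descriptions act as supervision during training but does not state the loss that is used. One common way to realize training-only text guidance is a contrastive alignment loss that pulls each agent's feature vector toward the embedding of its own description and away from the descriptions of other agents. The sketch below illustrates that general idea under this assumption; it is not the authors' implementation, and the function names, the temperature value, and the toy embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_text_loss(agent_embs, text_embs, temperature=0.1):
    """InfoNCE-style loss: agent i should match description i.

    agent_embs[i] and text_embs[i] are assumed to describe the same agent.
    The temperature is an assumed hyperparameter, not a value from the paper.
    """
    n = len(agent_embs)
    total = 0.0
    for i in range(n):
        logits = [cosine(agent_embs[i], text_embs[j]) / temperature
                  for j in range(n)]
        # Numerically stable log-sum-exp over all candidate descriptions.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability of picking the matching description.
        total += log_denominator - logits[i]
    return total / n


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for learned features.
    agents = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[1.0, 0.0], [0.0, 1.0]]
    swapped = [[0.0, 1.0], [1.0, 0.0]]
    print(contrastive_text_loss(agents, matched) < contrastive_text_loss(agents, swapped))
```

Because a loss of this kind is only evaluated during training, the text branch can be dropped at inference, which is consistent with the abstract's point that the extra supervision does not prevent real-time latency.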

Code & Models

Models

🤗
Zeroxdesignart/zerox
model

Taxonomy

Topics: Traffic Prediction and Management Techniques · Data Management and Algorithms · Autonomous Vehicle Technology and Safety