AutoTraces: Autoregressive Trajectory Forecasting via Multimodal Large Language Models
Teng Wang, Yanting Lu, Ruize Wang

TL;DR
AutoTraces leverages multimodal large language models with a novel trajectory tokenization scheme and chain-of-thought reasoning to improve long-term robot trajectory forecasting in human environments.
Contribution
The paper introduces a new trajectory tokenization method and an automated reasoning mechanism, extending LLMs to physical coordinate spaces for improved trajectory prediction.
Findings
Achieves state-of-the-art accuracy in long-horizon forecasting
Demonstrates strong cross-scene generalization
Supports flexible-length trajectory predictions
Abstract
We present AutoTraces, an autoregressive vision-language-trajectory model for robot trajectory forecasting in human-populated environments, which harnesses the inherent reasoning capabilities of large language models (LLMs) to model complex human behaviors. In contrast to prior works that rely solely on textual representations, our key innovation lies in a novel trajectory tokenization scheme, which represents waypoints with point tokens as categorical and positional markers while encoding waypoint numerical values as corresponding point embeddings, seamlessly integrated into the LLM's space through a lightweight encoder-decoder architecture. This design preserves the LLM's native autoregressive generation mechanism while extending it to physical coordinate spaces, facilitating the modeling of long-term interactions in trajectory data. We further introduce an automated chain-of-thought (CoT)…
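The tokenization idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the marker name `<point>`, the embedding width, the random linear layers standing in for the lightweight encoder and decoder, and the averaging `toy_backbone` standing in for the LLM are all assumptions made for illustration. The sketch only shows the data flow: each waypoint occupies one point token, its (x, y) values are encoded into the embedding space, and predicted waypoints are decoded and fed back autoregressively for any requested horizon.

```python
import random

EMBED_DIM = 8            # assumed toy embedding width
POINT_TOKEN = "<point>"  # hypothetical marker name, not taken from the paper


def make_linear(in_dim, out_dim, rng):
    """Random weight matrix standing in for a learned linear layer."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_linear(weight, vec):
    """Matrix-vector product: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def embed_sequence(text_tokens, waypoints, text_table, point_encoder):
    """Text tokens use a lookup table; each waypoint becomes one point token
    whose embedding is computed from its numerical (x, y) values."""
    tokens = list(text_tokens) + [POINT_TOKEN] * len(waypoints)
    embeddings = [text_table[t] for t in text_tokens]
    embeddings += [apply_linear(point_encoder, list(p)) for p in waypoints]
    return tokens, embeddings


def toy_backbone(embeddings):
    """Placeholder for the LLM: averages the sequence into one hidden state."""
    n = len(embeddings)
    return [sum(col) / n for col in zip(*embeddings)]


def forecast(text_tokens, history, horizon, text_table, point_encoder, point_decoder):
    """Autoregressive rollout: decode one waypoint, append it, and repeat."""
    waypoints = list(history)
    for _ in range(horizon):
        _, embeddings = embed_sequence(text_tokens, waypoints, text_table, point_encoder)
        hidden = toy_backbone(embeddings)
        x, y = apply_linear(point_decoder, hidden)  # decoder maps hidden state to (x, y)
        waypoints.append((x, y))
    return waypoints[len(history):]


rng = random.Random(0)
text_tokens = ["predict", "trajectory"]
text_table = {t: [rng.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)] for t in text_tokens}
point_encoder = make_linear(2, EMBED_DIM, rng)
point_decoder = make_linear(EMBED_DIM, 2, rng)

history = [(0.0, 0.0), (0.5, 0.1), (1.0, 0.2)]
tokens, embeddings = embed_sequence(text_tokens, history, text_table, point_encoder)
future = forecast(text_tokens, history, 4, text_table, point_encoder, point_decoder)
```

Because the rollout loop appends one waypoint per step, the `horizon` argument can be any length, which mirrors the flexible-length prediction listed under Findings. The weights here are random, so the predicted coordinates carry no meaning.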
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Multimodal Machine Learning Applications · Time Series Analysis and Forecasting
