TraceVision: Trajectory-Aware Vision-Language Model for Human-Like Spatial Understanding
Fan Yang, Shurong Zheng, Hongyin Zhao, Yufei Zhan, Xin Li, Yousong Zhu, Chaoyang Zhao, Ming Tang, Jinqiao Wang

TL;DR
TraceVision is a novel vision-language model that incorporates human-like spatial trajectories to improve image understanding, region localization, and scene segmentation, advancing interpretability and spatial reasoning in AI systems.
Contribution
The paper introduces TraceVision, which integrates trajectory-aware spatial understanding into vision-language models through a Trajectory-aware Visual Perception (TVP) module, a three-stage training pipeline, and a new dataset, with the goal of improving both interpretability and performance.
Findings
Achieves state-of-the-art results in trajectory-guided captioning.
Improves region localization and segmentation accuracy.
Enables better cross-frame tracking and temporal attention analysis.
Abstract
Recent Large Vision-Language Models (LVLMs) demonstrate remarkable capabilities in image understanding and natural language generation. However, current approaches focus predominantly on global image understanding, struggling to simulate human visual attention trajectories and explain associations between descriptions and specific regions. We propose TraceVision, a unified vision-language model integrating trajectory-aware spatial understanding in an end-to-end framework. TraceVision employs a Trajectory-aware Visual Perception (TVP) module for bidirectional fusion of visual features and trajectory information. We design geometric simplification to extract semantic keypoints from raw trajectories and propose a three-stage training pipeline where trajectories guide description generation and region localization. We extend TraceVision to trajectory-guided segmentation and video scene…
Taxonomy
Topics: Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization · Advanced Neural Network Applications
