VISTA: A Vision and Intent-Aware Social Attention Framework for Multi-Agent Trajectory Prediction
Stephane Da Silva Martins, Emanuel Aldea, Sylvie Le Hégarat-Mascle

TL;DR
VISTA is a novel transformer-based framework that jointly models agents' goals and social interactions for accurate and safe multi-agent trajectory prediction in dense environments.
Contribution
It introduces a recursive goal-conditioned transformer with social-token attention and interpretable social influence maps, advancing multi-agent forecasting methods.
Findings
Achieves state-of-the-art accuracy on MADRAS and SDD benchmarks.
Reduces collision rates from 2.14% to 0.03% on MADRAS.
Attains zero collisions on SDD while improving trajectory metrics.
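As a rough illustration of the collision-rate figures above, the following Python sketch counts the fraction of agents in a scene whose predicted position comes within a fixed distance of another agent at the same timestep. The function name `collision_rate`, the `radius` threshold, and the per-agent definition are all assumptions made for this example; the summary does not state how the paper defines or thresholds a collision.

```python
import numpy as np

def collision_rate(trajs, radius=0.2):
    """Fraction of agents that collide with any other agent (assumed definition).

    trajs: array of shape (N, T, 2) with N agents' predicted xy positions
           over T shared timesteps.
    radius: distance below which two agents count as colliding
            (hypothetical threshold, in the same units as trajs).
    """
    trajs = np.asarray(trajs, dtype=float)
    n = trajs.shape[0]
    if n < 2:
        return 0.0
    # Pairwise offsets between agents at each timestep: shape (N, N, T, 2).
    diff = trajs[:, None, :, :] - trajs[None, :, :, :]
    dist = np.linalg.norm(diff, axis=-1)  # shape (N, N, T)
    # An agent cannot collide with itself, so mask the diagonal.
    idx = np.arange(n)
    dist[idx, idx, :] = np.inf
    # An agent is marked as collided if it is too close to anyone at any step.
    collided = (dist < radius).any(axis=(1, 2))
    return float(collided.mean())
```

Other definitions are equally plausible, for example counting colliding pairs or colliding scenes rather than colliding agents, and they would give different percentages on the same predictions.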
Abstract
Multi-agent trajectory prediction is crucial for autonomous systems operating in dense, interactive environments. Existing methods often fail to jointly capture agents' long-term goals and their fine-grained social interactions, which leads to unrealistic multi-agent futures. We propose VISTA, a recursive goal-conditioned transformer for multi-agent trajectory forecasting. VISTA combines (i) a cross-attention fusion module that integrates long-horizon intent with past motion, (ii) a social-token attention mechanism for flexible interaction modeling across agents, and (iii) pairwise attention maps that make social influence patterns interpretable at inference time. Our model turns single-agent goal-conditioned prediction into a coherent multi-agent forecasting framework. Beyond standard displacement metrics, we evaluate trajectory collision rates as a measure of joint realism. On the…
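To make the idea of a pairwise attention map concrete, here is a minimal Python sketch of generic scaled dot-product attention over per-agent feature vectors. It is not the VISTA architecture: the function name `social_attention_map`, the projection matrices `w_q` and `w_k`, and the feature dimensions are hypothetical, and the abstract does not specify how the social tokens are built.

```python
import numpy as np

def social_attention_map(agent_feats, w_q, w_k):
    """Return an (N, N) row-stochastic attention map between agents.

    agent_feats: (N, D) array, one feature vector per agent (assumed input).
    w_q, w_k:    (D, D_k) query and key projections (hypothetical parameters).
    Entry [i, j] is the weight agent i places on agent j.
    """
    q = agent_feats @ w_q
    k = agent_feats @ w_k
    # Scaled dot-product scores between every pair of agents.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Subtract the row maximum for a numerically stable softmax.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

# Example with random features for 4 agents.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))
attn = social_attention_map(feats, rng.normal(size=(8, 8)), rng.normal(size=(8, 8)))
```

Each row of `attn` sums to 1, so a row can be read as how one agent's prediction distributes its attention over the agents in the scene, which is the kind of quantity an interpretable social influence map would display.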
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Anomaly Detection Techniques and Applications · Social Robot Interaction and HRI
