VISTA: A Vision and Intent-Aware Social Attention Framework for Multi-Agent Trajectory Prediction

Stephane Da Silva Martins; Emanuel Aldea; Sylvie Le Hégarat-Mascle

arXiv:2511.10203 · cs.CV · November 14, 2025

TL;DR

VISTA is a novel transformer-based framework that jointly models agents' goals and social interactions for accurate and safe multi-agent trajectory prediction in dense environments.

Contribution

VISTA introduces a recursive goal-conditioned transformer with social-token attention and interpretable pairwise social-influence maps for multi-agent forecasting.

Findings

01. Achieves state-of-the-art accuracy on the MADRAS and SDD benchmarks.

02. Reduces collision rates from 2.14% to 0.03% on MADRAS.

03. Attains zero collisions on SDD while improving trajectory metrics.
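
The collision-rate figures above depend on how a collision is counted, which this summary does not specify. As a rough illustration only, the sketch below assumes a pair-level definition: two agents collide if their predicted positions come closer than a fixed radius at any time step, and the rate is the fraction of agent pairs that collide. The function name `collision_rate`, the `radius` value, and the pair-level normalization are all assumptions, not the paper's protocol.

```python
import numpy as np

def collision_rate(trajs, radius=0.2):
    """Fraction of agent pairs that come within `radius` at any time step.

    trajs: array of shape (A, T, 2), predicted xy positions of A agents
    over T time steps. `radius` is a hypothetical threshold, in the same
    units as the positions.
    """
    trajs = np.asarray(trajs, dtype=float)
    num_agents = trajs.shape[0]
    if num_agents < 2:
        return 0.0
    # Pairwise offsets at every time step: shape (A, A, T, 2).
    diff = trajs[:, None, :, :] - trajs[None, :, :, :]
    dist = np.linalg.norm(diff, axis=-1)          # (A, A, T)
    collided = (dist < radius).any(axis=-1)       # (A, A)
    # Count each unordered pair once (upper triangle, no diagonal).
    upper = np.triu_indices(num_agents, k=1)
    return float(collided[upper].mean())

# Two agents swap places along y=0 and meet at (1, 0); a third stays far away.
example = np.array([
    [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]],
    [[2.0, 0.0], [1.0, 0.0], [0.0, 0.0]],
    [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0]],
])
rate = collision_rate(example)
```

In this toy scene one of the three agent pairs collides. Other conventions (per-agent or per-scene rates, different thresholds) would give different numbers, so reported rates are only comparable under the same definition.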

Abstract

Multi-agent trajectory prediction is crucial for autonomous systems operating in dense, interactive environments. Existing methods often fail to jointly capture agents' long-term goals and their fine-grained social interactions, which leads to unrealistic multi-agent futures. We propose VISTA, a recursive goal-conditioned transformer for multi-agent trajectory forecasting. VISTA combines (i) a cross-attention fusion module that integrates long-horizon intent with past motion, (ii) a social-token attention mechanism for flexible interaction modeling across agents, and (iii) pairwise attention maps that make social influence patterns interpretable at inference time. Our model turns single-agent goal-conditioned prediction into a coherent multi-agent forecasting framework. Beyond standard displacement metrics, we evaluate trajectory collision rates as a measure of joint realism. On the…
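
To make the idea of pairwise attention maps concrete, the following is a minimal sketch of single-head scaled dot-product attention across per-agent tokens. It is not VISTA's architecture: it has no learned projections, no goal conditioning, and no recursion. The function name `social_attention` and the token layout are assumptions for illustration. The returned matrix `attn` has one row per agent, and entry (i, j) can be read as how strongly agent i attends to agent j.

```python
import numpy as np

def social_attention(tokens):
    """Single-head self-attention over agent tokens (illustrative only).

    tokens: array of shape (A, D), one embedding per agent.
    Returns (updated_tokens, attn), where attn has shape (A, A) and
    each row sums to 1.
    """
    tokens = np.asarray(tokens, dtype=float)
    dim = tokens.shape[-1]
    # Similarity between every pair of agents, scaled by sqrt(D).
    scores = tokens @ tokens.T / np.sqrt(dim)
    # Subtract the row-wise max for a numerically stable softmax.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    attn = weights / weights.sum(axis=-1, keepdims=True)
    # Each agent's new token is an attention-weighted mix of all tokens.
    return attn @ tokens, attn

# Three agents with orthogonal toy embeddings.
updated, attn = social_attention(np.eye(3))
```

Inspecting `attn` row by row is the kind of readout an interpretable social-influence map provides; in a trained model the queries, keys, and values would come from learned projections rather than the raw tokens used here.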

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Anomaly Detection Techniques and Applications · Social Robot Interaction and HRI