TL;DR
This paper introduces a novel motion forecasting framework grounded in an interpretable motion bank, combining contrastive learning, a retrieval mechanism, and a structured decoding process to improve multi-modal accuracy.
Contribution
It proposes a new end-to-end differentiable architecture that uses a structured motion bank and explicit motion priors for interpretable and accurate motion prediction.
Findings
Achieves competitive multi-modal accuracy on the Argoverse 2 and Waymo datasets.
Eliminates the black-box issue of latent queries in motion forecasting.
Introduces a novel Anchor Retrieval Layer with a Dual-Level Gated Cross-Attention mechanism.
Abstract
Motion forecasting often requires trading interpretability for predictive accuracy. Standard anchor-based architectures rely either on opaque latent queries that are highly prone to latent collapse or on naive trajectory sampling that limits multi-modal diversity. We propose an end-to-end differentiable framework that grounds predictions in a comprehensive "motion bank", a structured embedding space of physically realizable trajectories constructed via contrastive learning. Rather than regressing paths from a blank slate, our architecture dynamically retrieves explicit motion priors using a novel Anchor Retrieval Layer. This module adapts orthogonally initialized queries via a Dual-Level Gated Cross-Attention mechanism and executes discrete trajectory selection using a Straight-Through Gumbel-Softmax estimator to preserve continuous gradient flow. The retrieved semantically grounded anchors are…
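
The abstract names a Straight-Through Gumbel-Softmax estimator for discrete trajectory selection but gives no implementation details. The following is a minimal sketch of the general technique only, not the paper's code. It assumes a toy three-entry bank and made-up retrieval scores; the names `gumbel_softmax_st`, `select_anchor`, `motion_bank`, and `scores` are hypothetical. Because it uses plain Python with no autodiff, it shows only the forward pass and notes in comments where the gradient would flow.

```python
import math
import random


def gumbel_softmax_st(logits, tau=1.0, rng=random):
    """One straight-through Gumbel-Softmax draw: returns (hard_one_hot, soft_probs)."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise: g = -log(-log(u)), with u ~ Uniform(0, 1).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # clamp so log() stays finite
        noisy.append((logit - math.log(-math.log(u))) / tau)

    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    soft = [e / total for e in exps]

    # Forward pass uses the discrete one-hot choice (argmax of the soft sample).
    k = max(range(len(soft)), key=soft.__getitem__)
    hard = [1.0 if i == k else 0.0 for i in range(len(soft))]

    # In an autodiff framework the usual straight-through trick is
    #   y = hard + soft - stop_gradient(soft)
    # so the forward value equals `hard` while gradients flow through `soft`.
    return hard, soft


def select_anchor(bank, hard):
    """A one-hot weighted sum over the bank picks exactly one trajectory."""
    dim = len(bank[0])
    return [sum(w * traj[d] for w, traj in zip(hard, bank)) for d in range(dim)]


# Hypothetical toy data: three 2-D "anchors" and their retrieval scores.
motion_bank = [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
scores = [0.2, 2.5, -1.0]

hard, soft = gumbel_softmax_st(scores, tau=0.5, rng=random.Random(0))
anchor = select_anchor(motion_bank, hard)
```

A lower `tau` pushes `soft` closer to one-hot, which shrinks the gap between the discrete forward value and the continuous surrogate used for gradients, at the cost of higher gradient variance.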
