BEVPredFormer: Spatio-temporal Attention for BEV Instance Prediction in Autonomous Driving
Miguel Antunes-García, Santiago Montiel-Marín, Fabio Sánchez-García, Rodrigo Gutiérrez-Moreno, Rafael Barea, Luis M. Bergasa

TL;DR
BEVPredFormer is an attention-based architecture for BEV instance prediction in autonomous driving. It uses spatio-temporal attention and difference-guided features to improve scene understanding without relying on recurrent modules.
Contribution
The paper introduces a camera-only, attention-based model with spatio-temporal processing and difference-guided features, and reports state-of-the-art results in BEV instance prediction.
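To make the two named ingredients concrete, the following is a minimal NumPy sketch of what "difference-guided features" and "spatio-temporal attention" could look like on a stack of BEV feature maps. It is not the authors' implementation: the function names, the tensor shapes, the zero difference for the first frame, the identity query/key/value projections, and the choice of a divided scheme (temporal attention per BEV cell, then spatial attention per frame) are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Scaled dot-product self-attention over axis 1 of x with shape (B, L, D).

    Assumption: identity Q/K/V projections (no learned weights), to keep the
    sketch short.
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)  # (B, L, L)
    weights = softmax(scores, axis=-1)
    return weights @ x, weights

def temporal_differences(bev):
    """Frame-to-frame differences of BEV features with shape (T, C, H, W).

    Assumption: the first frame has no predecessor, so its difference is zero.
    """
    diff = np.zeros_like(bev)
    diff[1:] = bev[1:] - bev[:-1]
    return diff

def divided_space_time_block(bev):
    """Hypothetical block: concatenate difference features, then attend over
    time (per BEV cell) and over space (per frame), each with a residual."""
    T, C, H, W = bev.shape
    x = np.concatenate([bev, temporal_differences(bev)], axis=1)  # (T, 2C, H, W)
    D = 2 * C
    tokens = x.reshape(T, D, H * W).transpose(0, 2, 1)  # (T, N, D), N = H * W

    # Temporal attention: each BEV cell attends over its own T time steps.
    t_out, _ = self_attention(tokens.transpose(1, 0, 2))  # (N, T, D)
    tokens = tokens + t_out.transpose(1, 0, 2)

    # Spatial attention: within each frame, cells attend to all other cells.
    s_out, _ = self_attention(tokens)  # (T, N, D)
    tokens = tokens + s_out

    return tokens.transpose(0, 2, 1).reshape(T, D, H, W)

rng = np.random.default_rng(0)
bev = rng.standard_normal((3, 4, 8, 8))  # T=3 frames, C=4 channels, 8x8 grid
out = divided_space_time_block(bev)
```

Splitting attention into a temporal pass and a spatial pass keeps the score matrices at sizes T×T and N×N instead of (T·N)×(T·N), which is one common way to keep dense spatio-temporal attention tractable; whether BEVPredFormer factorizes attention this way is not stated in the summary above.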
Findings
Outperforms or matches state-of-the-art methods on the nuScenes dataset.
Effective spatio-temporal attention improves scene comprehension.
Architectural components validated through extensive ablation studies.
Abstract
A robust awareness of how dynamic scenes evolve is essential for Autonomous Driving systems, as they must accurately detect, track, and predict the behaviour of surrounding obstacles. Traditional perception pipelines that rely on modular architectures tend to suffer from cumulative errors and latency. Instance Prediction models provide a unified solution, performing Bird's-Eye-View segmentation and motion estimation across current and future frames using information directly obtained from different sensors. However, a key challenge in these models lies in the effective processing of the dense spatial and temporal information inherent in dynamic driving environments. This level of complexity demands architectures capable of capturing fine-grained motion patterns and long-range dependencies without compromising real-time performance. We introduce BEVPredFormer, a novel camera-only…
