Efficient Equivariant Transformer for Self-Driving Agent Modeling

Scott Xu; Dian Chen; Kelvin Wong; Chris Zhang; Kion Fallah; Raquel Urtasun

arXiv:2604.01466 · cs.RO · April 3, 2026

TL;DR

This paper introduces DriveGATr, an efficient SE(2)-equivariant transformer for self-driving agent modeling that reduces computational costs while maintaining high performance.

Contribution

DriveGATr encodes scene elements as multivectors in geometric algebra and processes them with equivariant transformer blocks, avoiding costly pairwise encodings.

Findings

01. DriveGATr achieves SE(2)-equivariance without a cost that is quadratic in the number of agents.

02. It performs comparably to state-of-the-art methods in traffic simulation.

03. It offers a better trade-off between performance and computational cost.

Abstract

Accurately modeling agent behaviors is an important task in self-driving. It is also a task with many symmetries, such as equivariance to the order of agents and objects in the scene or equivariance to arbitrary roto-translations of the entire scene as a whole; i.e., SE(2)-equivariance. The transformer architecture is a ubiquitous tool for modeling these symmetries. While standard self-attention is inherently permutation equivariant, explicit pairwise relative positional encodings have been the standard for introducing SE(2)-equivariance. However, this approach introduces an additional cost that is quadratic in the number of agents, limiting its scalability to larger scenes and batch sizes. In this work, we propose DriveGATr, a novel transformer-based architecture for agent modeling that achieves SE(2)-equivariance without the computational cost of existing methods. Inspired by recent…
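To make the scaling argument in the abstract concrete, the following is a minimal sketch of the pairwise relative-pose idea that the abstract describes as the existing standard. It is not DriveGATr and not the authors' code. The pose layout `(x, y, heading)` and all function names are assumptions chosen for illustration. The sketch shows two things: the table of relative poses is unchanged when the whole scene is rotated and translated, and that table has one entry per ordered pair of agents, so it grows quadratically with the number of agents.

```python
import math

def transform_pose(pose, rotation, shift):
    """Apply a global SE(2) transform to a pose (x, y, heading).

    `rotation` is an angle in radians and `shift` is (dx, dy).
    The pose layout is an assumption made for this sketch.
    """
    x, y, heading = pose
    c, s = math.cos(rotation), math.sin(rotation)
    return (c * x - s * y + shift[0], s * x + c * y + shift[1], heading + rotation)

def wrap_angle(angle):
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))

def relative_pose(ref, other):
    """Express `other` in the local frame of `ref` (hypothetical helper)."""
    dx, dy = other[0] - ref[0], other[1] - ref[1]
    c, s = math.cos(ref[2]), math.sin(ref[2])
    # Rotate the offset by -ref.heading to move into the reference frame.
    return (c * dx + s * dy, -s * dx + c * dy, wrap_angle(other[2] - ref[2]))

def pairwise_relative_poses(poses):
    """Build the N x N table of relative poses, one entry per ordered pair."""
    return [[relative_pose(a, b) for b in poses] for a in poses]

# Three illustrative agents; the values are arbitrary.
poses = [(0.0, 0.0, 0.0), (10.0, 2.0, 0.3), (-4.0, 7.5, -1.2)]

# Rotate and translate the whole scene at once.
moved = [transform_pose(p, 0.7, (5.0, -3.0)) for p in poses]

table = pairwise_relative_poses(poses)
table_moved = pairwise_relative_poses(moved)

# Largest component-wise difference between the two tables.
max_diff = max(
    abs(u - v)
    for row, row_moved in zip(table, table_moved)
    for rel, rel_moved in zip(row, row_moved)
    for u, v in zip(rel, rel_moved)
)

# Number of stored relative poses: N * N for N agents.
num_entries = sum(len(row) for row in table)

print(num_entries, max_diff < 1e-9)  # prints: 9 True
```

With three agents the table holds 9 entries. With 1,000 agents it would hold 1,000,000, which is the quadratic overhead the abstract says limits scalability to larger scenes and batch sizes.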
