STrajNet: Multi-modal Hierarchical Transformer for Occupancy Flow Field Prediction in Autonomous Driving
Haochen Liu, Zhiyu Huang, Chen Lv

TL;DR
This paper introduces STrajNet, a multi-modal hierarchical transformer that jointly predicts occupancy and flow fields for autonomous driving, effectively modeling social interactions and scene dynamics with a novel attention mechanism.
Contribution
The paper proposes a novel multi-modal hierarchical transformer with flow-guided multi-head self-attention (FG-MSA) for joint occupancy and flow prediction, improving performance while keeping the architecture compact.
Findings
Achieves performance comparable to or better than state-of-the-art models.
Effectively models social interactions and scene relations.
Demonstrates the benefits of vectorized agent motion features and FG-MSA.
Abstract
Forecasting the future states of surrounding traffic participants is a crucial capability for autonomous vehicles. The recently proposed occupancy flow field prediction introduces a scalable and effective representation to jointly predict surrounding agents' future motions in a scene. However, the challenging part is to model the underlying social interactions among traffic agents and the relations between occupancy and flow. Therefore, this paper proposes a novel Multi-modal Hierarchical Transformer network that fuses the vectorized (agent motion) and visual (scene flow, map, and occupancy) modalities and jointly predicts the flow and occupancy of the scene. Specifically, visual and vector features from sensory data are encoded through a multi-stage Transformer module and then a late-fusion Transformer module with temporal pixel-wise attention. Importantly, a flow-guided multi-head…
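To make "the relations between occupancy and flow" concrete, here is a minimal sketch of how a flow field can carry an occupancy grid from one time step to the next. It is not the STrajNet model or its FG-MSA module. It assumes a backward-flow convention (each cell stores the offset to the cell it came from at the previous step) and nearest-neighbor lookup; the function name `warp_occupancy` and the toy grid are hypothetical and chosen only for illustration.

```python
def warp_occupancy(prev_occ, backward_flow):
    """Warp an occupancy grid one step forward using a backward flow field.

    prev_occ: H x W nested list of occupancy probabilities at time t-1.
    backward_flow: H x W nested list of (dx, dy) offsets. Assumption: the
        occupant of cell (x, y) at time t came from (x + dx, y + dy) at t-1.
    Returns the H x W occupancy grid implied for time t.
    """
    height, width = len(prev_occ), len(prev_occ[0])
    warped = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = backward_flow[y][x]
            # Nearest-neighbor lookup of the source cell at time t-1.
            src_x, src_y = round(x + dx), round(y + dy)
            # Sources outside the grid contribute no occupancy.
            if 0 <= src_x < width and 0 <= src_y < height:
                warped[y][x] = prev_occ[src_y][src_x]
    return warped


# Toy 1 x 4 scene: one occupied cell at x=1 moving right by one cell per step.
prev_occ = [[0.0, 1.0, 0.0, 0.0]]
flow = [[(-1, 0)] * 4]  # every cell looks one cell to its left
next_occ = warp_occupancy(prev_occ, flow)
print(next_occ)  # [[0.0, 0.0, 1.0, 0.0]]
```

A learned model would predict both grids and could be trained so that the flow-warped occupancy agrees with the predicted occupancy; whether and how STrajNet enforces this is not stated in the text above.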
Taxonomy
Topics: Traffic Prediction and Management Techniques · Autonomous Vehicle Technology and Safety · Traffic control and management
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Dense Connections · Stochastic Depth · Layer Normalization · Concatenated Skip Connection · Position-Wise Feed-Forward Layer · Adam
