Forecast-MAE: Self-supervised Pre-training for Motion Forecasting with Masked Autoencoders
Jie Cheng, Xiaodong Mei, Ming Liu

TL;DR
Forecast-MAE is a self-supervised pre-training framework for motion forecasting built on masked autoencoders. It exploits the interconnections between agent trajectories and road networks and reaches competitive results using standard Transformer blocks.
Contribution
The paper presents a novel self-supervised pre-training method for motion forecasting that uses masked autoencoders tailored to trajectory and road network data.
Findings
Outperforms previous self-supervised methods significantly.
Achieves competitive performance with state-of-the-art supervised models.
Uses standard Transformer blocks with minimal inductive bias.
Abstract
This study explores the application of self-supervised learning (SSL) to the task of motion forecasting, an area that has not yet been extensively investigated despite the widespread success of SSL in computer vision and natural language processing. To address this gap, we introduce Forecast-MAE, an extension of the masked autoencoders framework that is specifically designed for self-supervised learning of the motion forecasting task. Our approach includes a novel masking strategy that leverages the strong interconnections between agents' trajectories and road networks, involving complementary masking of agents' future or history trajectories and random masking of lane segments. Our experiments on the challenging Argoverse 2 motion forecasting benchmark show that Forecast-MAE, which utilizes standard Transformer blocks with minimal inductive bias, achieves competitive performance compared…
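The masking strategy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the mask ratios, and the reading of "complementary" as "each agent has exactly one of its history or future masked" are assumptions made for illustration only.

```python
import random


def complementary_agent_masks(num_agents, history_ratio=0.5, rng=None):
    """For each agent, mask either its history or its future, never both.

    history_ratio is a hypothetical parameter: the probability that an
    agent's history (rather than its future) is the masked part.
    """
    rng = rng or random.Random()
    masks = []
    for _ in range(num_agents):
        mask_history = rng.random() < history_ratio
        # The two flags are complementary by construction.
        masks.append({"history": mask_history, "future": not mask_history})
    return masks


def random_lane_mask(num_lanes, mask_ratio=0.5, rng=None):
    """Mask a random subset of lane segments.

    mask_ratio is an assumed value, not one taken from the paper.
    Returns a list of booleans, True where the lane segment is masked.
    """
    rng = rng or random.Random()
    num_masked = int(num_lanes * mask_ratio)
    masked = set(rng.sample(range(num_lanes), num_masked))
    return [i in masked for i in range(num_lanes)]


if __name__ == "__main__":
    rng = random.Random(0)
    print(complementary_agent_masks(4, rng=rng))
    print(random_lane_mask(6, mask_ratio=0.5, rng=rng))
```

In a masked-autoencoder setup, the unmasked tokens would be passed to the encoder and the decoder would be trained to reconstruct the masked trajectories and lane segments; that part is omitted here.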
Taxonomy
Topics: Traffic Prediction and Management Techniques · Autonomous Vehicle Technology and Safety · Traffic and Road Safety
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Label Smoothing · Layer Normalization · Softmax · Dense Connections
