Generalizing Motion Planners with Mixture of Experts for Autonomous Driving
Qiao Sun, Huimin Wang, Jiahao Zhan, Fan Nie, Xin Wen, Leimeng Xu, Kun Zhan, Peng Jia, Xianpeng Lang, Hang Zhao

TL;DR
This paper introduces StateTransformer-2, a scalable motion planner using a mixture-of-experts Transformer architecture, which improves generalization in autonomous driving by addressing model complexity and training issues.
Contribution
The paper presents a novel scalable motion planner with a mixture-of-experts architecture that enhances generalization and addresses training challenges in autonomous driving.
Findings
STR2 outperforms previous methods on the nuPlan dataset
Model scalability improves generalization and accuracy
MoE architecture addresses modality collapse and reward balancing
Abstract
Large real-world driving datasets have sparked significant research into various aspects of data-driven motion planners for autonomous driving. These include data augmentation, model architecture, reward design, training strategies, and planner pipelines. These planners promise better generalization on complicated and few-shot cases than previous methods. However, experimental results show that many of these approaches generalize poorly in planning performance because of overly complex designs or training paradigms. In this paper, we review and benchmark previous methods with a focus on generalization. The experimental results indicate that as models are appropriately scaled, many design elements become redundant. We introduce StateTransformer-2 (STR2), a scalable, decoder-only motion planner that uses a Vision Transformer (ViT) encoder and a mixture-of-experts (MoE)…
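The abstract names a mixture-of-experts (MoE) backbone, but this excerpt is cut off before it describes the routing. As a generic illustration only, and not the STR2 implementation, the sketch below shows top-k expert routing for a single token in plain Python. Everything in it is an assumption made for the example: the function names (`moe_forward`, `matvec`, `softmax`), the use of `top_k=2`, and the simplification of each expert to a single linear map where a real MoE layer would typically use a feed-forward network.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def moe_forward(token, router, experts, top_k=2):
    """Route one token through the top_k highest-scoring experts.

    Hypothetical shapes: router is (num_experts x dim); each expert is a
    (dim x dim) linear map standing in for a feed-forward expert.
    """
    logits = matvec(router, token)  # one routing score per expert
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    gates = softmax([logits[i] for i in chosen])  # renormalize over chosen experts only
    out = [0.0] * len(token)
    for gate, idx in zip(gates, chosen):
        expert_out = matvec(experts[idx], token)
        out = [o + gate * e for o, e in zip(out, expert_out)]  # gate-weighted sum
    return out, chosen, gates


# Toy usage: three experts over a 2-dimensional token (values are arbitrary).
router = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
experts = [
    [[1.0, 0.0], [0.0, 1.0]],  # identity
    [[2.0, 0.0], [0.0, 2.0]],  # scale by 2
    [[0.0, 0.0], [0.0, 0.0]],  # zero map
]
output, chosen, gates = moe_forward([2.0, 1.0], router, experts)
```

Only the selected experts are evaluated for a given token, which is why this style of layer can add parameters without a proportional increase in per-token compute. Production MoE layers usually add batching and a load-balancing term, both omitted here.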
Taxonomy
Topics: Human Motion and Animation
Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Adam · Dropout · Byte Pair Encoding · Absolute Position Encodings · Label Smoothing · Transformer · Dense Connections · Layer Normalization
