MultiPark: Multimodal Parking Transformer with Next-Segment Prediction

Han Zheng; Zikang Zhou; Guli Zhang; Zhepei Wang; Kaixuan Wang; Peiliang Li; Shaojie Shen; Ming Yang; and Tong Qin

arXiv:2508.11537 · cs.RO · August 18, 2025

TL;DR

MultiPark introduces a multimodal transformer model with next-segment prediction and outcome-oriented loss to improve parking maneuver accuracy and safety in complex, lane-free environments, demonstrating state-of-the-art results and real-world robustness.

Contribution

The paper presents MultiPark, an autoregressive transformer built on a next-segment prediction paradigm and parking queries, designed to address causal confusion and multimodal behavior in parking scenarios.

Findings

01. Achieves state-of-the-art performance on real-world datasets.

02. Demonstrates robustness in real-world vehicle deployment.

03. Effectively models diverse parking behaviors.

Abstract

Parking accurately and safely in highly constrained spaces remains a critical challenge. Unlike driving in structured environments, parking requires complex maneuvers that involve frequent gear shifts and steering saturation. Recent attempts to apply imitation learning (IL) to parking have achieved promising results. However, existing works ignore the multimodal nature of parking behavior in lane-free open space and fail to produce multiple plausible solutions for the same situation. IL-based methods also suffer from inherent causal confusion, which makes it particularly difficult for a neural network to generalize across diverse parking scenarios. To address these challenges, we propose MultiPark, an autoregressive transformer for multimodal parking. To handle paths filled with abrupt turning points, we introduce a data-efficient next-segment prediction paradigm, enabling spatial…
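The abstract is cut off before it defines what a "segment" is, so the following is only an illustrative sketch and not the authors' method. It assumes, hypothetically, that a segment is the stretch of path between two abrupt turning points, and it detects a turning point wherever consecutive displacement vectors point in opposing directions (negative dot product). The function name `split_at_cusps` and the detection rule are both assumptions introduced for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def split_at_cusps(path: List[Point]) -> List[List[Point]]:
    """Split a 2D path into segments at direction reversals.

    Assumption (not from the paper): a reversal is any waypoint where the
    incoming and outgoing displacement vectors have a negative dot product.
    The reversal waypoint ends one segment and starts the next.
    """
    if len(path) < 3:
        # Too short to contain a reversal; return the path as one segment.
        return [list(path)]

    segments: List[List[Point]] = []
    current: List[Point] = [path[0], path[1]]

    for i in range(2, len(path)):
        (x0, y0), (x1, y1), (x2, y2) = path[i - 2], path[i - 1], path[i]
        # Dot product of the incoming and outgoing displacement vectors.
        dot = (x1 - x0) * (x2 - x1) + (y1 - y0) * (y2 - y1)
        if dot < 0:
            # Direction reversed at path[i - 1]: close the current segment
            # and start a new one from the shared turning point.
            segments.append(current)
            current = [path[i - 1]]
        current.append(path[i])

    segments.append(current)
    return segments


# Hypothetical path: drive forward along x, then back away diagonally.
example_path: List[Point] = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (1.0, 0.5), (0.0, 1.0)]
example_segments = split_at_cusps(example_path)
```

Under this assumption, an autoregressive model would emit one such segment per decoding step rather than one waypoint per step, so each step ends exactly at a turning point. How MultiPark actually represents and decodes segments is not stated in the text available here.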
