Sequence of Expert: Boosting Imitation Planners for Autonomous Driving through Temporal Alternation

Xiang Li; Gang Liu; Weitao Zhou; Hongyi Zhu; Zhong Cao

arXiv:2512.13094·cs.RO·December 16, 2025

Open Access 3 Reviews

TL;DR

This paper introduces Sequence of Experts (SoE), a temporal alternation method that improves imitation learning robustness in autonomous driving by leveraging temporal scale, leading to state-of-the-art results without increasing model complexity.

Contribution

The paper proposes a novel temporal alternation policy called Sequence of Experts (SoE) that enhances closed-loop imitation learning performance without additional data or model size.

Findings

01

SoE significantly improves model performance on nuPlan benchmarks.

02

It achieves state-of-the-art results in autonomous driving tasks.

03

The method enhances robustness without increasing complexity.

Abstract

Imitation learning (IL) has emerged as a central paradigm in autonomous driving. While IL excels in matching expert behavior in open-loop settings by minimizing per-step prediction errors, its performance degrades unexpectedly in closed-loop due to the gradual accumulation of small, often imperceptible errors over time. Over successive planning cycles, these errors compound, potentially resulting in severe failures. Current research efforts predominantly rely on increasingly sophisticated network architectures or high-fidelity training datasets to enhance the robustness of IL planners against error accumulation, focusing on the state-level robustness at a single time point. However, autonomous driving is inherently a continuous-time process, and leveraging the temporal scale to enhance robustness may provide a new perspective for addressing this issue. To this end, we propose a method…

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 4

Strengths

1. The paper addresses the error accumulation problem from a new perspective. Instead of focusing on improving a single model's architecture or data, it reframes the problem as one of optimal policy deployment. The key insight is that models from different training stages exhibit complementary weaknesses.
2. The authors demonstrate the effectiveness of SoE across a diverse set of baseline planners (rule-based, MLP-based, Transformer-based), proving its broad applicability.
3. The "plug-and-play" n

Weaknesses

1. The method appears to switch models blindly over time. The system does not appear to detect or anticipate error accumulation before switching. It switches policies regardless of whether the current policy is performing well or poorly. This could be suboptimal, as it might unnecessarily interrupt a perfectly good trajectory or fail to switch at the most critical moment. Have the authors considered a more intelligent, state-dependent switching strategy (e.g., based on model uncertainty, trajectory d

Reviewer 02 · Rating 2 · Confidence 5

Strengths

* A very simple approach: just alternate experts (SoE) every 2nd timestamp
* Working on nuPlan, a quite widely used benchmark
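The alternation mentioned in the first strength can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the names `active_expert_index` and `rollout`, the scalar toy state, the additive toy dynamics, and the reading of "every 2nd timestamp" as holding each expert for `period` steps (set `period=1` for strict step-by-step alternation).

```python
from typing import Callable, List, Sequence

# Hypothetical planner type: maps a scalar toy state to a scalar action.
Planner = Callable[[float], float]


def active_expert_index(step: int, num_experts: int, period: int = 2) -> int:
    """Pick the active expert from the step index alone (no state feedback)."""
    if num_experts < 1 or period < 1:
        raise ValueError("num_experts and period must both be >= 1")
    # Hold each expert for `period` steps, then move on to the next one.
    return (step // period) % num_experts


def rollout(
    experts: Sequence[Planner],
    initial_state: float,
    num_steps: int,
    period: int = 2,
) -> List[float]:
    """Closed-loop rollout in which the acting expert alternates over time."""
    state = initial_state
    states = [state]
    for step in range(num_steps):
        planner = experts[active_expert_index(step, len(experts), period)]
        # Toy dynamics (an assumption): the action is added to the state.
        state = state + planner(state)
        states.append(state)
    return states
```

Because the schedule depends only on the step index, it adds no inference cost beyond running one planner per step, which is consistent with the "simple approach" framing above. It is also the property the other reviews criticize: the switch ignores how well the current expert is doing.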

Weaknesses

* Obvious weakness: the validation set must be VERY close in distribution to the test set in order to find the correct combination of experts for SoE
* No ablations or exploration of whether the best combination of experts on the validation set can fail to be the best on the test set
* Straightforward drawback: one must wait (and spend resources) to train multiple models before including them in SoE (and usage of different ckpts during one training cycle is not the best strategy a
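The validation-versus-test concern in the first two points can be made concrete with a small sketch. It assumes, hypothetically, that an expert combination is chosen by exhaustively scoring ordered sequences of experts on a validation set; the functions `select_sequence` and `selection_regret` and the score tables they consume are illustrative and are not the paper's procedure.

```python
from itertools import permutations
from typing import Callable, Dict, Sequence, Tuple

ExpertSeq = Tuple[str, ...]


def select_sequence(
    expert_ids: Sequence[str],
    score_fn: Callable[[ExpertSeq], float],
    length: int = 2,
) -> Tuple[ExpertSeq, Dict[ExpertSeq, float]]:
    """Score every ordered sequence of `length` distinct experts; return the best."""
    candidates = list(permutations(expert_ids, length))
    if not candidates:
        raise ValueError("not enough experts for the requested sequence length")
    scores = {seq: score_fn(seq) for seq in candidates}
    best = max(scores, key=scores.get)
    return best, scores


def selection_regret(
    val_scores: Dict[ExpertSeq, float],
    test_scores: Dict[ExpertSeq, float],
) -> float:
    """Test score lost by deploying the validation-best sequence.

    Zero means the validation choice is also best on the test set.
    """
    chosen = max(val_scores, key=val_scores.get)
    return max(test_scores.values()) - test_scores[chosen]
```

Reporting a quantity like `selection_regret` across several validation/test splits would be one way to run the ablation this review asks for.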

Reviewer 03 · Rating 2 · Confidence 3

Strengths

S1. Explores temporal alternation rather than architectural scaling to mitigate closed-loop error accumulation, representing a rarely addressed improvement dimension for IL planners.
S2. Introduces zero additional inference cost and requires no model or data modifications, making deployment highly practical.
S3. Demonstrates consistent and meaningful closed-loop performance gains across diverse planners, including achieving SOTA on nuPlan.
S4. Provides empirical evidence on OL–CL mismatch and

Weaknesses

W1. The claim that different seeds provide complementary error-accumulation behaviors is supported only by empirical observations; the paper lacks a deeper theoretical explanation or dynamic modeling of why such complementarity should reliably occur.
W2. The experts differ solely by random seeds under identical architectures and data, raising concerns about whether this restricted diversity is consistently strong and generalizable beyond the evaluated cases, especially with larger amounts of data.

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning