Sequence of Expert: Boosting Imitation Planners for Autonomous Driving through Temporal Alternation
Xiang Li, Gang Liu, Weitao Zhou, Hongyi Zhu, Zhong Cao

TL;DR
This paper introduces Sequence of Experts (SoE), a temporal alternation method that improves imitation learning robustness in autonomous driving by leveraging temporal scale, leading to state-of-the-art results without increasing model complexity.
Contribution
The paper proposes a novel temporal alternation policy called Sequence of Experts (SoE) that enhances closed-loop imitation learning performance without additional data or model size.
Findings
SoE significantly improves model performance on nuPlan benchmarks.
It achieves state-of-the-art results in autonomous driving tasks.
The method enhances robustness without increasing complexity.
Abstract
Imitation learning (IL) has emerged as a central paradigm in autonomous driving. While IL excels in matching expert behavior in open-loop settings by minimizing per-step prediction errors, its performance degrades unexpectedly in closed-loop due to the gradual accumulation of small, often imperceptible errors over time. Over successive planning cycles, these errors compound, potentially resulting in severe failures. Current research efforts predominantly rely on increasingly sophisticated network architectures or high-fidelity training datasets to enhance the robustness of IL planners against error accumulation, focusing on the state-level robustness at a single time point. However, autonomous driving is inherently a continuous-time process, and leveraging the temporal scale to enhance robustness may provide a new perspective for addressing this issue. To this end, we propose a method…
Peer Reviews
Decision · Submitted to ICLR 2026
1. The paper addresses the error accumulation problem from a new perspective. Instead of focusing on improving a single model's architecture or data, it reframes the problem as one of optimal policy deployment. The key insight is that models from different training stages exhibit complementary weaknesses. 2. The authors demonstrate the effectiveness of SoE across a diverse set of baseline planners (rule-based, MLP-based, Transformer-based), proving its broad applicability. 3. The "plug-and-play" n
1. The method seems to just blindly switch models over time. The system does not appear to detect or anticipate error accumulation before switching. It switches policies regardless of whether the current policy is performing well or poorly. This could be suboptimal, as it might unnecessarily interrupt a perfectly good trajectory or fail to switch at the most critical moment. Have the authors considered a more intelligent, state-dependent switching strategy (e.g., based on model uncertainty, trajectory d
* A very simple approach: just alternate experts (SoE) every 2nd timestamp * Working on nuPlan, a quite widely used benchmark
* Obvious weakness: the validation set should be VERY close in distribution to the test set in order to find the correct combination of experts for SoE * No ablations or exploration of whether there exists a situation where the best combination of experts on val is not the best on the test * Straightforward drawback: need to wait (and waste resources) for training multiple models in order to include them into SoE (and usage of different ckpts during one training cycle is not the best strategy a
S1. Explores temporal alternation rather than architectural scaling to mitigate closed-loop error accumulation, representing a rarely addressed improvement dimension for IL planners. S2. Introduces zero additional inference cost and requires no model or data modifications, making deployment highly practical. S3. Demonstrates consistent and meaningful closed-loop performance gains across diverse planners, including achieving SOTA on nuPlan. S4. Provides empirical evidence on OL–CL mismatch and
W1. The claim that different seeds provide complementary error-accumulation behaviors is supported only by empirical observations; the paper lacks a deeper theoretical explanation or dynamic modeling of why such complementarity should reliably occur. W2. The experts differ solely by random seeds under identical architectures and data, raising concerns about whether this restricted diversity is consistently strong and generalizable beyond the evaluated cases, especially with larger amounts of data.
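The fixed-schedule alternation that the reviews describe (experts taking turns over planning steps, with no state-dependent switching) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `select_expert`, `rollout`, `step_env`, and the `period` parameter are hypothetical, and the scalar "observation" and "action" are toy stand-ins for a real planner interface.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy interface: a planner maps an observation to an action.
Planner = Callable[[float], float]


def select_expert(step: int, num_experts: int, period: int = 1) -> int:
    """Return the index of the expert active at planning step `step`.

    Experts are cycled round-robin; each one stays active for `period`
    consecutive steps. The schedule depends only on the step counter,
    never on how well the current expert is doing.
    """
    if num_experts < 1 or period < 1:
        raise ValueError("num_experts and period must be positive")
    return (step // period) % num_experts


def rollout(
    experts: Sequence[Planner],
    step_env: Callable[[float, float], float],
    obs: float,
    num_steps: int,
    period: int = 1,
) -> Tuple[float, List[int]]:
    """Run a closed-loop rollout, alternating experts on a fixed schedule."""
    used: List[int] = []
    for step in range(num_steps):
        idx = select_expert(step, len(experts), period)
        action = experts[idx](obs)      # the active expert plans this step
        obs = step_env(obs, action)     # assumed environment transition
        used.append(idx)
    return obs, used


if __name__ == "__main__":
    # Two toy "experts" with opposite biases and an additive toy environment.
    toy_experts = [lambda o: 1.0, lambda o: -1.0]
    final_obs, schedule = rollout(toy_experts, lambda o, a: o + a, 0.0, 4)
    print(final_obs, schedule)
```

In the toy usage, the two experts have opposite constant biases, so alternating them keeps the state from drifting in one direction. That is only an illustration of the "complementary errors" intuition raised in the reviews, not evidence for it.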
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning
