Dynamic Mixture of Progressive Parameter-Efficient Expert Library for Lifelong Robot Learning
Yuheng Lei, Sitong Mao, Shunbo Zhou, Hongyuan Zhang, Xuelong Li, Ping Luo

TL;DR
This paper introduces DMPEL, a dynamic expert library framework that enhances lifelong robot learning by efficiently sharing knowledge, reducing forgetting, and enabling flexible task adaptation without relying on task identifiers.
Contribution
The paper proposes DMPEL, a novel lifelong learning method that builds a progressive expert library with a lightweight router, improving transfer and reducing forgetting in robot learning.
Findings
Outperforms state-of-the-art methods on the LIBERO benchmark
Uses minimal trainable parameters and storage
Achieves higher success rates during continual adaptation
Abstract
A generalist agent must continuously learn and adapt throughout its lifetime, achieving efficient forward transfer while minimizing catastrophic forgetting. Previous work within the dominant pretrain-then-finetune paradigm has explored parameter-efficient fine-tuning for single-task adaptation, effectively steering a frozen pretrained model with a small number of parameters. However, in the context of lifelong learning, these methods rely on the impractical assumption of a test-time task identifier and restrict knowledge sharing among isolated adapters. To address these limitations, we propose Dynamic Mixture of Progressive Parameter-Efficient Expert Library (DMPEL) for lifelong robot learning. DMPEL progressively builds a low-rank expert library and employs a lightweight router to dynamically combine experts into an end-to-end policy, enabling flexible and efficient lifelong forward…
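To make the idea of "dynamically combining low-rank experts" concrete, here is a minimal sketch, not the authors' implementation. It assumes each expert is a LoRA-style pair (B_k, A_k), that the router emits one logit per expert, and that a softmax over those logits gives the mixing coefficients; the abstract does not specify any of these details. All function names (`softmax`, `synthesize_delta`, `adapted_forward`) are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def synthesize_delta(experts, coeffs):
    """Weighted sum of low-rank updates: sum_k c_k * (B_k @ A_k).

    Each expert is a (B, A) pair with B of shape d_out x r and
    A of shape r x d_in (assumed LoRA-style factorization).
    """
    d_out = len(experts[0][0])
    d_in = len(experts[0][1][0])
    delta = [[0.0] * d_in for _ in range(d_out)]
    for (B, A), c in zip(experts, coeffs):
        BA = matmul(B, A)
        for i in range(d_out):
            for j in range(d_in):
                delta[i][j] += c * BA[i][j]
    return delta

def adapted_forward(W0, experts, router_logits, x):
    """Apply the frozen weight W0 plus the router-weighted expert update."""
    coeffs = softmax(router_logits)      # assumed: softmax routing
    delta = synthesize_delta(experts, coeffs)
    W = [[w + d for w, d in zip(w_row, d_row)]
         for w_row, d_row in zip(W0, delta)]
    return matvec(W, x), coeffs

if __name__ == "__main__":
    W0 = [[1.0, 0.0], [0.0, 1.0]]                 # frozen pretrained weight
    expert_a = ([[1.0], [0.0]], [[1.0, 0.0]])     # rank-1 expert (B, A)
    expert_b = ([[0.0], [1.0]], [[0.0, 1.0]])     # rank-1 expert (B, A)
    y, coeffs = adapted_forward(W0, [expert_a, expert_b], [0.0, 0.0], [2.0, 4.0])
    print(coeffs, y)
```

In this sketch the pretrained weight stays frozen and only the small (B, A) pairs and the router would be trained. Because the router chooses the coefficients from its input, no task identifier has to be supplied at test time.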
Peer Reviews
Decision: Submitted to ICLR 2026
(1) Dynamic expert composition allows fine-grained adaptation to new tasks without oracle task identifiers. (2) Full task modularity via LoRA enables parameter efficiency and knowledge sharing across tasks.
(1) Quadratic complexity from dynamic expert mixing and routing could scale poorly for larger networks or very long task sequences. (2) Limited task diversity in evaluation – experiments are restricted to LIBERO simulated environments, not real-world or cross-domain tasks. (3) No comparisons with learning-based retrieval baselines like contrastive skill indexing or diffusion-based task retrievers.
The paper is written in a clear and coherent manner, with well-organized sections. The technical flow is easy to follow, and figures are appropriately used to illustrate the overall architecture and key mechanisms. The manuscript exhibits a complete and logical structure, from problem definition to experimental validation.
The paper uses a large number of mathematical symbols and subscripts, some of which appear without a clear prior definition or consistent formatting. This can make certain derivations harder to read, especially for readers not deeply familiar with the notation conventions used. The current version provides only a brief acknowledgment of potential limitations. I suggest that the authors add a more thorough discussion to strengthen the paper’s self-critique and transparency.
1. Extensive experiments show that DMPEL is a good lifelong learning framework compared to multiple baselines. 2. Extensive ablation studies demonstrate the functionality of each component.
1. Establishing a LoRA library for each task is quite normal, and the expert coefficient replay is also quite standard.
2. Figure 2 shows that the model inference time is over 50 ms, while the baseline is around 30 ms. What is the latency caused by router computation? What is the latency for averaging the weights? Is there any approach to improve efficiency? If the expert synthesis interval is greater than 1, the model latency will increase rapidly when expert synthesis occurs.
3. Figure 7 ind
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications
MethodsLib
