Dynamic Mixture of Progressive Parameter-Efficient Expert Library for Lifelong Robot Learning

Yuheng Lei; Sitong Mao; Shunbo Zhou; Hongyuan Zhang; Xuelong Li; Ping Luo

arXiv:2506.05985 · cs.LG · September 24, 2025

TL;DR

This paper introduces DMPEL, a dynamic expert library framework that enhances lifelong robot learning by efficiently sharing knowledge, reducing forgetting, and enabling flexible task adaptation without relying on task identifiers.

Contribution

The paper proposes DMPEL, a novel lifelong learning method that builds a progressive expert library with a lightweight router, improving transfer and reducing forgetting in robot learning.

Findings

01. Outperforms state-of-the-art methods on the LIBERO benchmark

02. Uses minimal trainable parameters and storage

03. Achieves higher success rates during continual adaptation

Abstract

A generalist agent must continuously learn and adapt throughout its lifetime, achieving efficient forward transfer while minimizing catastrophic forgetting. Previous work within the dominant pretrain-then-finetune paradigm has explored parameter-efficient fine-tuning for single-task adaptation, effectively steering a frozen pretrained model with a small number of parameters. However, in the context of lifelong learning, these methods rely on the impractical assumption of a test-time task identifier and restrict knowledge sharing among isolated adapters. To address these limitations, we propose Dynamic Mixture of Progressive Parameter-Efficient Expert Library (DMPEL) for lifelong robot learning. DMPEL progressively builds a low-rank expert library and employs a lightweight router to dynamically combine experts into an end-to-end policy, enabling flexible and efficient lifelong forward…
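The mechanism the abstract describes, a library of low-rank experts whose contributions are mixed by a lightweight router, can be illustrated with a small dependency-free sketch. This is not the authors' implementation: the class `ExpertLibrary`, the functions `route` and `rand_matrix`, the softmax-normalized linear router, and the random initialization of both low-rank factors are all assumptions made for illustration. It only shows one plausible reading of the idea: a frozen base weight, one low-rank pair (B, A) added per task, and per-input coefficients that merge the experts into a single effective weight.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Hypothetical helper: small Gaussian matrix for illustration only."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class ExpertLibrary:
    """Frozen base weight plus a growing list of low-rank (B, A) experts."""

    def __init__(self, base_weight, rank, seed=0):
        self.base = base_weight          # d_out x d_in, never updated
        self.rank = rank
        self.experts = []                # one (B, A) pair per added expert
        self.rng = random.Random(seed)

    def add_expert(self):
        """Append a new low-rank expert (assumed: one per new task)."""
        d_out, d_in = len(self.base), len(self.base[0])
        b = rand_matrix(d_out, self.rank, self.rng)   # d_out x rank
        a = rand_matrix(self.rank, d_in, self.rng)    # rank x d_in
        self.experts.append((b, a))

    def synthesize(self, coeffs):
        """Return base + sum_k coeffs[k] * (B_k @ A_k) as a new matrix."""
        assert len(coeffs) == len(self.experts)
        merged = [row[:] for row in self.base]
        for c, (b, a) in zip(coeffs, self.experts):
            delta = matmul(b, a)
            for i, row in enumerate(delta):
                for j, value in enumerate(row):
                    merged[i][j] += c * value
        return merged


def route(features, router_weight):
    """Assumed linear router: one score per expert, softmax-normalized."""
    scores = [sum(w * f for w, f in zip(row, features)) for row in router_weight]
    return softmax(scores)


# Usage: two tasks seen so far, so the library holds two experts.
lib = ExpertLibrary(base_weight=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], rank=1)
lib.add_expert()
lib.add_expert()
router_weight = rand_matrix(len(lib.experts), 3, random.Random(1))
obs = [0.5, -0.2, 0.1]
coeffs = route(obs, router_weight)            # no task identifier needed
merged = lib.synthesize(coeffs)               # effective 2 x 3 weight
action = matmul(merged, [[x] for x in obs])   # 2 x 1 output
```

Because the coefficients depend only on the input features, the same forward pass works at test time without being told which task is active, which is the property the abstract emphasizes. Standard LoRA practice initializes one factor to zero so a new expert starts as a no-op; the random initialization here is only to make the example's output non-trivial.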

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 2

Strengths

(1) Dynamic expert composition allows fine-grained adaptation to new tasks without oracle task identifiers. (2) Full task modularity via LoRA enables parameter efficiency and knowledge sharing across tasks.

Weaknesses

(1) Quadratic complexity from dynamic expert mixing and routing could scale poorly for larger networks or very long task sequences. (2) Limited task diversity in evaluation – experiments are restricted to LIBERO simulated environments, not real-world or cross-domain tasks. (3) No comparisons with learning-based retrieval baselines like contrastive skill indexing or diffusion-based task retrievers.

Reviewer 02 · Rating 4 · Confidence 4

Strengths

The paper is written in a clear and coherent manner, with well-organized sections. The technical flow is easy to follow, and figures are appropriately used to illustrate the overall architecture and key mechanisms. The manuscript exhibits a complete and logical structure, from problem definition to experimental validation.

Weaknesses

The paper uses a large number of mathematical symbols and subscripts, some of which appear without a clear prior definition or consistent formatting. This can make certain derivations harder to read, especially for readers not deeply familiar with the notation conventions used. The current version provides only a brief acknowledgment of potential limitations. I suggest that the authors add a more thorough discussion to strengthen the paper’s self-critique and transparency.

Reviewer 03 · Rating 4 · Confidence 5

Strengths

1. Extensive experiments show that DMPEL is a good lifelong learning framework compared to multiple baselines. 2. Extensive ablation studies demonstrate the functionality of each component.

Weaknesses

1. Establishing a LoRA library for each task is quite normal, and the expert coefficient replay is also quite standard. 2. Figure 2 shows that the model inference time is over 50 ms, while the baseline is around 30 ms. What is the latency caused by router computation? What is the latency for averaging the weights? Is there any approach to improve efficiency? If the expert synthesis interval is greater than 1, the model latency will increase rapidly when expert synthesis occurs. 3. Figure 7 ind

Code & Models

Repositories

HarryLui98/DMPEL
PyTorch · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications

MethodsLib