Hierarchical-Task-Aware Multi-modal Mixture of Incremental LoRA Experts for Embodied Continual Learning

Ziqi Jia; Anmin Wang; Xiaoyang Qu; Xiaowen Yang; Jianzong Wang

arXiv:2506.04595 · cs.CV · June 6, 2025

Open Access

TL;DR

This paper introduces a hierarchical framework and a novel method called Task-aware MoILE for embodied continual learning, enabling agents to learn high-level planning and actions while mitigating catastrophic forgetting.

Contribution

It proposes a hierarchical setup for embodied continual learning and a task-aware MoILE method that uses clustering and SVD to improve knowledge retention.

Findings

1. Reduces forgetting of old tasks compared to existing methods.
2. Effectively distinguishes and recognizes tasks using clustering.
3. Supports continuous learning of new tasks without significant performance loss.

Abstract

Previous continual learning setups for embodied intelligence focused on executing low-level actions based on human commands, neglecting the ability to learn high-level planning and multi-level knowledge. To address these issues, we propose the Hierarchical Embodied Continual Learning Setups (HEC) that divide the agent's continual learning process into two layers: high-level instructions and low-level actions, and define five embodied continual learning sub-setups. Building on these setups, we introduce the Task-aware Mixture of Incremental LoRA Experts (Task-aware MoILE) method. This approach achieves task recognition by clustering visual-text embeddings and uses both a task-level router and a token-level router to select the appropriate LoRA experts. To effectively address the issue of catastrophic forgetting, we apply Singular Value Decomposition (SVD) to the LoRA parameters obtained…
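
To make the task-recognition step concrete, here is a minimal sketch of routing by clustered embeddings. It assumes each task is summarized by the centroid of its embeddings and that an incoming embedding is assigned to the task with the most similar centroid by cosine similarity. The abstract does not state the clustering algorithm, similarity measure, or embedding model, so the nearest-centroid rule and the names `TaskRouter`, `add_task`, and `route` are hypothetical, and the two-dimensional vectors stand in for real visual-text embeddings.

```python
import math


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TaskRouter:
    """Hypothetical task-level router: one centroid per known task."""

    def __init__(self):
        self.centroids = {}  # task id -> centroid of that task's embeddings

    def add_task(self, task_id, embeddings):
        # Adding a new task stores one more centroid; old ones are untouched.
        self.centroids[task_id] = centroid(embeddings)

    def route(self, embedding):
        # Pick the task whose centroid is most similar to the input embedding.
        return max(
            self.centroids,
            key=lambda task_id: cosine(embedding, self.centroids[task_id]),
        )


# Toy stand-ins for visual-text embeddings (assumed values, not paper data).
router = TaskRouter()
router.add_task("task_a", [[1.0, 0.1], [0.9, 0.0]])
router.add_task("task_b", [[0.0, 1.0], [0.1, 0.9]])
selected = router.route([0.8, 0.2])  # id of the LoRA expert group to use
```

In this sketch the returned task id would index the set of LoRA experts to activate. A token-level router, which the abstract also mentions, would then choose among experts per token and is not modeled here.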

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition