TL;DR
This paper introduces Self-Improving Skill Learning (SISL), a hierarchical meta-RL method that enhances robustness and stability in long-horizon tasks by self-guided skill refinement and noise mitigation.
Contribution
The paper proposes a novel self-guided skill refinement framework with skill prioritization, improving robustness and performance in noisy, long-horizon meta-RL environments.
Findings
SISL outperforms existing skill-based meta-RL methods on diverse long-horizon tasks.
The approach achieves stable skill learning even with noisy and suboptimal offline data.
SISL demonstrates reliable adaptation and improved task success rates.
Abstract
Meta-reinforcement learning (Meta-RL) facilitates rapid adaptation to unseen tasks but faces challenges in long-horizon environments. Skill-based approaches tackle this by decomposing state-action sequences into reusable skills and employing hierarchical decision-making. However, these methods are highly susceptible to noisy offline demonstrations, leading to unstable skill learning and degraded performance. To address this, we propose Self-Improving Skill Learning (SISL), which performs self-guided skill refinement using decoupled high-level and skill improvement policies, while applying skill prioritization via maximum return relabeling to focus updates on task-relevant trajectories, resulting in robust and stable adaptation even under noisy and suboptimal data. By mitigating the effect of noise, SISL achieves reliable skill learning and consistently outperforms other skill-based…
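The abstract does not spell out how maximum return relabeling turns into sampling priorities. As a rough illustration only, the sketch below assumes each offline trajectory is scored by a hypothetical return estimator for every training task, relabeled with the maximum of those scores, and then sampled in proportion to a softmax over the relabeled values. The function names, the `temperature` parameter, and the softmax weighting are assumptions made for this example, not details taken from the paper.

```python
import math
import random
from typing import Callable, Hashable, List, Sequence


def max_return_relabel(
    trajectories: Sequence[Hashable],
    task_ids: Sequence[int],
    return_estimate: Callable[[Hashable, int], float],
) -> List[float]:
    """Label each trajectory with its best estimated return over all tasks.

    `return_estimate` is a hypothetical stand-in for a learned reward/return model.
    """
    return [
        max(return_estimate(traj, task) for task in task_ids)
        for traj in trajectories
    ]


def softmax_priorities(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Turn relabeled returns into sampling probabilities (assumed weighting)."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_prioritized(
    trajectories: Sequence[Hashable],
    priorities: Sequence[float],
    k: int,
    rng: random.Random,
) -> list:
    """Draw k trajectories with replacement, weighted by priority."""
    return rng.choices(list(trajectories), weights=list(priorities), k=k)
```

Under these assumptions, a trajectory that is useful for at least one task keeps a high priority, while trajectories that score poorly on every task (for example, heavily noisy demonstrations) are sampled less often.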
Peer Reviews
Decision: ICLR 2026 Poster
- The paper is well-motivated and very clearly presented, with very useful illustrations.
- Separating exploitation ($\pi_h$) and skill improvement ($\pi_{imp}$) with self-supervised guidance via prioritized buffers appears to be a novel and elegant contribution.
- The evaluation is sound and detailed, with 4 diverse environments with multiple noise levels, thorough baseline comparisons, and extensive ablations.
I believe the paper would benefit from comparing against a GCRL baseline with Hindsight Experience Replay (HER) or similar relabeling techniques. This would help demonstrate that SISL's approach to leveraging the offline dataset is superior to existing relabeling methods in both sample efficiency and final performance.
Minor typo:
- Line 279: "addtion"
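For context on the baseline this review suggests, the sketch below shows the generic "future" relabeling strategy from Hindsight Experience Replay: each transition is copied with its desired goal replaced by a goal actually achieved later in the same episode, and the reward is recomputed. The `Transition` layout, the sparse 0/-1 reward, and the parameter `k` are illustrative assumptions; nothing here is tied to SISL or to the environments used in the paper.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Tuple

Goal = Tuple[int, ...]


@dataclass(frozen=True)
class Transition:
    state: Goal
    action: int
    achieved_goal: Goal  # goal reached after taking the action
    desired_goal: Goal   # goal the agent was originally pursuing
    reward: float


def sparse_reward(achieved: Goal, desired: Goal) -> float:
    """Assumed sparse reward: 0 on success, -1 otherwise."""
    return 0.0 if achieved == desired else -1.0


def her_future_relabel(
    episode: List[Transition], k: int, rng: random.Random
) -> List[Transition]:
    """Return k relabeled copies of every transition ('future' strategy)."""
    relabeled = []
    for t, transition in enumerate(episode):
        for _ in range(k):
            # Pick a goal achieved at this step or later in the same episode.
            future = episode[rng.randrange(t, len(episode))]
            new_goal = future.achieved_goal
            relabeled.append(
                replace(
                    transition,
                    desired_goal=new_goal,
                    reward=sparse_reward(transition.achieved_goal, new_goal),
                )
            )
    return relabeled
```

The relabeled transitions would normally be added to the replay buffer alongside the originals, so that failed episodes still provide successful examples for some goal.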
- This work focuses on a key practical problem in meta-RL, where offline demonstrations may be noisy.
- The paper provides strong empirical evidence demonstrating significant performance improvements over all baselines.
- The paper is well-structured. The method is presented logically.
- The paper claims only "16% more time per iteration", which seems surprisingly low. A SISL iteration appears to involve: (1) rollout with $\pi_h+\pi_l$, (2) rollout with $\pi_{imp}$, (3) $\pi_h$ update, (4) $\pi_{imp}$ update, (5) $\pi_l$ update, and (6) $\hat{R}$ update. In contrast, the baseline presumably only includes steps (1) and (3). It is unclear how the 16% figure was calculated. A more detailed breakdown and a comparison of total training time, not just per-iteration cost, would be mo
The paper adopts perturbations, a representative technique in meta-learning, where target tasks are unseen by the skill-based reinforcement learning agent. This approach constrains the learning process to remain close to the demonstration manifold, thereby facilitating effective skill acquisition.
The SISL framework appears to be quite complex and computationally expensive. For example, it involves maintaining multiple buffers that serve similar functions. The proposed method also does not seem to be specifically tailored to address the meta-learning problem. Baselines such as SPiRL and SiMPL learn skills solely from offline demonstrations, whereas SISL additionally collects online data, making the comparison unfair.
**Minor** Appendix B should be moved to the main body of the paper. Th
Taxonomy
Topics: Reinforcement Learning in Robotics
