AMPED: Adaptive Multi-objective Projection for balancing Exploration and skill Diversification

Geonwoo Cho; Jaemoon Lee; Jaegyun Im; Subi Lee; Jihwan Lee; Sundong Kim

arXiv:2506.05980 · cs.LG · March 17, 2026

TL;DR

AMPED is a novel method that balances exploration and skill diversity in skill-based reinforcement learning, improving adaptation and reducing sample complexity through gradient projection and skill selection.

Contribution

It introduces a gradient-surgery projection technique and a skill selector to explicitly balance exploration and diversity during pretraining and fine-tuning.
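The gradient-surgery step can be illustrated with a minimal sketch. It assumes a PCGrad-style rule borrowed from multi-task learning: when the exploration gradient and the diversity gradient conflict (negative inner product), each is projected onto the normal plane of the other before the two are summed. The function names (`project_out`, `project_conflicting`) and the unweighted sum are hypothetical, AMPED's actual projection and weighting may differ, and the gradients here are plain Python lists rather than network parameters. In AMPED the two gradients would come from the exploration objective (RND, per the reviews) and the diversity objective (AnInfoNCE).

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_out(g: Vector, onto: Vector) -> Vector:
    """Remove from g its component along `onto` (projection onto the normal plane)."""
    denom = dot(onto, onto)
    if denom == 0.0:
        return list(g)
    coef = dot(g, onto) / denom
    return [x - coef * y for x, y in zip(g, onto)]


def project_conflicting(g_explore: Vector, g_div: Vector) -> Tuple[Vector, Vector, Vector]:
    """PCGrad-style surgery for two objectives (illustrative, not AMPED's exact rule).

    If the gradients conflict (negative inner product), each one is projected
    onto the normal plane of the *original* other gradient; otherwise both are
    left unchanged. Returns (adjusted g_explore, adjusted g_div, combined update).
    """
    if dot(g_explore, g_div) < 0.0:
        new_explore = project_out(g_explore, g_div)
        new_div = project_out(g_div, g_explore)
    else:
        new_explore, new_div = list(g_explore), list(g_div)
    # Hypothetical unweighted combination of the two adjusted gradients.
    combined = [a + b for a, b in zip(new_explore, new_div)]
    return new_explore, new_div, combined
```

With `g_explore = [1, 0]` and `g_div = [-1, 1]` (inner product -1), the adjusted gradients become `[0.5, 0.5]` and `[0, 1]`, each orthogonal to the other objective's original gradient, and the combined direction is `[0.5, 1.5]`. Non-conflicting gradients pass through untouched.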

Findings

1. AMPED outperforms baseline methods across various benchmarks.
2. Each component of AMPED contributes significantly to its performance.
3. Greater skill diversity reduces fine-tuning sample complexity.

Abstract

Skill-based reinforcement learning (SBRL) enables rapid adaptation in environments with sparse rewards by pretraining a skill-conditioned policy. Effective skill learning requires jointly maximizing exploration and skill diversity. However, existing methods often struggle to optimize these two conflicting objectives simultaneously. In this work, we propose a new method, Adaptive Multi-objective Projection for balancing Exploration and skill Diversification (AMPED), which explicitly addresses both: during pretraining, a gradient-surgery projection balances the exploration and diversity gradients, and during fine-tuning, a skill selector exploits the learned diversity by choosing skills suited to downstream tasks. Our approach outperforms SBRL baselines across various benchmarks. Through an extensive ablation study, we identify the role of each…
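The fine-tuning side can be sketched with a deliberately simple stand-in for the skill selector. This swaps in an epsilon-greedy bandit over a discrete set of pretrained skills and is not the paper's method: Reviewer 02's comments indicate that AMPED instead learns a policy over the skill variable z with SAC. `EpsilonGreedySkillSelector`, `fine_tune`, and the `rollout` callback (assumed to run the frozen skill-conditioned policy with skill z and return the episode return) are hypothetical names.

```python
import random
from typing import Callable, List


class EpsilonGreedySkillSelector:
    """Hypothetical skill selector: epsilon-greedy bandit over K discrete skills.

    Illustrative stand-in for AMPED's learned selector (not the paper's method).
    It tracks the running mean of the downstream return obtained with each skill.
    """

    def __init__(self, num_skills: int, epsilon: float = 0.1, seed: int = 0) -> None:
        self.epsilon = epsilon
        self.counts: List[int] = [0] * num_skills
        self.values: List[float] = [0.0] * num_skills
        self.rng = random.Random(seed)

    def best_skill(self) -> int:
        # Index of the skill with the highest estimated return.
        return max(range(len(self.values)), key=lambda k: self.values[k])

    def select(self) -> int:
        # Try every skill once before trusting the estimates.
        for k, n in enumerate(self.counts):
            if n == 0:
                return k
        # Explore with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return self.best_skill()

    def update(self, skill: int, episode_return: float) -> None:
        # Incremental running-mean update of the chosen skill's value.
        self.counts[skill] += 1
        self.values[skill] += (episode_return - self.values[skill]) / self.counts[skill]


def fine_tune(selector: EpsilonGreedySkillSelector,
              rollout: Callable[[int], float], episodes: int) -> int:
    """Run `episodes` rounds of select -> rollout -> update; return the best skill."""
    for _ in range(episodes):
        z = selector.select()
        selector.update(z, rollout(z))
    return selector.best_skill()


if __name__ == "__main__":
    returns = [0.1, 0.9, 0.3, 0.2]  # hypothetical per-skill downstream returns
    selector = EpsilonGreedySkillSelector(num_skills=len(returns))
    print(fine_tune(selector, lambda z: returns[z], episodes=200))  # prints 1
```

Because the bandit tries every skill once before exploiting, deterministic returns make it settle on the highest-return skill. Unlike a learned selector, it commits to one skill per episode and cannot condition on the state, which is one reason to prefer a policy over z.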

Peer Reviews

Decision: ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 4

Strengths

- The paper is very clear and easy to follow, and the proposed method is well-motivated.
- Strong results on several tasks from the URLB benchmark with comparisons against 7 baselines, with consistent improvements.
- Table 2 effectively demonstrates that each component (RND, AnInfoNCE, gradient surgery, skill selector) contributes meaningfully to performance.
- Figure 5 (Tree Maze) provides intuitive evidence that AMPED achieves both skill separation and state coverage where baselines fail at on…

Weaknesses

Section 4.4 claims are problematic in multiple ways:
- Line 200 claims "empirical validation appears in Section 4.4," but Theorem 1 assumes sampling trajectories from the **optimal** policy (requiring access to it), while Section 4.4 measures convergence speed to the optimal policy during fine-tuning. Theorem 1 establishes that diversity reduces sample complexity for skill identification given optimal trajectories, not that diversity accelerates convergence during learning.
- The statement "Combi…

Reviewer 02 · Rating 4 · Confidence 4

Strengths

1. The experimental analysis is thorough. In particular, extensive ablation studies are provided to demonstrate the effectiveness of each component in different scenarios.
2. The paper is exceptionally clear and logically fluent. The progression from the problem definition in the Introduction, to the Methods section, and then to the Experimental Analysis is easy to follow. Furthermore, the Appendix provides extremely detailed further analysis, enhancing comprehensibility.
3. The paper models t…

Weaknesses

1. I find that the method presented in the paper largely consists of combining existing techniques, and it lacks significant methodological innovation.
2. Directly using SAC to learn a policy over the skill repertoire (i.e., changing the prior z) seems feasible. However, the paper does not couple the two RL training processes well, which may lead to unstable convergence and, as noted in Table 6, instances of ineffectiveness.
3. The experimental environments are relatively simple, with experime…

Reviewer 03 · Rating 6 · Confidence 4

Strengths

- This paper is easy to follow.
- The results of the ablation study are comprehensive.

Weaknesses

- Some content is unclearly explained, such as the specific form of $\rho$ and the meaning of the several metrics in Figure 6.
- The proof of Theorem 1 is currently difficult to ascertain as valid (see Questions for details).
- The experimental settings are too simple, primarily consisting of basic maze environments.
- The presentation of the conflicts during the skill learning process is not intuitive.

Code & Models

Repositories

Cho-Geonwoo/amped (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques

Methods: Sparse Evolutionary Training