Skill-Based Reinforcement Learning with Intrinsic Reward Matching
Ademi Adeniji, Amber Xie, Pieter Abbeel

TL;DR
This paper introduces Intrinsic Reward Matching (IRM), a method that uses the skill discriminator learned during unsupervised pretraining to select the best pretrained skill for a new task without environment samples, improving sample efficiency on robotic manipulation benchmarks.
Contribution
IRM unifies skill pretraining and task finetuning by matching intrinsic and task rewards through the skill discriminator, enabling more sample-efficient skill adaptation.
Findings
IRM improves skill transfer efficiency on robotic benchmarks.
Pretrained skills can be effectively matched to new tasks without environment rollouts.
IRM outperforms previous skill selection methods in complex manipulation tasks.
Abstract
While unsupervised skill discovery has shown promise in autonomously acquiring behavioral primitives, there is still a large methodological disconnect between task-agnostic skill pretraining and downstream, task-aware finetuning. We present Intrinsic Reward Matching (IRM), which unifies these two phases of learning via the skill discriminator, a pretraining model component often discarded during finetuning. Conventional approaches finetune pretrained agents directly at the policy level, often relying on expensive environment rollouts to empirically determine the optimal skill. However, the most concise yet complete description of a task is often the reward function itself, and skill learning methods learn an intrinsic reward function via the discriminator that corresponds to the skill policy. We propose to leverage the skill discriminator to match the…
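The reward-matching idea described in the abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names (`pearson_distance`, `select_skill`), the choice of a Pearson-correlation distance as the matching criterion, and the toy `intrinsic_reward` and `task_reward` functions that stand in for a discriminator-derived reward and a downstream task reward. The only point it demonstrates is that a skill can be chosen by comparing reward functions on a fixed batch of states, with no environment rollouts.

```python
import numpy as np


def pearson_distance(a, b):
    """Distance in [0, 1]; close to 0 when a and b are positively affinely related."""
    rho = np.corrcoef(a, b)[0, 1]
    # Clamp at 0 to guard against rho exceeding 1 by floating-point error.
    return float(np.sqrt(max(0.0, (1.0 - rho) / 2.0)))


def select_skill(intrinsic_reward, task_reward, skills, states):
    """Return the candidate skill whose intrinsic reward best matches the task reward.

    Both reward functions are only evaluated on a fixed batch of states,
    so no environment interaction is needed.
    """
    r_task = np.array([task_reward(s) for s in states])
    distances = []
    for z in skills:
        r_int = np.array([intrinsic_reward(s, z) for s in states])
        distances.append(pearson_distance(r_int, r_task))
    best = int(np.argmin(distances))
    return skills[best], distances


# Toy setup (hypothetical): 2-D states, and eight skills given as unit directions.
rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(256, 2))
angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
skills = [np.array([np.cos(a), np.sin(a)]) for a in angles]


def intrinsic_reward(state, skill):
    # Stand-in for a discriminator-derived reward: larger when the state
    # lies further along the skill's direction.
    return float(state @ skill)


def task_reward(state):
    # Hypothetical downstream task: reward grows with the x-coordinate.
    # The scale and offset do not affect the Pearson distance.
    return 3.0 * state[0] + 1.0


best_skill, distances = select_skill(intrinsic_reward, task_reward, skills, states)
print("selected skill direction:", best_skill)
```

In this toy setup the selected skill is the one pointing along the positive x-axis, because its stand-in intrinsic reward is an affine function of the task reward. A correlation-based distance is used so that differences in reward scale and offset do not influence the choice; other matching criteria could be substituted.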
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Explainable Artificial Intelligence (XAI)
