Inversely Learning Transferable Rewards via Abstracted States
Yikang Gui, Prashant Doshi

TL;DR
This paper presents a method to learn abstract reward functions through inverse reinforcement learning, enabling transfer of learned behaviors across different domain instances, demonstrated on various simulated tasks.
Contribution
The paper introduces a novel approach to inversely learn transferable abstract reward functions from behavior trajectories across multiple domain instances.
Findings
Abstract reward functions successfully transfer to new domain instances.
Learned rewards enable effective task behavior in unseen domain configurations.
Method validated on multiple tasks in OpenAI's Gym and AssistiveGym.
Abstract
Inverse reinforcement learning (IRL) has progressed significantly toward accurately learning the underlying rewards in both discrete and continuous domains from behavior data. The next advance is to learn intrinsic preferences in ways that produce useful behavior in settings or tasks which are different but aligned with the observed ones. In the context of robotic applications, this helps integrate robots into processing lines involving new tasks (with shared intrinsic preferences) without programming from scratch. We introduce a method to inversely learn an abstract reward function from behavior trajectories in two or more differing instances of a domain. The abstract reward function is then used to learn task behavior in another separate instance of the domain. This step offers evidence of its transferability and validates its correctness. We evaluate the method on trajectories…
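The structure described in the abstract (one reward function defined over abstract states and shared by several domain instances) can be illustrated with a minimal sketch. This is not the authors' implementation: the class names `LinearEncoder` and `AbstractReward`, the instance names, the state dimensions, and the linear/tanh forms are all hypothetical. The parameters are random and untrained, so the sketch shows only how the pieces fit together, not how they are learned.

```python
import numpy as np

rng = np.random.default_rng(0)
ABSTRACT_DIM = 4  # hypothetical size of the shared abstract state space


class LinearEncoder:
    """Hypothetical per-instance encoder: raw states -> abstract states."""

    def __init__(self, state_dim, abstract_dim, rng):
        # Random, untrained weights; a real method would learn these.
        self.W = rng.normal(scale=0.1, size=(abstract_dim, state_dim))

    def __call__(self, states):
        # states has shape (T, state_dim); output has shape (T, abstract_dim).
        return np.tanh(states @ self.W.T)


class AbstractReward:
    """Hypothetical reward defined only on abstract states, shared by all instances."""

    def __init__(self, abstract_dim, rng):
        self.w = rng.normal(scale=0.1, size=abstract_dim)

    def __call__(self, abstract_states):
        # One scalar reward per time step.
        return abstract_states @ self.w


# Assumed setup: two source instances and one target instance whose raw
# state dimensions differ, all mapped into the same abstract space.
encoders = {
    "source_a": LinearEncoder(11, ABSTRACT_DIM, rng),
    "source_b": LinearEncoder(17, ABSTRACT_DIM, rng),
    "target": LinearEncoder(8, ABSTRACT_DIM, rng),
}
reward = AbstractReward(ABSTRACT_DIM, rng)


def trajectory_return(instance, states):
    """Sum of the shared abstract reward along one trajectory of raw states."""
    return float(reward(encoders[instance](states)).sum())


if __name__ == "__main__":
    demo = rng.normal(size=(5, 11))  # a made-up 5-step trajectory
    print(trajectory_return("source_a", demo))
```

Because the reward only ever sees abstract states, the same `reward` object can score trajectories from any instance that has an encoder into the shared space. How such an encoder is obtained for a new target instance is left open here.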
Peer Reviews
Decision·ICLR 2026 Conference Withdrawn Submission
1. The paper investigates the important problem of reward transfer between related tasks in IRL. It is crucial that rewards learned through IRL generalize to unseen settings.
2. The paper formally studies the problem of reward transfer in Section 4.5.
3. The comparisons in MuJoCo Gym and Assistive Gym show that TraIRL outperforms baselines in generalization to new target tasks. The analysis in Section 5.2.1 also confirms that TraIRL learns a meaningful state abstraction.

1. TraIRL lacks novelty compared to prior IRL algorithms. The method first learns a state encoding and then performs standard IRL on top of this learned representation. Simply learning a state encoding before applying IRL offers limited advancement over prior work.
2. The experiments are limited to domains that are already well suited for learning an easily transferable state encoding. Both domains use the ground-truth state. In MuJoCo Gym, the state encoder only needs to ignore the joint inform

* **Principled Approach to Disentanglement:** The paper proposes a well-motivated method to disentangle a task's core reward from its specific dynamics by learning a reward function in an abstract state space. This is a significant conceptual strength.
* **Strong Empirical Performance:** The method demonstrates superior performance over strong baselines in transferring rewards across tasks with different dynamics within the same domain (e.g., MuJoCo Ant with different disabled legs).
* **T

* **Unverifiable Theoretical Assumptions:** The main theoretical result, Theorem 2, hinges on the "structural alignment" assumption that optimal policies in source and target tasks are close in the abstract space. The paper provides no mechanism to verify this assumption for a new target task, rendering the guarantee non-constructive. The theory explains when transfer works, but provides no guidance on how to ensure it.
* **Incomplete Reward Transfer:** In the AssistiveGym experiments, the l

1. The authors present a strong idea of learning abstract state representations for a shared reward function.
2. They present promising empirical evidence. The reward transfer from Ant to HalfCheetah is especially surprising.

1. It's unclear how the multi-task VAE actually aligns the state representations between different tasks. Even with the discriminative objective, it's entirely possible that the encoder does not align semantically equivalent states well. The t-SNE plots offer some evidence that the learned embeddings do align, but this may not be the case in higher dimensions.
2. The definitions of the problem setting, and of which tasks rewards can transfer between, are imprecise. The locomotion experime
Taxonomy
Topics: Neural Networks and Applications
