Learning Multi-Task Transferable Rewards via Variational Inverse Reinforcement Learning
Se-Wook Yoo, Seung-Woo Seo

TL;DR
This paper introduces a novel multi-task reward learning method using variational inverse reinforcement learning, enabling robust policy transfer and improved data efficiency in complex, dynamic environments.
Contribution
It extends empowerment-based regularization to multi-task settings with unknown dynamics, deriving a variational lower bound for mutual information to learn transferable rewards.
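As a rough illustration of what such a bound looks like, the block below writes out the standard variational (Barber-Agakov style) lower bound on a conditional mutual information. The notation is an assumption made for this sketch, not taken from the paper: $s$ is a state, $c$ a sub-task, $a$ an action, $s'$ the next state, $\omega$ an action distribution, and $q$ a learned inverse model. The paper's exact objective may differ.

```latex
% Situational empowerment (assumed notation): the maximum mutual information
% between an action and the next state, conditioned on state s and sub-task c.
\Phi(s, c) = \max_{\omega} \; I(a ; s' \mid s, c),
\qquad
I(a ; s' \mid s, c) = H(a \mid s, c) - H(a \mid s', s, c).

% Replacing the intractable posterior p(a | s, s', c) with a variational
% inverse model q(a | s, s', c) gives a lower bound, because the KL divergence
% between the true posterior and q is non-negative:
I(a ; s' \mid s, c) \;\ge\;
H(a \mid s, c)
+ \mathbb{E}_{a \sim \omega(\cdot \mid s, c),\; s' \sim p(\cdot \mid s, a, c)}
  \bigl[ \log q(a \mid s, s', c) \bigr].
```

The bound is tight when $q$ matches the true posterior over actions, so maximizing the right-hand side over both $\omega$ and $q$ approximates the empowerment without requiring a known dynamics model.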
Findings
Achieves better performance than existing imitation learning methods.
Demonstrates robustness to randomness and task changes.
Improves data efficiency in multi-task transfer learning.
Abstract
Many robotic tasks are composed of numerous temporally correlated sub-tasks in a highly complex environment. Solving them effectively requires discovering situational intentions and appropriate actions by reasoning over temporal abstractions. To recover intentions that are separated from changing task dynamics, we extend an empowerment-based regularization technique to multi-task settings within the framework of a generative adversarial network. In multi-task environments with unknown dynamics, we focus on learning a reward and a policy from unlabeled expert examples. In this study, we define situational empowerment as the maximum of the mutual information that represents how an action, conditioned on both a given state and sub-task, affects the future. Our proposed method derives a variational lower bound of the situational mutual information in order to optimize it. We…
