NoRML: No-Reward Meta Learning
Yuxiang Yang, Ken Caluwaerts, Atil Iscen, Jie Tan, Chelsea Finn

TL;DR
NoRML is a meta-learning approach that lets reinforcement learning agents adapt to new environments using only the observed dynamics, with no explicit reward signal during adaptation. It outperforms MAML in environments whose dynamics change.
Contribution
NoRML extends MAML for RL so that the fine-tune step uses the observable environment dynamics instead of a reward signal. It has a more expressive update step than MAML and supports more targeted exploration.
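To make the idea of a reward-free fine-tune step concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a linear-Gaussian policy and a linear learned advantage over transitions (s, a, s'); the names `learned_advantage`, `grad_log_gaussian_policy`, `reward_free_adapt`, `psi`, and `alpha` are hypothetical and chosen only for illustration. The point it shows is that the inner update is a policy-gradient-style step whose weighting comes from the observed transition, not from an environment reward.

```python
import numpy as np

def learned_advantage(psi, s, a, s_next):
    """Hypothetical learned advantage: linear in the transition (s, a, s')."""
    return float(psi @ np.concatenate([s, a, s_next]))

def grad_log_gaussian_policy(W, s, a, sigma):
    """Gradient of log N(a; W s, sigma^2 I) with respect to W."""
    return np.outer((a - W @ s) / sigma**2, s)

def reward_free_adapt(W, psi, transitions, alpha, sigma=1.0):
    """One inner-loop step that uses transitions (s, a, s') but never a reward."""
    grad = np.zeros_like(W)
    for s, a, s_next in transitions:
        weight = learned_advantage(psi, s, a, s_next)
        grad += weight * grad_log_gaussian_policy(W, s, a, sigma)
    # alpha may be a scalar or an array with the same shape as W.
    return W + alpha * grad / len(transitions)

# Usage with random placeholder data (2-D state, 1-D action).
rng = np.random.default_rng(0)
W = np.zeros((1, 2))
psi = rng.normal(size=5)  # 2 (state) + 1 (action) + 2 (next state)
transitions = [
    (rng.normal(size=2), rng.normal(size=1), rng.normal(size=2))
    for _ in range(8)
]
W_adapted = reward_free_adapt(W, psi, transitions, alpha=0.1)
print(W_adapted.shape)  # (1, 2)
```

In a MAML-style setup, the initial policy parameters and the advantage parameters would themselves be meta-learned in an outer loop; that outer loop is omitted here.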
Findings
NoRML outperforms MAML in environments with changing dynamics.
The method effectively adapts without explicit reward feedback.
Validated on synthetic and benchmark environments.
Abstract
Efficiently adapting to new environments and changes in dynamics is critical for agents to successfully operate in the real world. Reinforcement learning (RL) based approaches typically rely on external reward feedback for adaptation. However, in many scenarios this reward signal might not be readily available for the target task, or the difference between the environments can be implicit and only observable from the dynamics. To this end, we introduce a method that allows for self-adaptation of learned policies: No-Reward Meta Learning (NoRML). NoRML extends Model Agnostic Meta Learning (MAML) for RL and uses observable dynamics of the environment instead of an explicit reward function in MAML's finetune step. Our method has a more expressive update step than MAML, while maintaining MAML's gradient based foundation. Additionally, in order to allow more targeted exploration, we…
Taxonomy
Topics: Reinforcement Learning in Robotics · Fuel Cells and Related Materials · Data Stream Mining Techniques
Methods: Model-Agnostic Meta-Learning
