Adapting Image-based RL Policies via Predicted Rewards
Weiyao Wang, Xinyuan Fang, Gregory D. Hager

TL;DR
This paper introduces Predicted Reward Fine-tuning (PRFT), a method that uses predicted rewards to adapt image-based RL policies to new domains, improving generalization despite visual changes.
Contribution
The paper proposes PRFT, a novel fine-tuning approach leveraging reward prediction under domain shift to enhance policy performance in unseen environments.
Findings
PRFT improves policy performance across diverse tasks.
Predicted rewards remain useful signals under significant domain shift.
Fine-tuning with predicted rewards outperforms baseline methods.
Abstract
Image-based reinforcement learning (RL) faces significant challenges in generalization when the visual environment undergoes substantial changes between training and deployment. Under such circumstances, learned policies may perform poorly. Previous approaches to this problem have largely focused on broadening the training observation distribution, employing techniques like data augmentation and domain randomization. However, given the sequential nature of the RL decision-making problem, residual errors are often propagated by the learned policy model and accumulate throughout the trajectory, resulting in highly degraded performance. In this paper, we leverage the observation that predicted rewards under domain shift, even though imperfect, can still be a useful signal to guide fine-tuning. We exploit this property to fine-tune a policy…
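The abstract is truncated here, so the authors' actual fine-tuning procedure is not shown. As a rough illustration of the general idea only, the toy sketch below fine-tunes a two-action softmax policy with a plain REINFORCE update, using rewards from a frozen predictor in place of environment rewards. Everything in it is an assumption made for illustration: the function names (`predicted_reward`, `finetune_with_predicted_rewards`), the bias and noise model of the predictor, the choice of REINFORCE, and the hyperparameters. It is not the PRFT algorithm from the paper.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predicted_reward(action, rng):
    """Hypothetical frozen reward predictor under domain shift (assumption).

    The prediction is scaled, offset, and noisy, so it is imperfect,
    but it still ranks action 1 above action 0 on average.
    """
    true_reward = 1.0 if action == 1 else 0.0
    return 0.5 * true_reward + 0.2 + rng.gauss(0.0, 0.1)


def finetune_with_predicted_rewards(logits, steps=2000, lr=0.1, seed=0):
    """Toy REINFORCE fine-tuning driven only by predicted rewards."""
    rng = random.Random(seed)
    logits = list(logits)
    baseline = 0.0  # running mean of predicted rewards, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(logits)), weights=probs)[0]
        r_hat = predicted_reward(action, rng)  # no environment reward is used
        advantage = r_hat - baseline
        baseline += 0.05 * (r_hat - baseline)
        for a in range(len(logits)):
            # Gradient of log pi(action) with respect to logit a.
            grad = (1.0 if a == action else 0.0) - probs[a]
            logits[a] += lr * advantage * grad
    return logits


if __name__ == "__main__":
    # The starting policy prefers action 0, standing in for a policy
    # that degraded after a visual change.
    start = [1.0, -1.0]
    tuned = finetune_with_predicted_rewards(start)
    print("before:", softmax(start))
    print("after: ", softmax(tuned))
```

In this toy setting the predicted rewards are biased and noisy, yet they preserve the ordering of the two actions, which is enough for the update to shift probability toward the better one. Whether that property holds in a real image-based task depends on the reward predictor, which is the empirical question the paper addresses.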
