Adapting Image-based RL Policies via Predicted Rewards

Weiyao Wang; Xinyuan Fang; Gregory D. Hager

arXiv:2407.16842·cs.RO·July 25, 2024

TL;DR

This paper introduces Predicted Reward Fine-tuning (PRFT), a method that uses predicted rewards to adapt image-based RL policies to new domains, improving generalization despite visual changes.

Contribution

The paper proposes PRFT, a novel fine-tuning approach leveraging reward prediction under domain shift to enhance policy performance in unseen environments.

Findings

01 PRFT improves policy performance across diverse tasks.

02 Predicted rewards remain useful signals under significant domain shift.

03 Fine-tuning with predicted rewards outperforms baseline methods.

Abstract

Image-based reinforcement learning (RL) generalizes poorly when the visual environment changes substantially between training and deployment, and policies learned in the training environment may then perform badly. Previous approaches to this problem have largely focused on broadening the training observation distribution, employing techniques like data augmentation and domain randomization. However, because RL decision-making is sequential, residual errors made by the learned policy often propagate and accumulate over the trajectory, resulting in highly degraded performance. In this paper, we leverage the observation that predicted rewards under domain shift, even though imperfect, can still be a useful signal to guide fine-tuning. We exploit this property to fine-tune a policy…
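The core observation, that an imperfect reward prediction can still steer fine-tuning in the right direction, can be illustrated with a toy sketch. This is not the paper's PRFT implementation: it assumes a three-armed bandit rather than an image-based task, a hypothetical `predicted_reward` function standing in for a frozen reward predictor whose outputs are biased and noisy under domain shift, and a plain REINFORCE update with a running baseline. All names and constants are made up for illustration.

```python
import math
import random

# Hypothetical true rewards in the shifted domain (never shown to the learner).
TRUE_REWARD = [0.1, 0.3, 0.9]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predicted_reward(action, rng):
    """Stand-in for a frozen reward predictor under domain shift (assumption).

    The prediction is compressed, offset, and noisy, but it still ranks the
    actions in the same order as the true reward on average.
    """
    return 0.5 * TRUE_REWARD[action] + 0.2 + rng.gauss(0.0, 0.1)


def fine_tune(logits, steps=3000, lr=0.1, seed=0):
    """Fine-tune softmax policy logits using only predicted rewards."""
    rng = random.Random(seed)
    logits = list(logits)
    baseline = 0.0  # running average of predicted rewards
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(logits)), weights=probs)[0]
        r_hat = predicted_reward(action, rng)
        advantage = r_hat - baseline
        baseline += 0.05 * (r_hat - baseline)
        # REINFORCE: gradient of log pi(action) with respect to each logit.
        for a in range(len(logits)):
            grad = (1.0 if a == action else 0.0) - probs[a]
            logits[a] += lr * advantage * grad
    return logits


if __name__ == "__main__":
    # Start from a policy that prefers action 0, as if trained in a source domain.
    tuned = fine_tune([2.0, 0.0, 0.0])
    print([round(p, 3) for p in softmax(tuned)])
```

In this sketch the learner only ever sees predicted rewards, yet the policy shifts its preference toward the action with the highest true reward, because the prediction error does not change the ranking of actions. Whether that ranking property holds in a real image-based task depends on the reward predictor and the kind of visual shift.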
