PROGRESSOR: A Perceptually Guided Reward Estimator with Self-Supervised Online Refinement

Tewodros Ayalew; Xiao Zhang; Kevin Yuanbo Wu; Tianchong Jiang; Michael Maire; Matthew R. Walter

arXiv:2411.17764·cs.RO·November 28, 2024

TL;DR

PROGRESSOR is a self-supervised, perceptually guided reward estimator that learns task progress from videos and refines rewards online, enabling robots to learn complex behaviors without manual supervision.

Contribution

PROGRESSOR introduces a self-supervised reward learning framework that adversarially refines its reward during online RL, improving robot learning from videos without fine-tuning on in-domain task-specific data.

Findings

01

Enables robots to learn complex behaviors without external supervision.

02

Outperforms existing methods in real-robot offline RL tasks.

03

Requires no fine-tuning on in-domain task-specific data.

Abstract

We present PROGRESSOR, a novel framework that learns a task-agnostic reward function from videos, enabling policy training through goal-conditioned reinforcement learning (RL) without manual supervision. Underlying this reward is an estimate of the distribution over task progress as a function of the current, initial, and goal observations that is learned in a self-supervised fashion. Crucially, PROGRESSOR refines rewards adversarially during online RL training by pushing back predictions for out-of-distribution observations, to mitigate distribution shift inherent in non-expert observations. Utilizing this progress prediction as a dense reward together with an adversarial push-back, we show that PROGRESSOR enables robots to learn complex behaviors without any external supervision. Pretrained on large-scale egocentric human video from EPIC-KITCHENS, PROGRESSOR requires no fine-tuning on…
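The abstract describes two ingredients: a self-supervised progress target built from the initial, current, and goal observations of a video, and a push-back applied to predictions on out-of-distribution observations during online RL. The sketch below illustrates one plausible reading of both. The function names, the linear progress label, and the scaling factor `beta` are assumptions made for illustration; they are not taken from the paper.

```python
import random


def sample_progress_triplet(num_frames, rng):
    """Sample (initial, current, goal) frame indices from one video.

    Assumption: progress is labeled as the normalized position of the
    current frame between the initial and goal frames.
    """
    if num_frames < 2:
        raise ValueError("need at least two frames to define progress")
    initial = rng.randrange(0, num_frames - 1)
    goal = rng.randrange(initial + 1, num_frames)
    current = rng.randrange(initial, goal + 1)
    progress = (current - initial) / (goal - initial)
    return initial, current, goal, progress


def pushback_target(predicted_progress, beta=0.9):
    """Hypothetical push-back for policy-collected observations.

    Assumption: the training target for a possibly out-of-distribution
    observation is the model's own prediction scaled down by beta in [0, 1],
    so unfamiliar states gradually receive lower progress estimates.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return beta * predicted_progress


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_progress_triplet(50, rng))
    print(pushback_target(0.5))
```

In this reading, triplets sampled from expert video supply regression targets in [0, 1], while observations gathered by the learning policy are trained toward a discounted copy of the current prediction, which keeps the dense reward from being over-optimistic away from the video distribution.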

Taxonomy

Topics: Image and Video Quality Assessment · Neural Networks and Applications · Stock Market Forecasting Methods