ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization

Chen Bo Calvin Zhang; Zhang-Wei Hong; Aldo Pacchiano; Pulkit Agrawal

arXiv:2410.13837·cs.LG·February 26, 2025

TL;DR

ORSO automatically selects effective shaping reward functions for reinforcement learning. It reduces the data needed to evaluate each candidate reward, cuts computational time by up to 8 times, and yields policies comparable to those trained with expert-designed rewards.

Contribution

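The paper introduces ORSO, a framework that treats the choice of a shaping reward function as an online model selection problem. It selects performant shaping rewards without human intervention, comes with provable regret guarantees, and is more data- and compute-efficient than prior approaches.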

Findings

01 ORSO reduces the data needed to evaluate a shaping reward function, cutting computational time by up to 8 times.

02 ORSO outperforms prior reward shaping methods by more than 50%.

03 ORSO achieves policies comparable to those trained with manually engineered rewards.

Abstract

Reward shaping is critical in reinforcement learning (RL), particularly for complex tasks where sparse rewards can hinder learning. However, choosing effective shaping rewards from a set of reward functions in a computationally efficient manner remains an open challenge. We propose Online Reward Selection and Policy Optimization (ORSO), a novel approach that frames the selection of a shaping reward function as an online model selection problem. ORSO automatically identifies performant shaping reward functions without human intervention and with provable regret guarantees. We demonstrate ORSO's effectiveness across various continuous control tasks. Compared to prior approaches, ORSO significantly reduces the amount of data required to evaluate a shaping reward function, resulting in superior data efficiency and a significant reduction in computational time (up to 8 times). ORSO consistently…
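To make the "online model selection" framing concrete, the sketch below treats each candidate shaping reward as an arm of a bandit: at every round one candidate is picked, its policy is trained a little further, and the resulting policy is scored on the task. This is an illustration under stated assumptions, not the paper's algorithm. The abstract does not say which selection rule ORSO uses, so the standard UCB1 rule is swapped in here, and `SimulatedCandidate`, `train_and_evaluate`, and `select_reward_ucb` are hypothetical names. The simulated scores (a fixed ceiling approached geometrically) stand in for real policy training and are not results from the paper.

```python
import math


class SimulatedCandidate:
    """Hypothetical stand-in for one (shaping reward, policy) pair.

    Assumption: each training round moves the policy's task score
    halfway toward a fixed ceiling. A real setup would run an RL
    update under the shaping reward and evaluate on the task reward.
    """

    def __init__(self, ceiling):
        self.ceiling = ceiling
        self.rounds = 0

    def train_and_evaluate(self):
        self.rounds += 1
        return self.ceiling * (1.0 - 0.5 ** self.rounds)


def select_reward_ucb(candidates, budget, c=1.0):
    """Spend `budget` training rounds across candidates using UCB1.

    Returns the index with the highest mean score, the number of
    rounds given to each candidate, and the mean scores.
    """
    k = len(candidates)
    counts = [0] * k
    means = [0.0] * k
    for t in range(1, budget + 1):
        if t <= k:
            # Try every candidate once before using the UCB index.
            arm = t - 1
        else:
            # Mean score plus an exploration bonus that shrinks as a
            # candidate receives more training rounds.
            arm = max(
                range(k),
                key=lambda i: means[i]
                + c * math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        score = candidates[arm].train_and_evaluate()
        counts[arm] += 1
        # Incremental update of the running mean.
        means[arm] += (score - means[arm]) / counts[arm]
    best = max(range(k), key=lambda i: means[i])
    return best, counts, means


# Three hypothetical candidates with different achievable task scores.
candidates = [SimulatedCandidate(c) for c in (0.2, 0.5, 0.9)]
best, counts, means = select_reward_ucb(candidates, budget=60)
print("selected candidate:", best)
print("training rounds per candidate:", counts)
```

The point of the sketch is the budget allocation: no candidate is trained to convergence before being judged, and the training rounds concentrate on whichever candidate looks most promising so far. Note that each candidate's score changes as its policy improves, which is one reason the setting is posed as model selection rather than a plain stationary bandit.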

Code & Models

Repositories

calvincbzhang/orso (PyTorch, official)

Taxonomy

Topics: Diverse Scientific and Economic Studies · Economic Policies and Impacts

Methods: Sparse Evolutionary Training