Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits

Fan Chen; Zeyu Jia; Alexander Rakhlin; Tengyang Xie

arXiv:2505.20268 · cs.LG · July 25, 2025

Open Access

TL;DR

This paper analyzes outcome-based online reinforcement learning with endpoint-only rewards, proposing a sample-efficient algorithm, characterizing fundamental limits, and extending to preference-based feedback, thus establishing a theoretical foundation for this setting.

Contribution

It introduces the first provably sample-efficient algorithm for outcome-based RL with general function approximation and characterizes the fundamental limits of outcome-based feedback.

Findings

01

Achieves $\tilde{O}(C_{cov} H^3 / \epsilon^2)$ sample complexity.

02

Identifies exponential separation between outcome-based feedback and per-step rewards.

03

Extends results to preference-based feedback with similar efficiency.
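To make the first finding concrete, the sketch below evaluates only the leading term $C_{cov} H^3 / \epsilon^2$ of the stated bound. It drops the constants and logarithmic factors hidden by the $\tilde{O}$ notation, so the numbers indicate scaling, not actual sample counts. The function name and the input values are hypothetical and chosen purely for illustration.

```python
import math


def sample_complexity_scale(c_cov: float, horizon: int, epsilon: float) -> float:
    """Leading term C_cov * H^3 / eps^2 of the stated bound.

    Assumption: constants and log factors inside the tilde-O are ignored,
    so this is a scaling illustration, not a sample-size calculator.
    """
    if c_cov <= 0 or horizon <= 0 or epsilon <= 0:
        raise ValueError("c_cov, horizon and epsilon must all be positive")
    return c_cov * horizon**3 / epsilon**2


# Hypothetical inputs, for illustration only.
base = sample_complexity_scale(c_cov=10.0, horizon=20, epsilon=0.1)
half_eps = sample_complexity_scale(c_cov=10.0, horizon=20, epsilon=0.05)
double_h = sample_complexity_scale(c_cov=10.0, horizon=40, epsilon=0.1)

# Halving epsilon multiplies the leading term by 4; doubling H multiplies it by 8.
print(f"eps -> eps/2: x{half_eps / base:.1f}")
print(f"H -> 2H:      x{double_h / base:.1f}")
```

Under these assumptions, tightening the accuracy target is cheaper than lengthening the horizon by the same factor, since the dependence is quadratic in $1/\epsilon$ but cubic in $H$.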

Abstract

Reinforcement learning with outcome-based feedback faces a fundamental challenge: when rewards are only observed at trajectory endpoints, how do we assign credit to the right actions? This paper provides the first comprehensive analysis of this problem in online RL with general function approximation. We develop a provably sample-efficient algorithm achieving $\tilde{O}(C_{cov} H^{3} / \epsilon^{2})$ sample complexity, where $C_{cov}$ is the coverability coefficient of the underlying MDP. By leveraging general function approximation, our approach works effectively in large or infinite state spaces where tabular methods fail, requiring only that value functions and reward functions can be represented by appropriate function classes. Our results also characterize when outcome-based feedback is statistically separated from per-step rewards, revealing an unavoidable exponential…
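The credit-assignment difficulty described in the abstract can be illustrated with a toy sketch. It assumes the outcome signal is the sum of hidden per-step rewards along a trajectory; that modeling choice, the function names, and the reward values are illustrative assumptions, not details taken from this page.

```python
from typing import List, Sequence


def per_step_feedback(rewards: Sequence[float]) -> List[float]:
    """Per-step setting: the learner observes every reward along the trajectory."""
    return list(rewards)


def outcome_feedback(rewards: Sequence[float]) -> float:
    """Outcome-based setting: the learner observes a single number at the endpoint.

    Assumption: the outcome is the sum of the hidden per-step rewards.
    """
    return sum(rewards)


# Two hypothetical length-3 trajectories whose reward arrives at different steps.
traj_a = [1.0, 0.0, 0.0]
traj_b = [0.0, 0.0, 1.0]

# Per-step feedback tells the two apart; outcome feedback does not.
print(per_step_feedback(traj_a) == per_step_feedback(traj_b))
print(outcome_feedback(traj_a) == outcome_feedback(traj_b))
```

In this toy case the outcome alone cannot say which step earned the reward, which is the ambiguity an outcome-based learner has to resolve across many trajectories.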

Taxonomy

Topics: Supply Chain and Inventory Management · Digital Platforms and Economics