Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits
Fan Chen, Zeyu Jia, Alexander Rakhlin, Tengyang Xie

TL;DR
This paper analyzes outcome-based online reinforcement learning with endpoint-only rewards, proposing a sample-efficient algorithm, characterizing fundamental limits, and extending to preference-based feedback, thus establishing a theoretical foundation for this setting.
Contribution
It introduces the first provably sample-efficient algorithm for outcome-based RL with general function approximation and characterizes the fundamental limits of outcome-based feedback.
Findings
Achieves $\tilde{O}(C_{\mathrm{cov}} H^3 / \epsilon^2)$ sample complexity.
Identifies exponential separation between outcome-based feedback and per-step rewards.
Extends results to preference-based feedback with similar efficiency.
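To make the scaling in the first finding concrete, the following minimal sketch evaluates only the leading term $C_{\mathrm{cov}} H^3 / \epsilon^2$, dropping the logarithmic factors and constants hidden by $\tilde{O}$. The function name `leading_term` and the example values are illustrative assumptions, not taken from the paper.

```python
def leading_term(c_cov: float, horizon: int, epsilon: float) -> float:
    """Leading term C_cov * H^3 / eps^2 of the bound.

    Log factors and constants hidden by the O-tilde notation are ignored,
    so only ratios between calls are meaningful, not absolute values.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return c_cov * horizon**3 / epsilon**2


# Hypothetical example values, chosen only to show the scaling.
base = leading_term(c_cov=10.0, horizon=5, epsilon=0.1)
half_eps = leading_term(c_cov=10.0, horizon=5, epsilon=0.05)
double_h = leading_term(c_cov=10.0, horizon=10, epsilon=0.1)

print(f"halving epsilon multiplies the term by about {half_eps / base:.1f}")
print(f"doubling the horizon multiplies the term by about {double_h / base:.1f}")
```

Under this reading, halving the target accuracy roughly quadruples the leading term, while doubling the horizon multiplies it by roughly eight.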
Abstract
Reinforcement learning with outcome-based feedback faces a fundamental challenge: when rewards are only observed at trajectory endpoints, how do we assign credit to the right actions? This paper provides the first comprehensive analysis of this problem in online RL with general function approximation. We develop a provably sample-efficient algorithm achieving $\tilde{O}(C_{\mathrm{cov}} H^3 / \epsilon^2)$ sample complexity, where $C_{\mathrm{cov}}$ is the coverability coefficient of the underlying MDP. By leveraging general function approximation, our approach works effectively in large or infinite state spaces where tabular methods fail, requiring only that value functions and reward functions can be represented by appropriate function classes. Our results also characterize when outcome-based feedback is statistically separated from per-step rewards, revealing an unavoidable exponential…
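The credit-assignment difficulty described in the abstract can be illustrated with a toy sketch. It assumes, purely for illustration, that the outcome is the sum of hidden per-step rewards; the function names and the example reward sequences are hypothetical and do not come from the paper.

```python
from typing import List


def per_step_feedback(step_rewards: List[float]) -> List[float]:
    """Per-step setting: the learner sees the reward of every step."""
    return list(step_rewards)


def outcome_feedback(step_rewards: List[float]) -> float:
    """Outcome-based setting: the learner sees one number at the endpoint.

    Assumption for this sketch: the outcome is the sum of the hidden
    per-step rewards.
    """
    return sum(step_rewards)


# Two hypothetical trajectories in which a different step earned the reward.
early_success = [1.0, 0.0, 0.0]
late_success = [0.0, 0.0, 1.0]

# Per-step feedback tells the two trajectories apart.
print(per_step_feedback(early_success) == per_step_feedback(late_success))

# Outcome feedback is identical, so the rewarded step cannot be read off directly.
print(outcome_feedback(early_success) == outcome_feedback(late_success))
```

Both trajectories yield the same endpoint signal, so the learner has to infer which step deserved the credit from many trajectories rather than from any single one.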
Taxonomy
Topics: Supply Chain and Inventory Management · Digital Platforms and Economics
