Efficient Reinforcement Learning in Deterministic Systems with Value Function Generalization
Zheng Wen, Benjamin Van Roy

TL;DR
This paper introduces optimistic constraint propagation (OCP), an algorithm for efficient reinforcement learning in finite-horizon deterministic systems, with performance guarantees, a discussion of computational complexity, and results on two illustrative examples.
Contribution
The paper proposes OCP, a novel algorithm that combines exploration and value function generalization with theoretical performance bounds.
Findings
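When the true value function lies within the hypothesis class, OCP selects optimal actions in all but at most K episodes, where K is the eluder dimension of that class.
Further efficiency and asymptotic performance guarantees hold even when the true value function lies outside the hypothesis class, for the special case where the class is the span of pre-specified indicator functions over disjoint sets.
The paper discusses the computational complexity of OCP and reports computational results on two illustrative examples.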
Abstract
We consider the problem of reinforcement learning over episodes of a finite-horizon deterministic system and as a solution propose optimistic constraint propagation (OCP), an algorithm designed to synthesize efficient exploration and value function generalization. We establish that when the true value function lies within a given hypothesis class, OCP selects optimal actions over all but at most K episodes, where K is the eluder dimension of the given hypothesis class. We establish further efficiency and asymptotic performance guarantees that apply even if the true value function does not lie in the given hypothesis class, for the special case where the hypothesis class is the span of pre-specified indicator functions over disjoint sets. We also discuss the computational complexity of OCP and present computational results involving two illustrative examples.
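The idea of acting greedily on optimistic value estimates in a deterministic, finite-horizon system can be illustrated with a minimal sketch. This is not the paper's OCP algorithm: it covers only a tabular setting in which every (time step, state, action) triple has its own value, and it assumes rewards lie in [0, r_max]. The function name `optimistic_tabular`, the environment `chain_step`, and all parameters are hypothetical and chosen for illustration.

```python
from typing import Callable, Dict, List, Tuple

Key = Tuple[int, int, int]  # (time step, state, action)


def optimistic_tabular(
    step: Callable[[int, int, int], Tuple[int, float]],
    actions: List[int],
    horizon: int,
    start: int,
    r_max: float,
    episodes: int,
) -> List[float]:
    """Run greedy episodes on optimistic upper bounds; return per-episode returns."""
    # Transitions are deterministic, so one observation pins down a triple exactly.
    known: Dict[Key, Tuple[int, float]] = {}
    returns: List[float] = []

    for _ in range(episodes):
        cache: Dict[Key, float] = {}  # per-episode memo of upper bounds

        def q_upper(t: int, x: int, a: int) -> float:
            key = (t, x, a)
            if key not in known:
                # Never tried: assume the best possible remaining return.
                return r_max * (horizon - t)
            if key not in cache:
                y, r = known[key]
                if t + 1 == horizon:
                    nxt = 0.0
                else:
                    nxt = max(q_upper(t + 1, y, b) for b in actions)
                cache[key] = r + nxt
            return cache[key]

        x, total = start, 0.0
        for t in range(horizon):
            # Greedy action with respect to the optimistic bound.
            a = max(actions, key=lambda b: q_upper(t, x, b))
            y, r = step(t, x, a)
            known[(t, x, a)] = (y, r)  # record the observed transition
            total += r
            x = y
        returns.append(total)
    return returns


def chain_step(t: int, x: int, a: int) -> Tuple[int, float]:
    """Toy chain: action 1 moves right, action 0 resets; reward 1 on reaching state 3."""
    y = min(x + 1, 3) if a == 1 else 0
    return y, (1.0 if y == 3 else 0.0)


returns = optimistic_tabular(chain_step, actions=[0, 1], horizon=3,
                             start=0, r_max=1.0, episodes=30)
```

In this sketch, an episode that visits only already-observed triples attains exactly its optimistic estimate and is therefore optimal, so every suboptimal episode observes at least one new triple. The number of suboptimal episodes is thus bounded by the number of distinct triples, which mirrors, in this simplified setting only, the kind of episode-count bound stated in the abstract.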
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Control Systems Optimization · Formal Methods in Verification
