
TL;DR
This paper introduces Score-life programming, a theoretical framework for reinforcement learning that directly computes optimal infinite horizon action sequences from a given state, without relying on classical dynamic programming or a policy function.
Contribution
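The method searches over non-stationary policies and maps infinite horizon action sequences to real numbers in a bounded interval, so that an optimal action sequence can be found by solving an optimization problem directly rather than by learning a policy function.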
Findings
Successfully applied to nonlinear optimal control problems
Enables direct computation of infinite horizon action sequences
Provides a new theoretical framework for formulating and solving RL problems
Abstract
In this paper, we present Score-life programming, a novel theoretical approach for solving reinforcement learning problems. In contrast with classical dynamic programming-based methods, our method can search over non-stationary policy functions, and can directly compute optimal infinite horizon action sequences from a given state. The central idea in our method is the construction of a mapping between infinite horizon action sequences and real numbers in a bounded interval. This construction enables us to formulate an optimization problem for directly computing optimal infinite horizon action sequences, without requiring a policy function. We demonstrate the effectiveness of our approach by applying it to nonlinear optimal control problems. Overall, our contributions provide a novel theoretical framework for formulating and solving reinforcement learning problems.
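To make the central idea concrete, here is a minimal sketch of one way such a mapping could work. It assumes a finite action set and encodes an action sequence as the base-M digits of a number in [0, 1]. The abstract does not specify the construction, so the digit encoding, the truncated horizon, the grid search, and every name below (`sequence_to_life`, `life_to_sequence`, `score`, `best_life_on_grid`) are illustrative assumptions rather than the paper's method.

```python
from typing import Callable, List, Sequence


def sequence_to_life(actions: Sequence[int], num_actions: int) -> float:
    """Encode a (truncated) action sequence as a real number in [0, 1).

    Assumption: action k becomes the k-th digit of a base-`num_actions`
    expansion, i.e. life = sum_k a_k / num_actions**(k + 1).
    """
    life, scale = 0.0, 1.0
    for a in actions:
        if not 0 <= a < num_actions:
            raise ValueError(f"action {a} outside 0..{num_actions - 1}")
        scale /= num_actions
        life += a * scale
    return life


def life_to_sequence(life: float, num_actions: int, horizon: int) -> List[int]:
    """Decode the first `horizon` base-`num_actions` digits of `life`."""
    actions, x = [], life
    for _ in range(horizon):
        x *= num_actions
        digit = min(int(x), num_actions - 1)  # clamp handles life == 1.0
        actions.append(digit)
        x -= digit
    return actions


def score(life: float, state, step: Callable, reward: Callable,
          num_actions: int, gamma: float = 0.9, horizon: int = 30) -> float:
    """Discounted return of the action sequence encoded by `life`.

    The infinite sum is truncated at `horizon` steps (an approximation).
    `step(state, action)` and `reward(state, action)` are hypothetical,
    user-supplied deterministic dynamics and reward functions.
    """
    total, discount = 0.0, 1.0
    for a in life_to_sequence(life, num_actions, horizon):
        total += discount * reward(state, a)
        state = step(state, a)
        discount *= gamma
    return total


def best_life_on_grid(state, step: Callable, reward: Callable,
                      num_actions: int, depth: int,
                      gamma: float = 0.9, horizon: int = 30) -> float:
    """Brute-force search over all lives with `depth` leading digits.

    A stand-in for a real optimizer: it only shows that picking an
    action sequence reduces to maximizing a function of one real number.
    """
    n = num_actions ** depth
    candidates = [i / n for i in range(n)]
    return max(candidates,
               key=lambda l: score(l, state, step, reward,
                                   num_actions, gamma, horizon))
```

With two actions, `sequence_to_life([1, 0, 1], 2)` returns 0.625, since 1/2 + 0/4 + 1/8 = 0.625, and `life_to_sequence(0.625, 2, 5)` recovers `[1, 0, 1, 0, 0]`. Note that no policy function appears anywhere: the decision variable is a single real number, and the state enters only as the starting point of the rollout.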
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Receptor Mechanisms and Signaling
