Leveraging Prior Knowledge in Reinforcement Learning via Double-Sided Bounds on the Value Function
Jacob Adamczyk, Stas Tiomkin, Rahul Kulkarni

TL;DR
This paper shows how any prior approximation of the value function can be turned into double-sided (lower and upper) bounds on the optimal value function in reinforcement learning. The bounds support transfer learning and motivate new clipping methods for more stable training. An accompanying error analysis covers continuous state and action spaces.
Contribution
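A framework that converts an arbitrary approximation of the value function into lower and upper bounds on the optimal value function. The framework is extended to continuous state and action spaces through error analysis, and it is used to derive new clipping methods for training.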
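For intuition, here is one standard way such bounds can be obtained. It assumes a discounted MDP with discount 0 ≤ γ < 1, a bounded approximation Q, and the hard-max Bellman optimality operator T. The paper's own theorems, which also cover continuous spaces and include error analysis, may take a different form.

```latex
% Assumed setting: 0 <= \gamma < 1, bounded Q, hard-max Bellman optimality operator.
(\mathcal{T}Q)(s,a) = r(s,a) + \gamma\,\mathbb{E}_{s' \sim p(\cdot \mid s,a)}\Big[\max_{a'} Q(s',a')\Big],
\qquad \Delta(s,a) = (\mathcal{T}Q)(s,a) - Q(s,a).

% For any constant c, \mathcal{T}(Q + c) = \mathcal{T}Q + \gamma c.
% With c = \sup\Delta/(1-\gamma): \mathcal{T}(Q+c) = Q + \Delta + \gamma c \le Q + c,
% so the iterates \mathcal{T}^n(Q+c) decrease monotonically and Q^* \le \mathcal{T}(Q+c).
% With c = \inf\Delta/(1-\gamma) the inequalities reverse, giving the lower bound.

(\mathcal{T}Q)(s,a) + \frac{\gamma}{1-\gamma}\inf_{s',a'}\Delta(s',a')
\;\le\; Q^*(s,a) \;\le\;
(\mathcal{T}Q)(s,a) + \frac{\gamma}{1-\gamma}\sup_{s',a'}\Delta(s',a')
```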
Findings
Any approximation of the value function yields double-sided bounds on the optimal value function.
The framework extends to continuous state and action spaces, with accompanying error analysis.
New clipping methods derived from the bounds are validated numerically in simple domains.
Abstract
An agent's ability to leverage past experience is critical for efficiently solving new tasks. Approximate solutions for new tasks can be obtained from previously derived value functions, as demonstrated by research on transfer learning, curriculum learning, and compositionality. However, prior work has primarily focused on using value functions to obtain zero-shot approximations for solutions to a new task. In this work, we show how an arbitrary approximation for the value function can be used to derive double-sided bounds on the optimal value function of interest. We further extend the framework with error analysis for continuous state and action spaces. The derived results lead to new approaches for clipping during training which we validate numerically in simple domains.
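The bounds-plus-clipping idea can be illustrated with a minimal sketch. This is not the authors' code. It assumes a small, fully known tabular MDP and the hard-max Bellman operator T. It also assumes the bound form given by the standard contraction argument: with Bellman residual Δ = TQ minus Q of a prior Q, Q* lies between TQ + γ/(1-γ)·min Δ and TQ + γ/(1-γ)·max Δ. The function names (`bellman_backup`, `double_sided_bounds`, `clipped_value_iteration`) and the random toy MDP are hypothetical. Clipping each backup into the bounds is one illustrative use of the bounds; the clipping schemes studied in the paper may differ.

```python
import numpy as np

def bellman_backup(Q, R, P, gamma):
    """Hard-max Bellman optimality operator T for a tabular Q of shape (S, A).

    R has shape (S, A); P[s, a, s'] is a transition probability, shape (S, A, S).
    """
    V = Q.max(axis=1)              # V(s') = max_{a'} Q(s', a')
    return R + gamma * (P @ V)     # (S, A, S) @ (S,) -> (S, A)

def double_sided_bounds(Q_prior, R, P, gamma):
    """Lower/upper bounds on Q* computed from an arbitrary prior estimate Q_prior
    (assumed bound form from the standard contraction argument)."""
    TQ = bellman_backup(Q_prior, R, P, gamma)
    delta = TQ - Q_prior           # Bellman residual of the prior
    lower = TQ + gamma * delta.min() / (1.0 - gamma)
    upper = TQ + gamma * delta.max() / (1.0 - gamma)
    return lower, upper

def clipped_value_iteration(R, P, gamma, lower=-np.inf, upper=np.inf, n_iters=500):
    """Q-value iteration that clips every backup into [lower, upper]."""
    Q = np.zeros_like(R)
    for _ in range(n_iters):
        Q = np.clip(bellman_backup(Q, R, P, gamma), lower, upper)
    return Q

# Hypothetical toy problem: a random 5-state, 3-action MDP.
rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
R = rng.uniform(0.0, 1.0, size=(S, A))
P = rng.dirichlet(np.ones(S), size=(S, A))   # each P[s, a] sums to 1

# "Prior knowledge": the optimal Q of a related task with perturbed rewards.
R_prior = R + rng.normal(0.0, 0.1, size=R.shape)
Q_prior = clipped_value_iteration(R_prior, P, gamma)

lower, upper = double_sided_bounds(Q_prior, R, P, gamma)
Q_star = clipped_value_iteration(R, P, gamma)                 # unclipped reference
Q_clip = clipped_value_iteration(R, P, gamma, lower, upper)   # clipped training

print("max bound width:", float((upper - lower).max()))
print("bounds hold:", bool(np.all(lower <= Q_star + 1e-8) and np.all(Q_star <= upper + 1e-8)))
```

The prior here is the exact optimum of a task that differs only in its rewards, so its Bellman residual equals the reward difference. A closer prior therefore gives narrower bounds, and with a perfect prior the two bounds coincide with Q*. Clipping is non-expansive, so the clipped iteration still converges to Q* whenever Q* lies inside [lower, upper].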
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Multi-Objective Optimization Algorithms · Advanced Bandit Algorithms Research
