Adaptive Exploration for Data-Efficient General Value Function Evaluations
Arushi Jain, Josiah P. Hanna, Doina Precup

TL;DR
This paper introduces GVFExplorer, an adaptive method that learns a single behavior policy to efficiently evaluate multiple General Value Functions in reinforcement learning, improving data efficiency and prediction accuracy.
Contribution
It proposes an adaptive policy learning approach that minimizes return variance across GVFs, enhancing data efficiency in off-policy evaluation settings.
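As a rough illustration of how per-GVF variance estimates might be combined into one behavior policy, the sketch below weights each action in a single state by the square root of the summed, squared-target-probability-weighted variances. This summary does not give the update rule, so the weighting, the function name `variance_weighted_behavior_policy`, and the input layout are all assumptions made for illustration, not the paper's algorithm.

```python
import math


def variance_weighted_behavior_policy(target_policies, variances):
    """Return action probabilities for one state (illustrative assumption only).

    target_policies: one list of action probabilities per GVF.
    variances: one list of per-action return-variance estimates per GVF.
    """
    n_actions = len(target_policies[0])
    scores = []
    for a in range(n_actions):
        # Actions that matter to some target policy and have high return
        # variance receive a larger score, hence more samples.
        total = sum(pi[a] ** 2 * var[a]
                    for pi, var in zip(target_policies, variances))
        scores.append(math.sqrt(total))
    norm = sum(scores)
    if norm == 0.0:
        # No variance signal yet: fall back to uniform exploration.
        return [1.0 / n_actions] * n_actions
    return [s / norm for s in scores]


# Two hypothetical GVFs over two actions: GVF 1 only cares about action 0
# (variance 4), GVF 2 only about action 1 (variance 1).
policy = variance_weighted_behavior_policy(
    target_policies=[[1.0, 0.0], [0.0, 1.0]],
    variances=[[4.0, 0.0], [0.0, 1.0]],
)
```

In this toy case the higher-variance action is sampled twice as often as the other, which is the intuition behind spending interactions where the return is hardest to estimate.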
Findings
Reduces environmental interactions needed for GVF evaluation.
Improves prediction accuracy across multiple GVFs.
Effective in both tabular and nonlinear function approximation environments.
Abstract
General Value Functions (GVFs) (Sutton et al., 2011) represent predictive knowledge in reinforcement learning. Each GVF computes the expected return for a given policy, based on a unique reward. Existing methods relying on fixed behavior policies or pre-collected data often face data efficiency issues when learning multiple GVFs in parallel using off-policy methods. To address this, we introduce GVFExplorer, which adaptively learns a single behavior policy that efficiently collects data for evaluating multiple GVFs in parallel. Our method optimizes the behavior policy by minimizing the total variance in return across GVFs, thereby reducing the required environmental interactions. We use an existing temporal-difference-style variance estimator to approximate the return variance. We prove that each behavior policy update decreases the overall mean squared error in GVF predictions. We…
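To make the idea of a temporal-difference-style variance estimator concrete, here is a minimal tabular sketch, assuming one well-known construction in which the squared TD error serves as the reward and the squared discount as the discount. The abstract does not say which estimator the paper uses, and this version is on-policy with no importance-sampling corrections, so it should be read as a simplified stand-in. The two-state chain, the function name `td_value_and_variance`, and all hyperparameters are hypothetical.

```python
import random


def td_value_and_variance(episodes=20000, alpha=0.01, gamma=0.9, seed=0):
    """Estimate value and return variance with TD(0) on a toy two-state chain.

    Hypothetical chain: state 0 -> state 1 -> terminal, with reward +1 or -1
    (equal probability) on every step. True values are 0; the true return
    variance is 1 at state 1 and 1 + gamma**2 at state 0.
    """
    rng = random.Random(seed)
    n_states = 2
    # The extra last entry is the terminal state; it is never updated.
    value = [0.0] * (n_states + 1)
    variance = [0.0] * (n_states + 1)
    for _ in range(episodes):
        for s in range(n_states):
            reward = rng.choice([1.0, -1.0])
            s_next = s + 1
            # Ordinary TD error for the value estimate.
            delta = reward + gamma * value[s_next] - value[s]
            # Variance TD error: delta**2 plays the role of the reward
            # and gamma**2 the role of the discount.
            delta_var = (delta ** 2
                         + gamma ** 2 * variance[s_next]
                         - variance[s])
            value[s] += alpha * delta
            variance[s] += alpha * delta_var
    return value[:n_states], variance[:n_states]


values, variances = td_value_and_variance()
```

With a constant step size the estimates keep fluctuating slightly, so they should land near (not exactly on) the true variances of 1.81 for state 0 and 1.0 for state 1.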
Taxonomy
Topics: Simulation Techniques and Applications · Neural Networks and Applications · Numerical Methods and Algorithms
