Hyperparameter Selection Methods for Fitted Q-Evaluation with Error Guarantee
Kohei Miyaguchi

TL;DR
This paper introduces a hyperparameter selection framework for fitted Q-evaluation in offline policy evaluation, providing error guarantees and multiple methods with different trade-offs, validated through experiments.
Contribution
It proposes an approximate hyperparameter selection (AHS) framework for FQE that enables hyperparameter-free evaluation, backed by theoretical error bounds and practical selection methods.
Findings
Error bounds match empirical results
Four AHS methods with different characteristics
Framework aims to improve FQE's utility in real-life applications
Abstract
We are concerned with the problem of hyperparameter selection for fitted Q-evaluation (FQE). FQE is one of the state-of-the-art methods for offline policy evaluation (OPE), which is essential to reinforcement learning without environment simulators. However, like other OPE methods, FQE itself is not hyperparameter-free, which undermines its utility in real-life applications. We address this issue by proposing a framework of approximate hyperparameter selection (AHS) for FQE, which defines a notion of optimality (called selection criteria) in a quantitative and interpretable manner without hyperparameters. We then derive four AHS methods, each with different characteristics such as distribution-mismatch tolerance and time complexity. We also confirm in experiments that the error bound given by the theory matches empirical observations.
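To make the problem concrete, here is a minimal sketch of why FQE is not hyperparameter-free. It assumes linear FQE with one-hot (tabular) features and ridge regression on a hypothetical two-state, two-action MDP; the function name `fqe`, the toy data, and the parameters `lam` and `n_iters` are illustrative assumptions, not the paper's setup. The sketch does not implement any of the paper's AHS methods. It only shows that the FQE estimate depends on a regression hyperparameter (here the ridge penalty `lam`) that the data alone do not pin down.

```python
import numpy as np


def fqe(states, actions, rewards, next_states, policy, n_states, n_actions,
        gamma=0.9, lam=1e-3, n_iters=200):
    """Hypothetical linear FQE with one-hot features and ridge regression.

    policy: int array of shape (n_states,), the deterministic target action.
    lam: ridge penalty, a hyperparameter that FQE itself does not choose.
    Returns estimated Q-values of shape (n_states, n_actions).
    """
    n, d = len(states), n_states * n_actions
    rows = np.arange(n)
    # One-hot features of the logged (s, a) pairs.
    phi = np.zeros((n, d))
    phi[rows, states * n_actions + actions] = 1.0
    # One-hot features of (s', pi(s')): next state with the target action.
    phi_next = np.zeros((n, d))
    phi_next[rows, next_states * n_actions + policy[next_states]] = 1.0
    # Ridge normal-equation matrix; fixed across iterations.
    A = phi.T @ phi + lam * np.eye(d)
    w = np.zeros(d)
    for _ in range(n_iters):
        # Regression target: Bellman backup under the target policy.
        target = rewards + gamma * (phi_next @ w)
        # Fit the next Q-function by ridge regression onto the target.
        w = np.linalg.solve(A, phi.T @ target)
    return w.reshape(n_states, n_actions)


# Hypothetical toy MDP: action a moves to state a; reward 1 iff a == 1.
n_states, n_actions = 2, 2
pairs = [(s, a) for s in range(n_states) for a in range(n_actions)] * 5
states = np.array([s for s, _ in pairs])
actions = np.array([a for _, a in pairs])
next_states = actions.copy()
rewards = (actions == 1).astype(float)
policy = np.array([1, 1])  # target policy: always take action 1

for lam in (1e-3, 10.0):
    q = fqe(states, actions, rewards, next_states, policy,
            n_states, n_actions, lam=lam)
    print(f"lam={lam:g}: estimated value of state 0 = {q[0, policy[0]]:.3f}")
```

With a small penalty the estimate approaches the true value 1 / (1 - 0.9) = 10, while a large penalty shrinks it far below that. Choosing such a hyperparameter without access to the true value is the problem the AHS framework targets.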
Taxonomy
Topics: Reinforcement Learning in Robotics · Formal Methods in Verification · Advanced Multi-Objective Optimization Algorithms
