Hyperparameter Selection Methods for Fitted Q-Evaluation with Error Guarantee

Kohei Miyaguchi

arXiv:2201.02300 · stat.ML · February 21, 2022

TL;DR

This paper introduces a hyperparameter selection framework for fitted Q-evaluation in offline policy evaluation, providing error guarantees and multiple methods with different trade-offs, validated through experiments.

Contribution

The paper proposes an approximate hyperparameter selection (AHS) framework for FQE whose selection criteria are defined without hyperparameters, together with theoretical error bounds and practical selection methods.

Findings

1. The theoretical error bound matches empirical results.

2. Four AHS methods are derived, each with different characteristics.

3. The framework improves the utility of FQE in real applications.

Abstract

We are concerned with the problem of hyperparameter selection for fitted Q-evaluation (FQE). FQE is one of the state-of-the-art methods for offline policy evaluation (OPE), which is essential to reinforcement learning without environment simulators. However, like other OPE methods, FQE is not itself hyperparameter-free, which undermines its utility in real-life applications. We address this issue by proposing a framework of approximate hyperparameter selection (AHS) for FQE, which defines a notion of optimality (called selection criteria) in a quantitative and interpretable manner without hyperparameters. We then derive four AHS methods, each of which has different characteristics such as distribution-mismatch tolerance and time complexity. We also confirm in experiments that the error bound given by the theory matches empirical observations.
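To make concrete where hyperparameters enter FQE, the following is a minimal sketch of generic linear FQE with ridge regression. It is an illustration under stated assumptions, not the paper's AHS methods: the function name, the feature arrays, and the choice of `ridge` and `n_iters` as the hyperparameters are all hypothetical and do not come from the paper.

```python
import numpy as np

def fitted_q_evaluation(phi, phi_next_pi, rewards, gamma=0.9, ridge=0.0, n_iters=200):
    """Generic linear FQE (illustrative sketch, not the paper's algorithm).

    phi:          (n, d) features of logged (state, action) pairs
    phi_next_pi:  (n, d) features of (next state, action chosen by the target policy)
    rewards:      (n,) observed rewards
    ridge, n_iters: hyperparameters the user must choose (assumed for illustration)
    """
    n, d = phi.shape
    w = np.zeros(d)
    # Regularized Gram matrix, reused by every regression step.
    gram = phi.T @ phi + ridge * n * np.eye(d)
    for _ in range(n_iters):
        # Bellman targets under the current Q estimate: r + gamma * Q_k(s', pi(s')).
        targets = rewards + gamma * (phi_next_pi @ w)
        # Regress the targets onto the (state, action) features.
        w = np.linalg.solve(gram, phi.T @ targets)
    return w

# Toy two-state chain with one action and one-hot features:
# state 0 -> state 1 with reward 0; state 1 -> state 1 with reward 1.
phi = np.array([[1.0, 0.0], [0.0, 1.0]])
phi_next_pi = np.array([[0.0, 1.0], [0.0, 1.0]])
rewards = np.array([0.0, 1.0])

w = fitted_q_evaluation(phi, phi_next_pi, rewards)                 # approximately [9, 10]
w_reg = fitted_q_evaluation(phi, phi_next_pi, rewards, ridge=0.1)  # shrunk toward zero
```

On this toy problem the unregularized estimate recovers the true values Q(1) = 1 / (1 - 0.9) = 10 and Q(0) = 0.9 * 10 = 9, while a modest ridge penalty changes the estimate substantially. That sensitivity is the kind of dependence on hyperparameters that motivates a principled selection procedure.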

Taxonomy

Topics: Reinforcement Learning in Robotics · Formal Methods in Verification · Advanced Multi-Objective Optimization Algorithms