Primal-Dual Spectral Representation for Off-policy Evaluation

Yang Hu; Tianyi Chen; Na Li; Kai Wang; Bo Dai

arXiv:2410.17538·cs.LG·October 24, 2024

TL;DR

This paper introduces SpectralDICE, a novel off-policy evaluation method in reinforcement learning that uses spectral decomposition to achieve computational efficiency and better data utilization, supported by theoretical guarantees and empirical results.

Contribution

The paper establishes a linear spectral representation of the primal and dual variables in DICE (the value function and the stationary distribution correction ratio), enabling efficient optimization and improved data use in off-policy evaluation.
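To make the idea of a linear primal-dual representation concrete, here is a minimal NumPy sketch on a small tabular MDP. It is not the SpectralDICE algorithm: it assumes the transition kernel factorizes exactly as P(s'|s,a) = <phi(s,a), mu(s')> with known factors, whereas a practical method would have to learn such features from data. All names (`phi`, `mu`, `w`, `theta`, `d_D`, `zeta`) and all sizes are hypothetical choices for illustration. Under that assumption, the sketch checks numerically that Q^pi minus the reward is linear in `phi`, that the state occupancy minus its initial-distribution term is linear in `mu`, and that the primal and dual sides yield the same policy value.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, d, gamma = 6, 3, 2, 0.9  # states, actions, feature rank, discount

# Assumed low-rank transition kernel: P(s'|s,a) = <phi(s,a), mu(s')>.
# Rows of phi lie on the simplex and rows of mu are distributions over s',
# so every row of P is a valid probability distribution.
phi = rng.dirichlet(np.ones(d), size=S * A)   # (S*A, d), row index = s*A + a
mu = rng.dirichlet(np.ones(S), size=d)        # (d, S)
P = phi @ mu                                  # (S*A, S)

r = rng.uniform(size=S * A)                   # reward r(s,a)
pi = rng.dirichlet(np.ones(A), size=S)        # target policy pi(a|s), shape (S, A)
mu0 = rng.dirichlet(np.ones(S))               # initial state distribution

# Pi maps a state distribution to a state-action distribution under pi.
Pi = np.zeros((S, S * A))
for s in range(S):
    Pi[s, s * A:(s + 1) * A] = pi[s]
P_pi = P @ Pi                                 # (s,a) -> (s',a') transition under pi
I = np.eye(S * A)

# Primal variable: Q^pi solves the Bellman equation Q = r + gamma * P_pi Q.
Q = np.linalg.solve(I - gamma * P_pi, r)
V = Pi @ Q                                    # V(s) = sum_a pi(a|s) Q(s,a)
w = gamma * mu @ V                            # d-dimensional weight vector
primal_gap = np.max(np.abs(Q - (r + phi @ w)))

# Dual variable: normalized discounted occupancy d^pi(s,a).
d0 = Pi.T @ mu0                               # initial state-action distribution
occ = (1 - gamma) * np.linalg.solve(I - gamma * P_pi.T, d0)
occ_state = occ.reshape(S, A).sum(axis=1)     # state marginal of d^pi
theta = gamma * phi.T @ occ                   # d-dimensional weight vector
dual_gap = np.max(np.abs(occ_state - ((1 - gamma) * mu0 + mu.T @ theta)))

# Correction ratio against a hypothetical behavior data distribution d_D.
d_D = rng.dirichlet(np.ones(S * A))
zeta = occ / d_D

rho_primal = (1 - gamma) * d0 @ Q             # value from the primal side
rho_dual = np.sum(d_D * zeta * r)             # value by reweighting data rewards
print(primal_gap, dual_gap, rho_primal, rho_dual)
```

Both gaps should be at the level of floating-point error, and the two value estimates should agree. The point of the sketch is only that, once the transition operator factorizes, both the primal and the dual unknowns reduce to `d`-dimensional weight vectors instead of arbitrary functions over states and actions.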

Findings

01. SpectralDICE outperforms existing methods on benchmark tasks.
02. The algorithm has provable sample complexity guarantees.
03. SpectralDICE is both computationally and sample efficient.

Abstract

Off-policy evaluation (OPE) is one of the most fundamental problems in reinforcement learning (RL): estimating the expected long-term payoff of a given target policy using only experiences collected by another, potentially unknown, behavior policy. The distribution correction estimation (DICE) family of estimators has advanced the state of the art in OPE by breaking the curse of horizon. However, the major bottleneck in applying DICE estimators lies in the difficulty of solving the saddle-point optimization involved, especially with neural network implementations. In this paper, we tackle this challenge by establishing a linear representation of the value function and the stationary distribution correction ratio, i.e., the primal and dual variables in the DICE framework, using the spectral decomposition of the transition operator. Such a primal-dual representation not only bypasses the non-convex…

Taxonomy

Topics: Advanced Control Systems Optimization