Primal-Dual Spectral Representation for Off-policy Evaluation
Yang Hu, Tianyi Chen, Na Li, Kai Wang, Bo Dai

TL;DR
This paper introduces SpectralDICE, a novel off-policy evaluation method in reinforcement learning that uses spectral decomposition to achieve computational efficiency and better data utilization, supported by theoretical guarantees and empirical results.
Contribution
It establishes a linear spectral representation of primal-dual variables in DICE, enabling efficient optimization and improved data use in off-policy evaluation.
Findings
SpectralDICE outperforms existing methods on benchmark tasks.
The algorithm has provable sample complexity guarantees.
SpectralDICE is both computationally and sample efficient.
Abstract
Off-policy evaluation (OPE) is one of the most fundamental problems in reinforcement learning (RL): it aims to estimate the expected long-term payoff of a given target policy using only experiences from another behavior policy that is potentially unknown. The distribution correction estimation (DICE) family of estimators has advanced the state of the art in OPE by breaking the curse of horizon. However, the major bottleneck of applying DICE estimators lies in the difficulty of solving the saddle-point optimization involved, especially with neural network implementations. In this paper, we tackle this challenge by establishing a linear representation of the value function and the stationary distribution correction ratio, i.e., the primal and dual variables in the DICE framework, using the spectral decomposition of the transition operator. Such primal-dual representation not only bypasses the non-convex…
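The primal-dual view described in the abstract can be checked numerically on a toy problem. The sketch below is not the SpectralDICE algorithm and does not learn anything from samples; it assumes a small tabular MDP with a known low-rank transition matrix `P = phi @ mu`, a reward that is linear in `phi`, and an arbitrary full-support data distribution `d_D`. All names and sizes are hypothetical. Under those assumptions it illustrates two standard facts: the primal estimate (from the value function `Q`) and the dual estimate (from the correction ratio `zeta = d_pi / d_D`) give the same policy value, and `Q` is linear in the spectral features `phi`.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, k, gamma = 5, 2, 3, 0.9            # hypothetical sizes and discount
n = S * A                                # state-action pairs, index = s * A + a

# Assumed low-rank transition operator: P[(s,a), s'] = <phi(s,a), mu(s')>
phi = rng.dirichlet(np.ones(k), size=n)  # (n, k), each row on the simplex
mu = rng.dirichlet(np.ones(S), size=k)   # (k, S), each row a distribution over s'
P = phi @ mu                             # (n, S), rows sum to 1

theta = rng.uniform(size=k)
r = phi @ theta                          # reward assumed linear in phi
pi = rng.dirichlet(np.ones(A), size=S)   # target policy pi(a|s), shape (S, A)
mu0 = rng.dirichlet(np.ones(S))          # initial state distribution
nu0 = (mu0[:, None] * pi).reshape(n)     # initial state-action distribution

# State-action transition under pi: P_pi[(s,a),(s',a')] = P[(s,a),s'] * pi(a'|s')
P_pi = (P[:, :, None] * pi[None, :, :]).reshape(n, n)

# Primal variable: Q solves the Bellman equation Q = r + gamma * P_pi Q
Q = np.linalg.solve(np.eye(n) - gamma * P_pi, r)
rho_primal = (1 - gamma) * nu0 @ Q

# Dual variable: normalized discounted occupancy d_pi and correction ratio zeta
d_pi = (1 - gamma) * np.linalg.solve(np.eye(n) - gamma * P_pi.T, nu0)
d_D = rng.dirichlet(np.ones(n))          # arbitrary behavior data distribution
zeta = d_pi / d_D
rho_dual = np.sum(d_D * zeta * r)        # E_{d_D}[zeta * r]

# Linear representation of Q: Q = phi @ (theta + gamma * mu @ V)
V = (pi * Q.reshape(S, A)).sum(axis=1)   # V(s) = sum_a pi(a|s) Q(s,a)
w = theta + gamma * mu @ V

print(rho_primal, rho_dual)
print(np.allclose(phi @ w, Q))
```

In this toy setting the two printed values agree up to floating-point error, which is the identity DICE estimators exploit. In practice neither `P` nor `d_D` is known, so both variables have to be estimated from data.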
Taxonomy
Topics: Advanced Control Systems Optimization
