Sample Complexity Reduction via Policy Difference Estimation in Tabular Reinforcement Learning

Adhyyan Narang; Andrew Wagenmaker; Lillian Ratliff; Kevin Jamieson

arXiv:2406.06856·cs.LG·June 12, 2024

TL;DR

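This paper studies the sample complexity of pure exploration in contextual bandits and tabular reinforcement learning (RL). It shows that estimating only the differences between policies suffices in contextual bandits but not in tabular RL. In RL, it is enough to estimate the behavior of a single reference policy plus how every other policy deviates from it. An algorithm built on this idea achieves the tightest known sample complexity bound for tabular RL.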

Contribution

It demonstrates that estimating the differences between policies' behaviors, rather than the behavior of each policy directly, can reduce sample complexity in tabular RL once a single reference policy has been estimated, and it introduces an algorithm that exploits this insight.

Findings

1. Establishes a separation between contextual bandits and RL regarding policy difference estimation.

2. Shows that estimating a reference policy's behavior plus deviations suffices for RL.

3. Provides the tightest known sample complexity bounds for tabular RL.

Abstract

In this paper, we study the non-asymptotic sample complexity for the pure exploration problem in contextual bandits and tabular reinforcement learning (RL): identifying an epsilon-optimal policy from a set of policies with high probability. Existing work in bandits has shown that it is possible to identify the best policy by estimating only the difference between the behaviors of individual policies, which can be substantially cheaper than estimating the behavior of each policy directly. However, the best-known complexities in RL fail to take advantage of this and instead estimate the behavior of each policy directly. Does it suffice to estimate only the differences in the behaviors of policies in RL? We answer this question positively for contextual bandits but in the negative for tabular RL, showing a separation between contextual bandits and RL. However, inspired by this, we show…
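The difference-estimation idea from the abstract can be illustrated in the contextual bandit setting. The sketch below is a toy example, not the paper's algorithm: the environment (200 uniform contexts, 4 actions, Gaussian reward noise), the two deterministic policies, and all function names are assumptions made for illustration. It compares a naive estimator, which estimates each policy's value separately and subtracts, against an estimator that targets the value gap directly and only picks up noise on contexts where the two policies disagree. Both use the same sampling budget.

```python
import numpy as np

def policy_value(p_ctx, mean_reward, policy):
    """Exact value of a deterministic policy: sum_c p(c) * r(c, policy[c])."""
    contexts = np.arange(len(p_ctx))
    return float(np.sum(p_ctx * mean_reward[contexts, policy]))

def gap_on_disagreements(p_ctx, mean_reward, pi_a, pi_b):
    """Exact value gap, summed only over contexts where the policies differ."""
    idx = np.flatnonzero(pi_a != pi_b)
    gaps = mean_reward[idx, pi_a[idx]] - mean_reward[idx, pi_b[idx]]
    return float(np.sum(p_ctx[idx] * gaps))

def naive_gap_estimate(p_ctx, mean_reward, pi_a, pi_b, budget, noise_sd, rng):
    """Estimate each policy's value from half the budget, then subtract."""
    half = budget // 2
    estimates = []
    for pi in (pi_a, pi_b):
        c = rng.choice(len(p_ctx), size=half, p=p_ctx)
        rewards = mean_reward[c, pi[c]] + noise_sd * rng.standard_normal(half)
        estimates.append(rewards.mean())
    return estimates[0] - estimates[1]

def difference_gap_estimate(p_ctx, mean_reward, pi_a, pi_b, budget, noise_sd, rng):
    """Estimate the gap directly with one pull per round.

    A fair coin picks which policy's action to play; the reward is weighted by
    +2 or -2 so its expectation is r(c, pi_a) - r(c, pi_b). Rounds where the
    policies agree contribute exactly zero.
    """
    c = rng.choice(len(p_ctx), size=budget, p=p_ctx)
    play_a = rng.random(budget) < 0.5
    action = np.where(play_a, pi_a[c], pi_b[c])
    rewards = mean_reward[c, action] + noise_sd * rng.standard_normal(budget)
    sign = np.where(play_a, 1.0, -1.0)
    disagree = pi_a[c] != pi_b[c]
    return float(np.mean(np.where(disagree, 2.0 * sign * rewards, 0.0)))

# Hypothetical toy environment: the policies differ on 10 of 200 contexts.
rng = np.random.default_rng(0)
n_ctx, n_act = 200, 4
p_ctx = np.full(n_ctx, 1.0 / n_ctx)
mean_reward = rng.uniform(0.0, 1.0, size=(n_ctx, n_act))
pi_a = rng.integers(0, n_act, size=n_ctx)
pi_b = pi_a.copy()
pi_b[:10] = (pi_a[:10] + 1) % n_act

true_gap = policy_value(p_ctx, mean_reward, pi_a) - policy_value(p_ctx, mean_reward, pi_b)
args = (p_ctx, mean_reward, pi_a, pi_b, 400, 0.5, rng)
naive = np.array([naive_gap_estimate(*args) for _ in range(1000)])
diff = np.array([difference_gap_estimate(*args) for _ in range(1000)])

print("true gap:", true_gap)
print("naive estimator variance:     ", naive.var())
print("difference estimator variance:", diff.var())
```

In this toy setup the gap estimator's variance scales with how often the two policies disagree, while the naive estimator pays for the full reward variability of both policies. The abstract states that this shortcut carries over to contextual bandits but not directly to tabular RL.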

Videos

Sample Complexity Reduction via Policy Difference Estimation in Tabular Reinforcement Learning (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Sparse Evolutionary Training