The Curse of Passive Data Collection in Batch Reinforcement Learning

Chenjun Xiao; Ilbin Lee; Bo Dai; Dale Schuurmans; Csaba Szepesvari

arXiv:2106.09973 · cs.LG · July 6, 2023 · 1 cites

TL;DR

This paper shows that passive data collection in batch reinforcement learning can increase sample complexity exponentially in Markov decision processes, whereas in simple cases such as bandits passive and active data collection are similarly effective.

Contribution

The paper provides the first sharp characterization of the exponential sample complexity increase in passive data collection for episodic finite state-action MDPs, extending to various settings.

Findings

01

Passive data collection requires exponentially more episodes than active methods.

02

In episodic MDPs with S states, A actions, and episode length H, the number of episodes needed by passive learning scales as A^{min(S-1, H)}/ε^2.

03

Passive learning's difficulty is fundamentally characterized by the exponential blow-up in sample requirements.
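
To make the rate in finding 02 concrete, the following minimal Python sketch evaluates A^{min(S-1, H)}/ε^2 for a few episode lengths. It is an illustration only: the function name and the example values are hypothetical, and the constants hidden by the Ω notation are dropped, so the output shows how the rate grows rather than an actual episode count.

```python
def passive_episode_scaling(num_states, num_actions, horizon, epsilon):
    """Evaluate A^{min(S-1, H)} / epsilon^2 with all constants dropped.

    Hypothetical helper for illustration; it returns the growth rate
    from finding 02, not a real episode budget.
    """
    if num_states < 1 or num_actions < 1 or horizon < 1:
        raise ValueError("num_states, num_actions and horizon must be >= 1")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # The exponent is capped by both the number of states and the episode length.
    exponent = min(num_states - 1, horizon)
    return num_actions ** exponent / epsilon ** 2


if __name__ == "__main__":
    # Example values (assumptions): S = 10 states, A = 4 actions, epsilon = 0.1.
    for horizon in (1, 2, 4, 8, 16):
        rate = passive_episode_scaling(10, 4, horizon, 0.1)
        print(f"H = {horizon:2d}: rate ~ {rate:.3g}")
```

With these example values the rate grows by a factor of A for each extra step of H until H reaches S-1, after which the exponent stops growing.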

Abstract

In high-stakes applications, active experimentation may be considered too risky and thus data are often collected passively. While in simple cases, such as in bandits, passive and active data collection are similarly effective, the price of passive sampling can be much higher when collecting data from a system with controlled states. The main focus of the current paper is the characterization of this price. For example, when learning in episodic finite state-action Markov decision processes (MDPs) with $S$ states and $A$ actions, we show that even with the best (but passively chosen) logging policy, $\Omega(A^{\min(S-1, H)}/\epsilon^{2})$ episodes are necessary (and sufficient) to obtain an $\epsilon$-optimal policy, where $H$ is the length of episodes. Note that this shows that the sample complexity blows up exponentially compared to the case of…

Taxonomy

Topics: Machine Learning and Algorithms · Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics