The Curse of Passive Data Collection in Batch Reinforcement Learning
Chenjun Xiao, Ilbin Lee, Bo Dai, Dale Schuurmans, Csaba Szepesvari

TL;DR
This paper shows that passive data collection in batch reinforcement learning can increase sample complexity exponentially in Markov decision processes, making passive learning far less sample-efficient than active data collection.
Contribution
The paper provides the first sharp characterization of the exponential sample complexity increase in passive data collection for episodic finite state-action MDPs, extending to various settings.
Findings
Passive data collection requires exponentially more episodes than active methods.
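Sample complexity in passive learning scales as A^{min(S-1, H)}/ε^2 in episodic MDPs with S states, A actions, and episode length H, where ε is the target suboptimality of the learned policy.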
The exponential blow-up persists even under the best (passively chosen) logging policy, so it is inherent to passive data collection rather than a consequence of a poor logging policy.
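To get a feel for the scaling stated in the findings above, the following minimal sketch evaluates A^{min(S-1, H)}/ε^2 for a few problem sizes. The function name `passive_episode_scale` and the example values are hypothetical and chosen only for illustration; the expression drops all constants and logarithmic factors, so the output indicates an order of growth, not an episode count from the paper.

```python
def passive_episode_scale(num_states: int, num_actions: int,
                          horizon: int, epsilon: float) -> float:
    """Evaluate A^{min(S-1, H)} / eps^2.

    Illustrative only: constants and log factors are omitted, so this is
    the growth rate of the episode requirement, not an exact bound.
    """
    if num_states < 1 or num_actions < 1 or horizon < 1:
        raise ValueError("S, A and H must be positive integers")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # The exponent is capped by whichever is smaller: S - 1 or H.
    exponent = min(num_states - 1, horizon)
    return num_actions ** exponent / epsilon ** 2


if __name__ == "__main__":
    # Hypothetical problem sizes (S, A, H) with a fixed target accuracy.
    for s, a, h in [(5, 2, 3), (5, 2, 10), (20, 4, 10)]:
        scale = passive_episode_scale(s, a, h, epsilon=0.1)
        print(f"S={s}, A={a}, H={h}: ~{scale:.3g} episodes (up to constants)")
```

Because the exponent is min(S-1, H), lengthening the episodes stops increasing the scale once H reaches S-1, and adding states stops increasing it once S-1 reaches H.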
Abstract
In high-stakes applications, active experimentation may be considered too risky and thus data are often collected passively. While in simple cases, such as in bandits, passive and active data collection are similarly effective, the price of passive sampling can be much higher when collecting data from a system with controlled states. The main focus of the current paper is the characterization of this price. For example, when learning in episodic finite state-action Markov decision processes (MDPs) with S states and A actions, we show that even with the best (but passively chosen) logging policy, Ω(A^{min(S-1, H)}/ε^2) episodes are necessary (and sufficient) to obtain an ε-optimal policy, where H is the length of episodes. Note that this shows that the sample complexity blows up exponentially compared to the case of…
Taxonomy
Topics: Machine Learning and Algorithms · Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics
