From Generative to Episodic: Sample-Efficient Replicable Reinforcement Learning

Max Hopkins; Sihan Liu; Christopher Ye; Yuichi Yoshida

arXiv:2507.11926·cs.LG·July 17, 2025

TL;DR

This paper demonstrates that sample-efficient, replicable reinforcement learning is achievable in low-horizon tabular MDPs, bridging the gap between generative and episodic settings, and showing exploration is not a major barrier.

Contribution

The authors introduce a nearly optimal replicable RL algorithm with $\tilde{O}(S^2A)$ samples, resolving a key open problem in the field.

Findings

01

Achieves $\tilde{O}(S^2A)$ sample complexity for replicable RL.

02

Provides matching lower bounds in the generative setting.

03

Shows exploration is not a significant obstacle to replicability.

Abstract

The epidemic failure of replicability across empirical science and machine learning has recently motivated the formal study of replicable learning algorithms [Impagliazzo et al. (2022)]. In batch settings where data comes from a fixed i.i.d. source (e.g., hypothesis testing, supervised learning), the design of data-efficient replicable algorithms is now more or less understood. In contrast, there remain significant gaps in our knowledge for control settings like reinforcement learning where an agent must interact directly with a shifting environment. Karbasi et al. show that with access to a generative model of an environment with $S$ states and $A$ actions (the RL 'batch setting'), replicably learning a near-optimal policy costs only $\tilde{O}(S^{2} A^{2})$ samples. On the other hand, the best upper bound without a generative model jumps to $\tilde{O}(S^{7} A^{7})$ [Eaton et al. (2024)] due to…
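To get a feel for the gap between the bounds quoted on this page, the following minimal sketch evaluates only their leading polynomial factors in $S$ and $A$. It is an illustration under stated assumptions, not a sample-count calculator: the $\tilde{O}$ notation hides logarithmic factors and any dependence on other parameters (such as horizon, accuracy, or the replicability parameter), all of which are dropped here. The function name `leading_terms` and the example values of `S` and `A` are hypothetical.

```python
def leading_terms(S, A):
    """Return the leading polynomial factor of each bound quoted on this page.

    Assumption: log factors and all dependence on parameters other than
    S (states) and A (actions) are ignored, so these are growth rates,
    not actual sample counts.
    """
    return {
        "generative model, Karbasi et al.: S^2 A^2": S**2 * A**2,
        "no generative model, Eaton et al. (2024): S^7 A^7": S**7 * A**7,
        "this paper (per the Contribution summary): S^2 A": S**2 * A,
    }


if __name__ == "__main__":
    # Hypothetical small tabular problem: 10 states, 4 actions.
    for name, value in leading_terms(10, 4).items():
        print(f"{name} = {value:,}")
```

Even for a toy problem of this size, the $S^{7}A^{7}$ term is many orders of magnitude larger than either quadratic-in-$S$ term, which is the gap the paper sets out to close.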

Taxonomy

Topics: Reinforcement Learning in Robotics