On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness

Haotian Ye; Xiaoyu Chen; Liwei Wang; Simon S. Du

arXiv:2210.10464 · cs.LG · June 30, 2023 · 1 citation

TL;DR

This paper analyzes theoretically how much pre-training helps generalization in reinforcement learning. Asymptotically, pre-training improves performance by at most a constant factor, but in the non-asymptotic regime it admits a regret bound that is independent of the state-action space.

Contribution

It offers the first theoretical insights into how pre-training affects RL generalization, establishing hardness results and algorithms for settings with and without interaction with the target environment.

Findings

01

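Without interaction with the target environment, the best that pre-training can achieve is a policy that is near-optimal in an average sense; the paper gives an algorithm that attains this.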

02

When the agent may interact with the target environment, pre-training improves asymptotic performance by at most a constant factor.

03

In the non-asymptotic regime, the paper designs an efficient algorithm and proves a distribution-based regret bound in the target environment that is independent of the state-action space.

Abstract

Generalization in Reinforcement Learning (RL) aims to learn an agent during training that generalizes to the target environment. This paper studies RL generalization from a theoretical aspect: how much can we expect pre-training over training environments to be helpful? When the interaction with the target environment is not allowed, we certify that the best we can obtain is a near-optimal policy in an average sense, and we design an algorithm that achieves this goal. Furthermore, when the agent is allowed to interact with the target environment, we give a surprising result showing that asymptotically, the improvement from pre-training is at most a constant factor. On the other hand, in the non-asymptotic regime, we design an efficient algorithm and prove a distribution-based regret bound in the target environment that is independent of the state-action space.
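The phrase "near-optimal in an average sense" can be made concrete with a short formalization. The following is only a sketch of one natural reading, not the paper's own statement: the symbols \mathcal{D}, M, V_M^{\pi}, \Pi, J_{\mathcal{D}}, \hat{\pi} and \varepsilon are notation assumed here for illustration and do not appear on this page.

```latex
% Assumed notation (illustrative, not taken from the page):
%   \mathcal{D}  : a distribution over environments M
%   V_M^{\pi}    : the value of policy \pi in environment M
%   \Pi          : the class of policies under consideration
% Average-case objective: expected value over environments drawn from \mathcal{D}.
J_{\mathcal{D}}(\pi) \;=\; \mathbb{E}_{M \sim \mathcal{D}}\!\left[ V_M^{\pi} \right]

% A policy \hat{\pi} is \varepsilon-near-optimal in the average sense if
J_{\mathcal{D}}(\hat{\pi}) \;\ge\; \max_{\pi \in \Pi} J_{\mathcal{D}}(\pi) \;-\; \varepsilon .
```

Under this reading, a policy can satisfy the inequality and still be far from optimal in one particular environment M, which is why interaction with the target environment is treated as a separate setting.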

Videos

On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Data Stream Mining Techniques