Embed to Control Partially Observed Systems: Representation Learning with Provable Sample Efficiency

Lingxiao Wang; Qi Cai; Zhuoran Yang; Zhaoran Wang

arXiv:2205.13476 · cs.LG · April 2, 2024

TL;DR

This paper introduces Embed to Control (ETC), a novel reinforcement learning algorithm that learns low-dimensional representations of observations and histories in POMDPs, achieving polynomial sample complexity and bridging representation learning with policy optimization.

Contribution

The paper presents ETC, the first sample-efficient algorithm that combines representation learning and policy optimization in POMDPs with infinite observation and state spaces.

Findings

01 ETC attains $O(1/\epsilon^2)$ sample complexity for low-rank POMDPs.

02 ETC scales polynomially with the horizon and intrinsic dimension.

03 The algorithm effectively learns minimal sufficient representations for control.

Abstract

Reinforcement learning in partially observed Markov decision processes (POMDPs) faces two challenges. (i) It often takes the full history to predict the future, which induces a sample complexity that scales exponentially with the horizon. (ii) The observation and state spaces are often continuous, which induces a sample complexity that scales exponentially with the extrinsic dimension. Addressing such challenges requires learning a minimal but sufficient representation of the observation and state histories by exploiting the structure of the POMDP. To this end, we propose a reinforcement learning algorithm named Embed to Control (ETC), which learns the representation at two levels while optimizing the policy. (i) For each step, ETC learns to represent the state with a low-dimensional feature, which factorizes the transition kernel. (ii) Across multiple steps, ETC learns to represent…
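The idea of a low-dimensional feature that factorizes the transition kernel can be illustrated with a small numerical sketch. This is not the ETC algorithm and is not taken from the paper: it assumes a finite state space, ignores actions and observations, and uses hypothetical names (`make_low_rank_kernel`, `phi`, `mu`) purely for illustration. It builds a transition matrix as the product of a state feature matrix and a factor-to-next-state matrix, then checks that the result is a valid kernel whose rank is bounded by the feature dimension.

```python
import numpy as np


def make_low_rank_kernel(num_states, rank, seed=0):
    """Build a toy transition matrix P = phi @ mu with rank at most `rank`.

    Illustrative assumption: states are finite and there are no actions.
    phi[s] is a distribution over `rank` latent factors (the feature of s);
    mu[k] is a distribution over next states given latent factor k.
    """
    rng = np.random.default_rng(seed)
    phi = rng.dirichlet(np.ones(rank), size=num_states)  # shape (S, d)
    mu = rng.dirichlet(np.ones(num_states), size=rank)   # shape (d, S)
    kernel = phi @ mu                                     # shape (S, S)
    return phi, mu, kernel


phi, mu, kernel = make_low_rank_kernel(num_states=50, rank=3)

# Each row of the kernel is a probability distribution over next states.
row_sums = kernel.sum(axis=1)

# The 50 x 50 kernel is described by 3-dimensional features per state.
numerical_rank = np.linalg.matrix_rank(kernel)
print(kernel.shape, numerical_rank)
```

In this toy setting, predicting the next-state distribution from state `s` only requires the 3-dimensional vector `phi[s]`, which is the sense in which a low-dimensional feature can stand in for the full state.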

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Linear Layer · Dense Connections · Relative Position Encodings · Position-Wise Feed-Forward Layer · InfoNCE · Contrastive Predictive Coding · Global-Local Attention