Embed to Control Partially Observed Systems: Representation Learning with Provable Sample Efficiency
Lingxiao Wang, Qi Cai, Zhuoran Yang, Zhaoran Wang

TL;DR
This paper introduces Embed to Control (ETC), a novel reinforcement learning algorithm that learns low-dimensional representations of observations and histories in POMDPs, achieving polynomial sample complexity and bridging representation learning with policy optimization.
Contribution
The paper presents ETC, the first sample-efficient algorithm that combines representation learning and policy optimization in POMDPs with infinite observation and state spaces.
Findings
ETC attains $O(1/\epsilon^2)$ sample complexity for low-rank POMDPs.
ETC scales polynomially with the horizon and intrinsic dimension.
The algorithm effectively learns minimal sufficient representations for control.
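To make the $O(1/\epsilon^2)$ rate concrete, the scaling can be written out as below. This is only a sketch: the constant $C$ and the polynomial factor $\mathrm{poly}(H, d)$ (horizon $H$, intrinsic dimension $d$) are placeholders assumed for illustration, not values taken from the paper.

```latex
% Assumed form of the bound; C and poly(H, d) are unspecified placeholders.
N(\epsilon) = \frac{C \cdot \mathrm{poly}(H, d)}{\epsilon^{2}}
\quad\Longrightarrow\quad
N\!\left(\tfrac{\epsilon}{2}\right)
  = \frac{C \cdot \mathrm{poly}(H, d)}{(\epsilon/2)^{2}}
  = 4\, N(\epsilon).
```

Under this form, halving the target suboptimality $\epsilon$ quadruples the number of samples required, while the dependence on $H$ and $d$ stays polynomial.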
Abstract
Reinforcement learning in partially observed Markov decision processes (POMDPs) faces two challenges. (i) It often takes the full history to predict the future, which induces a sample complexity that scales exponentially with the horizon. (ii) The observation and state spaces are often continuous, which induces a sample complexity that scales exponentially with the extrinsic dimension. Addressing such challenges requires learning a minimal but sufficient representation of the observation and state histories by exploiting the structure of the POMDP. To this end, we propose a reinforcement learning algorithm named Embed to Control (ETC), which learns the representation at two levels while optimizing the policy. (i) For each step, ETC learns to represent the state with a low-dimensional feature, which factorizes the transition kernel. (ii) Across multiple steps, ETC learns to represent…
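The idea of a low-dimensional feature that factorizes the transition kernel can be illustrated with a small tabular toy. The sketch below is not the ETC algorithm: it assumes a finite state and action space, and the names `make_low_rank_kernel`, `phi`, and `psi` are hypothetical. It only shows that a kernel built as a product of a per-(state, action) feature and a per-next-state factor is a valid transition kernel whose rank is at most the feature dimension `d`.

```python
import numpy as np


def make_low_rank_kernel(n_states, n_actions, d, seed=0):
    """Build a toy transition kernel P(s' | s, a) = phi(s, a)^T psi(s').

    Illustrative assumption: phi(s, a) is a probability vector over d latent
    factors, and each latent factor is a distribution over next states.
    """
    rng = np.random.default_rng(seed)
    # phi has one row per (state, action) pair; each row sums to 1.
    phi = rng.dirichlet(np.ones(d), size=n_states * n_actions)  # (S*A, d)
    # psi has one row per latent factor; each row sums to 1 over next states.
    psi = rng.dirichlet(np.ones(n_states), size=d)              # (d, S)
    # Mixing distributions with convex weights gives valid distributions.
    kernel = phi @ psi                                          # (S*A, S)
    return phi, psi, kernel


phi, psi, kernel = make_low_rank_kernel(n_states=50, n_actions=4, d=3)

# Each row of the kernel is a probability distribution over next states.
row_sums = kernel.sum(axis=1)
# The kernel has 200 rows and 50 columns, but its rank is bounded by d.
rank = np.linalg.matrix_rank(kernel)
print(kernel.shape, rank)
```

In this toy, the 200 x 50 kernel is fully described by a 3-dimensional feature per (state, action) pair, which is the sense in which a low-rank structure lets the representation dimension, rather than the size of the state space, drive the amount of data needed.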
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics
Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Linear Layer · Dense Connections · Relative Position Encodings · Position-Wise Feed-Forward Layer · InfoNCE · Contrastive Predictive Coding · Global-Local Attention
