On Representation Complexity of Model-based and Model-free Reinforcement Learning
Hanlin Zhu, Baihe Huang, Stuart Russell

TL;DR
This paper studies the circuit complexity of model-based and model-free reinforcement learning. It shows that, for a broad class of MDPs, the environment model is simple to represent while the optimal Q-function can be exponentially complex, which helps explain why model-based methods often have better sample efficiency.
Contribution
It introduces the first theoretical analysis of RL through the lens of circuit complexity. The analysis shows a gap between simple environment models and complex optimal Q-functions, and it is supported by empirical evidence.
Findings
For a broad class of MDPs, the transition and reward functions are representable by constant-depth circuits of polynomial size.
In the same MDPs, the optimal Q-function can have exponential circuit complexity in constant-depth circuits.
Empirical results show lower approximation errors for models than for Q-functions.
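As background (standard RL, not specific to this paper), the Bellman optimality equation ties the optimal Q-function to the model. The block below uses the discounted form with assumed notation: transition kernel P, reward r, discount factor γ. The paper may instead work in a finite-horizon setting.

```latex
Q^*(s,a) = r(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, \max_{a'} Q^*(s', a')
```

A model-based method estimates P and r and recovers Q* by planning through this fixed point. A model-free method must represent Q* directly, and the findings above say this fixed point can be far more complex to represent than P and r themselves.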
Abstract
We study the representation complexity of model-based and model-free reinforcement learning (RL) in the context of circuit complexity. We prove theoretically that there exists a broad class of MDPs such that their underlying transition and reward functions can be represented by constant-depth circuits with polynomial size, while the optimal Q-function suffers exponential circuit complexity in constant-depth circuits. By drawing attention to the approximation errors and building connections to complexity theory, our theory provides unique insights into why model-based algorithms usually enjoy better sample complexity than model-free algorithms from a novel representation complexity perspective: in some cases, the ground-truth rule (model) of the environment is simple to represent, while other quantities, such as the Q-function, appear complex. We empirically corroborate our theory by…
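To make the gap concrete, the following Python sketch builds a hypothetical toy MDP. It is our own illustration, assumed for exposition, and not the construction used in the paper. Each step only sets a phase flag, increments a pointer, or reads a single bit, yet the greedy action under Q* at the start state equals the majority of the input bits. Majority is a standard example of a function that constant-depth, polynomial-size circuits cannot compute. All names (`transition`, `reward`, `q_star`, `optimal_start_action`) are hypothetical. The code does not build circuits; it only checks by brute force that Q* encodes the bit count.

```python
from functools import lru_cache
from itertools import product

# Hypothetical toy MDP for illustration only; NOT the paper's construction.
# A state is (bits, i, phase):
#   bits  - a fixed tuple of n input bits carried along unchanged
#   i     - a read pointer into bits
#   phase - START, READ, or DONE
START, READ, DONE = "start", "read", "done"
ACTIONS = (0, 1)  # at START: 0 = decline, 1 = commit; elsewhere they act the same


def transition(state, action):
    """Deterministic next state: set a phase flag or increment the pointer."""
    bits, i, phase = state
    if phase == START:
        return (bits, 0, READ if (action == 1 and bits) else DONE)
    if phase == READ:
        nxt = i + 1
        return (bits, nxt, READ if nxt < len(bits) else DONE)
    return state  # DONE is absorbing


def reward(state, action):
    """Reward inspects one bit: +1 if bits[i] is 1, -1 if it is 0, while reading."""
    bits, i, phase = state
    if phase == READ:
        return 1 if bits[i] == 1 else -1
    return 0


@lru_cache(maxsize=None)
def q_star(state, action):
    """Undiscounted finite-horizon optimal Q-value via backward recursion."""
    if state[2] == DONE:
        return 0
    nxt = transition(state, action)
    return reward(state, action) + max(q_star(nxt, a) for a in ACTIONS)


def optimal_start_action(bits):
    """Greedy action w.r.t. Q* at the start state (ties go to 0 = decline)."""
    s0 = (tuple(bits), 0, START)
    return max(ACTIONS, key=lambda a: q_star(s0, a))


if __name__ == "__main__":
    n = 6
    for bits in product((0, 1), repeat=n):
        s0 = (bits, 0, START)
        # Q*(s0, commit) sums +/-1 over all bits, i.e. 2 * popcount - n.
        assert q_star(s0, 1) == 2 * sum(bits) - n
        # Declining is worth 0, so the greedy action is the strict majority bit.
        assert optimal_start_action(bits) == int(sum(bits) > n / 2)
    print("checked all", 2 ** n, "inputs")
```

Running the script checks all 64 inputs for n = 6. The exhaustive check is a deliberate choice for tiny n: it verifies that Q* encodes majority in this toy example, not any circuit lower bound.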
Taxonomy
Topics: Reinforcement Learning in Robotics · Advancements in Semiconductor Devices and Circuit Design · Adversarial Robustness in Machine Learning
