On the Role of Iterative Computation in Reinforcement Learning
Raj Ghugare, Michał Bortkiewicz, Alicja Ziarko, Benjamin Eysenbach

TL;DR
This paper formalizes compute-bounded policies in reinforcement learning and shows that additional compute lets a policy solve more complex tasks and generalize better, supported by theoretical proofs and extensive experiments.
Contribution
It introduces a minimal architecture that can use a variable amount of compute in RL, and provides theoretical and empirical evidence of its advantages over standard fixed-architecture policies.
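As a rough illustration of how compute can be decoupled from parameter count, the sketch below applies one weight-tied residual block a chosen number of times. It is a hypothetical example, not the authors' architecture: the class name `IterativePolicy`, the tanh residual update, and re-injecting the input at every step are all assumptions made for illustration. The parameter count depends only on `dim`, while the number of multiply-accumulates grows linearly with `num_iters`.

```python
import math
import random


class IterativePolicy:
    """Hypothetical weight-tied network: one shared block applied num_iters times.

    Illustrative sketch only (not the paper's architecture). Parameters are
    fixed by `dim`; compute per forward pass scales with `num_iters`.
    """

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(dim)
        # A single weight matrix and bias, reused at every iteration.
        self.w = [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]
        self.b = [0.0] * dim

    def num_params(self):
        # Independent of how many iterations are run.
        return len(self.w) * len(self.w[0]) + len(self.b)

    def macs(self, num_iters):
        # Multiply-accumulates for the matrix-vector products across all iterations.
        return num_iters * len(self.w) * len(self.w[0])

    def step(self, h, x):
        # Residual update h <- h + tanh(W h + b + x); the input x is re-injected each step.
        out = []
        for i, row in enumerate(self.w):
            pre = sum(wij * hj for wij, hj in zip(row, h)) + self.b[i] + x[i]
            out.append(h[i] + math.tanh(pre))
        return out

    def forward(self, x, num_iters):
        # Start from a zero hidden state and refine it with the same weights.
        h = [0.0] * len(x)
        for _ in range(num_iters):
            h = self.step(h, x)
        return h
```

For example, `IterativePolicy(dim=4)` has 20 parameters whether it runs 1 or 8 iterations, but the 8-iteration forward pass costs eight times as many multiply-accumulates.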
Findings
More compute improves policy performance.
Increased compute enhances generalization to longer horizons.
The proposed architecture outperforms standard networks while using fewer parameters.
Abstract
How does the amount of compute available to a reinforcement learning (RL) policy affect its learning? Can policies using a fixed number of parameters still benefit from additional compute? The standard RL framework does not provide a language to answer these questions formally. Empirically, deep RL policies are often parameterized as neural networks with static architectures, conflating the amount of compute and the number of parameters. In this paper, we formalize compute-bounded policies and prove that policies which use more compute can solve problems and generalize to longer-horizon tasks that are outside the scope of policies with less compute. Building on prior work in algorithmic learning and model-free planning, we propose a minimal architecture that can use a variable amount of compute. Our experiments complement our theory. On a set of 31 different tasks spanning online and…
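One intuition for why a fixed-parameter policy can gain from more compute is a toy in which each unit of compute is one round of Bellman backups on a 1-D chain. This is an assumption for illustration only and does not reproduce the paper's formal definition of compute-bounded policies: the function name `plan_distances`, unit step costs, and synchronous updates are hypothetical choices. After `num_iters` rounds, only states within `num_iters` steps of the goal have a finite distance estimate, so a goal farther away than the compute budget cannot be planned toward.

```python
import math


def plan_distances(num_states, goal, num_iters):
    """Hypothetical toy: num_iters synchronous Bellman backups on a 1-D chain.

    Each move to a neighboring state costs 1. Returns the estimated
    distance-to-goal per state; math.inf means the goal is not yet
    reachable within the given compute budget.
    """
    dist = [math.inf] * num_states
    dist[goal] = 0
    for _ in range(num_iters):
        new = dist[:]  # synchronous update: read only the previous round
        for s in range(num_states):
            for nxt in (s - 1, s + 1):
                if 0 <= nxt < num_states:
                    new[s] = min(new[s], 1 + dist[nxt])
        dist = new
    return dist
```

On a 10-state chain with the goal at state 0, state 6 is still unreached after 5 rounds (`math.inf`) and receives the correct distance 6 after 6 rounds; more iterations extend the planning horizon without adding parameters.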
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · AI-based Problem Solving and Planning
