Theoretical Barriers in Bellman-Based Reinforcement Learning
Brieuc Pinon, Raphaël Jungers, Jean-Charles Delvenne

TL;DR
This paper identifies fundamental limitations of Bellman-based reinforcement learning algorithms in high-dimensional spaces, showing they can neglect critical information and fail to generalize effectively across the state space.
Contribution
It formalizes a key theoretical barrier in Bellman-based RL and demonstrates the failure of common algorithms on constructed counterexamples.
Findings
Bellman-based methods can overlook critical problem information.
Counterexamples show failure to exploit simple structures.
Hindsight Experience Replay faces similar limitations when learning state-to-state reachability.
Abstract
Reinforcement Learning algorithms designed for high-dimensional spaces often enforce the Bellman equation on a sampled subset of states, relying on generalization to propagate knowledge across the state space. In this paper, we identify and formalize a fundamental limitation of this common approach. Specifically, we construct counterexample problems with a simple structure that this approach fails to exploit. Our findings reveal that such algorithms can neglect critical information about the problems, leading to inefficiencies. Furthermore, we extend this negative result to another approach from the literature: Hindsight Experience Replay applied to learning state-to-state reachability.
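To make the setup in the abstract concrete, the following is a minimal sketch of enforcing the Bellman equation on a sampled subset of states while a function approximator generalizes to the rest. It is not the paper's counterexample: the chain environment, the state-aggregation approximator, and every name and constant (N_STATES, GAMMA, BUCKET, step, value, fit, bellman_residual, true_value) are assumptions chosen only for illustration.

```python
# Toy chain MDP (hypothetical, not from the paper): states 0..N_STATES-1, one
# action that moves right; entering the terminal state N_STATES-1 pays reward 1.
N_STATES = 8
GAMMA = 0.5
BUCKET = 2  # state aggregation: neighbouring states share one weight


def step(s):
    """Deterministic transition: return (next_state, reward)."""
    nxt = s + 1
    return nxt, 1.0 if nxt == N_STATES - 1 else 0.0


def value(w, s):
    """Approximate value: 0 at the terminal state, else the bucket weight."""
    return 0.0 if s == N_STATES - 1 else w[s // BUCKET]


def bellman_residual(w, s):
    """V(s) minus its one-step Bellman target under the current weights."""
    nxt, r = step(s)
    return value(w, s) - (r + GAMMA * value(w, nxt))


def fit(sampled_states, sweeps=20, lr=1.0):
    """Enforce the Bellman equation only on `sampled_states`."""
    w = [0.0] * ((N_STATES + BUCKET - 1) // BUCKET)
    for _ in range(sweeps):
        for s in sampled_states:
            # Move the shared bucket weight toward the Bellman target.
            w[s // BUCKET] -= lr * bellman_residual(w, s)
    return w


def true_value(s):
    """Closed-form value on the chain, for comparison."""
    return 0.0 if s == N_STATES - 1 else GAMMA ** (N_STATES - 2 - s)


sampled = [6, 5, 3, 1]
w = fit(sampled)
approx = [value(w, s) for s in range(N_STATES)]
exact = [true_value(s) for s in range(N_STATES)]
residuals = {s: bellman_residual(w, s) for s in range(N_STATES - 1)}
```

In this toy run the Bellman residual is zero on every sampled state, yet the approximate values differ from the exact ones at several states, and the residual is nonzero at unsampled states such as 4. The sketch only shows that satisfying the equation on a sample does not by itself pin down the values elsewhere; the barrier formalized in the paper concerns a more general class of algorithms and problems.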
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications
Methods: Experience Replay
