On overfitting and asymptotic bias in batch reinforcement learning with partial observability
Vincent Francois-Lavet, Guillaume Rabusseau, Joelle Pineau, Damien Ernst, Raphael Fonteneau

TL;DR
This paper analyzes the balance between asymptotic bias and overfitting in batch reinforcement learning with partial observability, highlighting how smaller state representations can reduce overfitting at the cost of increased bias.
Contribution
It provides a formal theoretical framework for understanding the bias-overfitting tradeoff in POMDPs and empirically demonstrates these effects on synthetic and real-world data.
Findings
Smaller state representations decrease overfitting risk.
Reducing the complexity of the state representation can increase asymptotic bias.
Function approximation and discount factor tuning can improve the bias-overfitting tradeoff.
Abstract
This paper provides an analysis of the tradeoff between asymptotic bias (suboptimality with unlimited data) and overfitting (additional suboptimality due to limited data) in the context of reinforcement learning with partial observability. Our theoretical analysis formally characterizes that while potentially increasing the asymptotic bias, a smaller state representation decreases the risk of overfitting. This analysis relies on expressing the quality of a state representation by bounding L1 error terms of the associated belief states. Theoretical results are empirically illustrated when the state representation is a truncated history of observations, both on synthetic POMDPs and on a large-scale POMDP in the context of smartgrids, with real-world data. Finally, similarly to known results in the fully observable setting, we also briefly discuss and empirically illustrate how using…
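To make the "truncated history of observations" concrete, here is a minimal sketch, not the authors' code, of how such a state representation could be built from one trajectory. The function name `truncated_history_states`, the interleaving of actions between observations, and the parameter name `h` are all assumptions made for illustration. A larger `h` gives a richer representation (potentially less asymptotic bias) but more distinct states to estimate from the same batch (more risk of overfitting).

```python
from typing import Hashable, List, Sequence, Tuple


def truncated_history_states(
    observations: Sequence[Hashable],
    actions: Sequence[Hashable],
    h: int,
) -> List[Tuple[Hashable, ...]]:
    """Return one state per time step, built from the last `h` observations.

    Hypothetical helper for illustration only. The state at time t is the
    tuple (o_{t-h+1}, a_{t-h+1}, ..., a_{t-1}, o_t), shortened near the
    start of the trajectory where fewer than `h` observations exist.
    """
    if h < 1:
        raise ValueError("h must be at least 1")
    if len(actions) < len(observations) - 1:
        raise ValueError("need one action between each pair of observations")

    states = []
    for t in range(len(observations)):
        start = max(0, t - h + 1)
        window = []
        for k in range(start, t + 1):
            window.append(observations[k])
            if k < t:
                # Action taken after observing observations[k].
                window.append(actions[k])
        states.append(tuple(window))
    return states


# Toy trajectory: the number of distinct states grows with h.
obs = ["x", "y", "x", "y", "y"]
acts = [0, 1, 0, 1]
states_h1 = truncated_history_states(obs, acts, h=1)
states_h2 = truncated_history_states(obs, acts, h=2)
n_distinct_h1 = len(set(states_h1))
n_distinct_h2 = len(set(states_h2))
```

On this toy trajectory, `h=1` yields two distinct states while `h=2` yields five, which illustrates why longer histories need more data to estimate reliably.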
