Asymptotic Theory for IV-Based Reinforcement Learning with Potential Endogeneity
Jin Li, Ye Luo, Zigan Wang, Xiaowei Zhang

TL;DR
This paper develops an asymptotic theory for IV-based reinforcement learning algorithms in environments with endogeneity, addressing reinforcement bias caused by dynamic data generation and providing theoretical guarantees and inference formulas.
Contribution
The paper introduces a class of IV-based RL algorithms and analyzes them within a stochastic approximation framework that accommodates iterate-dependent Markovian structures, so the analysis also covers RL algorithms with policy improvement.
Findings
Reinforcement bias exacerbates endogeneity in dynamic data environments.
Theoretical properties of IV-RL algorithms are established within a stochastic approximation framework.
Formulas for inference on optimal policies are derived, considering intertemporal dependencies.
Abstract
In the standard data analysis framework, data is collected (once and for all), and then data analysis is carried out. However, with the advancement of digital technology, decision-makers constantly analyze past data and generate new data through their decisions. We model this as a Markov decision process and show that the dynamic interaction between data generation and data analysis leads to a new type of bias, reinforcement bias, that exacerbates the endogeneity problem in standard data analysis. We propose a class of instrumental variable (IV)-based reinforcement learning (RL) algorithms to correct for the bias and establish their theoretical properties by incorporating them into a stochastic approximation (SA) framework. Our analysis accommodates iterate-dependent Markovian structures and, therefore, can be used to study RL algorithms with policy improvement. We also provide…
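As a rough illustration of why an instrument helps inside a stochastic approximation recursion, the sketch below uses a static toy model, not the paper's Markov decision process and not the authors' algorithm. Everything in it is an assumption made for illustration: the linear model, the parameter values, the step-size schedule, and the function names `simulate`, `ols_estimate`, and `iv_sa_estimate`. The regressor `x` is correlated with the unobserved noise `u`, so least squares is biased, while a Robbins-Monro style update driven by the instrument `z` targets the moment condition E[z (y - x θ)] = 0.

```python
import random


def simulate(n, theta=2.0, seed=0):
    """Draw n samples (z, x, y) from a toy endogenous linear model (assumed)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)   # instrument: independent of the noise u
        u = rng.gauss(0.0, 1.0)   # unobserved noise
        x = z + 0.8 * u           # regressor correlated with u (endogenous)
        y = theta * x + u         # outcome
        data.append((z, x, y))
    return data


def ols_estimate(data):
    """Least-squares slope without intercept; biased here because Cov(x, u) != 0."""
    sxy = sum(x * y for _, x, y in data)
    sxx = sum(x * x for _, x, _ in data)
    return sxy / sxx


def iv_sa_estimate(data, theta0=0.0):
    """Stochastic approximation update that uses the instrument z.

    Each step moves theta along z * (y - x * theta), whose expectation is
    zero only at the true slope because z is uncorrelated with u.
    """
    theta = theta0
    for t, (z, x, y) in enumerate(data, start=1):
        step = 1.0 / (t + 10)     # decaying step size (an assumed schedule)
        theta += step * z * (y - x * theta)
    return theta


data = simulate(20000)
print("OLS estimate:  ", ols_estimate(data))
print("IV-SA estimate:", iv_sa_estimate(data))
```

Under these assumed parameters the least-squares slope converges to 2 + 0.8/1.64, about 2.49, rather than the true value 2, while the instrumented recursion should settle close to 2. The sketch omits the feedback from decisions to future data that the abstract identifies as the source of reinforcement bias.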
Taxonomy
Topics: Supply Chain and Inventory Management · Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications
