Q-Measure-Learning for Continuous State RL: Efficient Implementation and Convergence
Shengbo Wang

TL;DR
This paper introduces Q-Measure-Learning, a method for continuous-state reinforcement learning that learns a signed measure over visited state-action pairs and recovers the Q-function by kernel integration. It comes with an almost sure convergence guarantee and experiments on inventory control.
Contribution
It proposes a kernel-based measure-learning approach for continuous-state RL from a single trajectory, with convergence guarantees and an efficient weight-based implementation.
Findings
Almost sure sup-norm convergence of the induced Q-function to the fixed point of a kernel-smoothed Bellman operator.
A bound on the approximation error in terms of the kernel bandwidth.
Successful RL experiments in an inventory control setting.
Abstract
We study reinforcement learning in infinite-horizon discounted Markov decision processes with continuous state spaces, where data are generated online from a single trajectory under a Markovian behavior policy. To avoid maintaining an infinite-dimensional, function-valued estimate, we propose Q-Measure-Learning, a novel method that learns a signed empirical measure supported on visited state-action pairs and reconstructs an action-value estimate via kernel integration. The method jointly estimates the stationary distribution of the behavior chain and the Q-measure through coupled stochastic approximation, leading to an efficient weight-based implementation with memory and computation cost per iteration. Under uniform ergodicity of the behavior chain, we prove almost sure sup-norm convergence of the induced Q-function to the fixed point of a kernel-smoothed Bellman operator. We…
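To make the weight-based idea in the abstract concrete, here is a minimal Python sketch under stated assumptions; it is not the paper's algorithm. It assumes a 1-D state, a finite action set, a Gaussian kernel, a step size of 1/(n+1) shared by both estimates, and that the Q-function is read off as a ratio of kernel integrals of the Q-measure mu and the empirical stationary measure nu (a Nadaraya-Watson form): Q(x, a) = sum_i k(x, x_i) mu_i / sum_i k(x, x_i) nu_i over stored atoms with a_i = a. The names `QMeasureSketch` and `gaussian_kernel`, the random-walk behavior chain, and the reward are all hypothetical.

```python
import numpy as np


def gaussian_kernel(x, ys, bandwidth):
    """Gaussian kernel between a query state x and an array of stored states ys."""
    return np.exp(-0.5 * ((x - ys) / bandwidth) ** 2)


class QMeasureSketch:
    """Hypothetical weight-based learner (not the paper's exact update rule).

    Both measures live on the same atoms (visited state-action pairs):
    nu_w holds weights of the empirical stationary measure,
    mu_w holds signed weights of the Q-measure.
    """

    def __init__(self, n_actions, gamma, bandwidth, max_steps):
        self.n_actions = n_actions
        self.gamma = gamma
        self.h = bandwidth
        self.states = np.empty(max_steps)
        self.actions = np.empty(max_steps, dtype=int)
        self.nu_w = np.empty(max_steps)
        self.mu_w = np.empty(max_steps)
        self.n = 0  # number of atoms stored so far

    def q(self, x, a):
        """Assumed kernel-integration readout: ratio of mu- and nu-integrals."""
        n = self.n
        mask = self.actions[:n] == a
        if not mask.any():
            return 0.0  # no data for this action yet
        k = gaussian_kernel(x, self.states[:n][mask], self.h)
        denom = k @ self.nu_w[:n][mask]
        return float(k @ self.mu_w[:n][mask] / denom) if denom > 0.0 else 0.0

    def update(self, x, a, r, x_next):
        """One coupled stochastic-approximation step on both measures."""
        target = r + self.gamma * max(self.q(x_next, b) for b in range(self.n_actions))
        alpha = 1.0 / (self.n + 1)
        n = self.n
        # Shrink every existing atom, then place a new atom at the visited pair.
        self.nu_w[:n] *= 1.0 - alpha
        self.mu_w[:n] *= 1.0 - alpha
        self.states[n], self.actions[n] = x, a
        self.nu_w[n] = alpha
        self.mu_w[n] = alpha * target
        self.n += 1


# Toy single-trajectory run (hypothetical chain and reward).
rng = np.random.default_rng(0)
T = 2000
agent = QMeasureSketch(n_actions=2, gamma=0.0, bandwidth=0.1, max_steps=T)
x = 0.5
for _ in range(T):
    a = int(rng.integers(2))  # uniform random behavior policy
    r = x if a == 1 else 1.0 - x  # reward favors action 1 at high states
    x_next = float(np.clip(x + rng.normal(0.0, 0.2), 0.0, 1.0))
    agent.update(x, a, r, x_next)
    x = x_next
```

With `gamma=0.0` the estimate reduces to a kernel regression of the immediate reward, which makes it easy to sanity-check (for example, `agent.q(0.9, 1)` should exceed `agent.q(0.9, 0)`); with `gamma > 0` the targets bootstrap from the current estimate. Because this sketch rescales every stored weight and scans every atom in `q`, its per-step cost grows with the trajectory length; a lazily applied global scale factor would remove the rescaling cost.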
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Age of Information Optimization
