Q-Measure-Learning for Continuous State RL: Efficient Implementation and Convergence

Shengbo Wang

arXiv:2603.03523 · cs.LG · March 5, 2026

Open Access

TL;DR

This paper introduces Q-Measure-Learning, a method for continuous-state reinforcement learning that learns a signed empirical measure on visited state-action pairs and reconstructs the Q-function by kernel integration. It comes with a convergence proof and experiments on inventory control.

Contribution

It proposes a kernel-based measure-learning approach for continuous-state RL from a single online trajectory, with convergence guarantees and a weight-based implementation that needs $O(n)$ memory and $O(n)$ computation per iteration.

Findings

01

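Almost sure sup-norm convergence of the induced Q-function to the fixed point of a kernel-smoothed Bellman operator, under uniform ergodicity of the behavior chain.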

02

Bound on the approximation error related to kernel bandwidth.

03

Successful RL experiments in an inventory control setting.

Abstract

We study reinforcement learning in infinite-horizon discounted Markov decision processes with continuous state spaces, where data are generated online from a single trajectory under a Markovian behavior policy. To avoid maintaining an infinite-dimensional, function-valued estimate, we propose the novel Q-Measure-Learning, which learns a signed empirical measure supported on visited state-action pairs and reconstructs an action-value estimate via kernel integration. The method jointly estimates the stationary distribution of the behavior chain and the Q-measure through coupled stochastic approximation, leading to an efficient weight-based implementation with $O(n)$ memory and $O(n)$ computation cost per iteration. Under uniform ergodicity of the behavior chain, we prove almost sure sup-norm convergence of the induced Q-function to the fixed point of a kernel-smoothed Bellman operator. We…
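The reconstruction step described in the abstract, recovering an action-value estimate from a signed measure supported on visited state-action pairs, can be illustrated with a small sketch. This is not the paper's implementation. The Gaussian kernel, the finite action set, the exact-match rule on actions, and the names `gaussian_kernel` and `q_from_measure` are all assumptions made for illustration. The coupled stochastic-approximation update of the weights is not shown, because the truncated abstract does not specify it.

```python
import numpy as np

def gaussian_kernel(s, centers, bandwidth):
    """Gaussian kernel between one state s (shape (d,)) and n centers (shape (n, d)).

    The kernel choice is an assumption; the abstract does not name one.
    """
    sq_dist = np.sum((centers - s) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))

def q_from_measure(s, a, states, actions, weights, bandwidth):
    """Evaluate Q(s, a) = sum_i w_i * K(s, s_i) * 1[a == a_i].

    states, actions, weights hold the n visited pairs and their signed
    weights, so storage is O(n) and one evaluation costs O(n).
    """
    k = gaussian_kernel(np.asarray(s, dtype=float), states, bandwidth)
    return float(np.sum(weights * k * (actions == a)))

# Toy signed measure on three visited (state, action) pairs (made-up values).
states = np.array([[0.0], [1.0], [2.0]])
actions = np.array([0, 1, 0])
weights = np.array([0.5, -0.2, 0.3])

q_a0 = q_from_measure([0.0], 0, states, actions, weights, bandwidth=1.0)
q_a1 = q_from_measure([1.0], 1, states, actions, weights, bandwidth=1.0)
```

Only the weight vector and the list of visited pairs are stored, which is one way to read the $O(n)$ memory claim: the function-valued estimate is never materialized, only evaluated on demand.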

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Age of Information Optimization