Gradient Q$(\sigma, \lambda)$: A Unified Algorithm with Function Approximation for Reinforcement Learning
Long Yang, Yu Zhang, Qian Zheng, Pengfei Li, Gang Pan

TL;DR
This paper introduces GQ$(\sigma,\lambda)$, a reinforcement learning algorithm that unifies full-sampling and pure-expectation methods under linear function approximation. The authors prove its convergence and evaluate it empirically on standard domains.
Contribution
It extends the tabular Q$(\sigma,\lambda)$ to large-scale settings using linear function approximation and provides convergence guarantees.
Findings
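On some standard domains, GQ$(\sigma,\lambda)$ with a combination of full-sampling and pure-expectation performs better than purely full-sampling or purely pure-expectation methods.
The sampling degree $\sigma$ lets a single algorithm move between full-sampling and pure-expectation updates.
Convergence of GQ$(\sigma,\lambda)$ is proved.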
Abstract
Full-sampling (e.g., Q-learning) and pure-expectation (e.g., Expected Sarsa) algorithms are efficient and frequently used techniques in reinforcement learning. Q$(\sigma,\lambda)$ is the first approach that unifies them with eligibility traces through the sampling degree $\sigma$. However, it is limited to the tabular case: for large-scale learning, Q$(\sigma,\lambda)$ is too expensive because it requires a huge volume of tables to store value functions accurately. To address this problem, we propose GQ$(\sigma,\lambda)$, which extends tabular Q$(\sigma,\lambda)$ with linear function approximation. We prove the convergence of GQ$(\sigma,\lambda)$. Empirical results on some standard domains show that GQ$(\sigma,\lambda)$ with a combination of full-sampling and pure-expectation reaches better performance than full-sampling and pure-expectation methods.
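To make the role of the sampling degree $\sigma$ concrete, the following is a minimal sketch of a one-step $\sigma$-weighted backup target with a linear action-value estimate. It is not the paper's GQ$(\sigma,\lambda)$ algorithm: it omits eligibility traces and the gradient-based correction behind the convergence proof, and uses a plain semi-gradient step instead. All function and parameter names (`q_value`, `q_sigma_target`, `semi_gradient_step`, `next_phis`, `next_probs`) are hypothetical and chosen only for illustration.

```python
def q_value(theta, phi):
    """Linear action-value estimate: dot product of weights and features."""
    return sum(w * x for w, x in zip(theta, phi))


def q_sigma_target(reward, gamma, sigma, theta, next_phis, next_probs, next_action):
    """One-step sigma-weighted target (illustrative, not the paper's full update).

    next_phis[a]  : assumed feature vector for (next_state, a)
    next_probs[a] : assumed target-policy probability of action a in next_state
    next_action   : index of the action actually sampled in next_state
    """
    next_qs = [q_value(theta, phi) for phi in next_phis]
    # Full-sampling part: value of the single sampled next action.
    sampled = next_qs[next_action]
    # Pure-expectation part: policy-weighted average over all next actions.
    expected = sum(p * q for p, q in zip(next_probs, next_qs))
    # sigma = 1 uses only the sample, sigma = 0 only the expectation.
    return reward + gamma * (sigma * sampled + (1.0 - sigma) * expected)


def semi_gradient_step(theta, phi, target, alpha):
    """Move theta toward the target along phi (plain semi-gradient step)."""
    td_error = target - q_value(theta, phi)
    return [w + alpha * td_error * x for w, x in zip(theta, phi)]


if __name__ == "__main__":
    theta = [0.5, -0.5]
    next_phis = [[1.0, 0.0], [0.0, 1.0]]  # one-hot features, two actions
    next_probs = [0.5, 0.5]
    for sigma in (0.0, 0.5, 1.0):
        target = q_sigma_target(1.0, 0.9, sigma, theta, next_phis, next_probs, 0)
        print(sigma, target)
```

Intermediate values of `sigma` blend the two parts linearly, which is the sense in which a single update rule covers both families of methods.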
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Optimization and Search Problems
Methods: Eligibility Trace
