Adaptive Ensemble Q-learning: Minimizing Estimation Bias via Error Feedback

Hang Wang; Sen Lin; Junshan Zhang

arXiv:2306.11918 · cs.LG · June 22, 2023 · 6 cites

TL;DR

This paper introduces AdaEQ, an adaptive ensemble Q-learning method that dynamically adjusts the ensemble size based on error feedback to minimize estimation bias and improve learning performance in continuous control tasks.

Contribution

The paper proposes a novel adaptive ensemble Q-learning algorithm that uses theoretical bounds and error feedback to optimize ensemble size during training.

Findings

01 AdaEQ reduces estimation bias effectively.

02 AdaEQ outperforms existing methods on MuJoCo benchmarks.

03 Dynamic ensemble sizing improves learning stability.

Abstract

The ensemble method is a promising way to mitigate the overestimation issue in Q-learning, where multiple function approximators are used to estimate the action values. It is known that the estimation bias hinges heavily on the ensemble size (i.e., the number of Q-function approximators used in the target), and that determining the 'right' ensemble size is highly nontrivial because of the time-varying nature of the function approximation errors during the learning process. To tackle this challenge, we first derive an upper bound and a lower bound on the estimation bias, based on which the ensemble size is adapted to drive the bias to be nearly zero, thereby coping with the impact of the time-varying approximation errors. Motivated by the theoretical findings, we advocate that the ensemble method can be combined with Model Identification Adaptive Control (MIAC) for effective…
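The idea in the abstract can be illustrated with a minimal sketch. It assumes the target takes the minimum over a random subset of `m` ensemble estimates, so a larger `m` pushes the target lower, and that a scalar bias estimate is available as feedback. The function names, the symmetric `threshold`, and the step size of one are hypothetical choices for illustration, not the update rule from the paper.

```python
import random


def ensemble_target(q_values, m, reward, gamma, rng):
    """TD target using the minimum over a random subset of m ensemble estimates.

    q_values: next-state action-value estimates, one per ensemble member.
    """
    subset = rng.sample(q_values, m)
    return reward + gamma * min(subset)


def adapt_ensemble_size(m, estimated_bias, n_total, threshold=0.0, m_min=2):
    """Assumed error-feedback rule: adjust m by one step per update.

    Positive bias (overestimation) -> use more members in the min.
    Negative bias (underestimation) -> use fewer members in the min.
    """
    if estimated_bias > threshold:
        return min(m + 1, n_total)
    if estimated_bias < -threshold:
        return max(m - 1, m_min)
    return m


# Usage: one target computation followed by one size adjustment.
rng = random.Random(0)
q_next = [1.2, 0.9, 1.5, 1.1]  # made-up estimates from four approximators
target = ensemble_target(q_next, m=2, reward=1.0, gamma=0.99, rng=rng)
new_m = adapt_ensemble_size(m=2, estimated_bias=0.3, n_total=len(q_next))
```

In this sketch the feedback loop is deliberately simple: the bias estimate only decides the direction of the change, and `m` is clipped to stay between `m_min` and the total number of approximators.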

Videos

Adaptive Ensemble Q-learning: Minimizing Estimation Bias via Error Feedback · SlidesLive

Taxonomy

Topics: Image and Signal Denoising Methods · Air Quality Monitoring and Forecasting · Energy Load and Power Forecasting

Methods: Q-Learning