Adaptive Ensemble Q-learning: Minimizing Estimation Bias via Error Feedback
Hang Wang, Sen Lin, Junshan Zhang

TL;DR
This paper introduces AdaEQ, an adaptive ensemble Q-learning method that dynamically adjusts the ensemble size based on error feedback to minimize estimation bias and improve learning performance in continuous control tasks.
Contribution
The paper proposes a novel adaptive ensemble Q-learning algorithm that uses theoretical bounds and error feedback to optimize ensemble size during training.
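The error-feedback idea can be sketched as a small controller. This is a minimal, hypothetical sketch, not the paper's algorithm. It assumes the bias is estimated as the mean gap between predicted Q-values and observed discounted Monte Carlo returns on the same state-action pairs. It also assumes the ensemble size M is stepped by one whenever that estimate leaves a fixed tolerance band `[lower, upper]`. In the paper, adaptation is guided by its derived upper and lower bounds on the bias; the fixed band here only stands in for them. The function names, thresholds, and step size are illustrative assumptions.

```python
from typing import Sequence


def estimate_bias(q_values: Sequence[float], mc_returns: Sequence[float]) -> float:
    """Hypothetical bias estimate: mean gap between predicted Q-values and the
    discounted Monte Carlo returns observed from the same state-action pairs.
    Positive means overestimation, negative means underestimation."""
    if len(q_values) != len(mc_returns) or not q_values:
        raise ValueError("need equally sized, non-empty sequences")
    return sum(q - g for q, g in zip(q_values, mc_returns)) / len(q_values)


def adapt_ensemble_size(m: int, bias: float, lower: float, upper: float,
                        m_min: int = 1, m_max: int = 10) -> int:
    """Hypothetical feedback rule for the number M of approximators in the target min.

    Overestimation (bias > upper): enlarge M so the min becomes more pessimistic.
    Underestimation (bias < lower): shrink M so the min becomes less pessimistic.
    Otherwise keep M unchanged. M is clipped to [m_min, m_max].
    """
    if bias > upper:
        return min(m + 1, m_max)
    if bias < lower:
        return max(m - 1, m_min)
    return m


# Usage: feed a stream of (made-up) bias estimates through the controller.
m = 2
for b in [0.5, 0.3, -0.4, 0.05]:
    m = adapt_ensemble_size(m, b, lower=-0.1, upper=0.1)
    print(m)  # prints 3, 4, 3, 3 on successive lines
```

A tolerance band rather than a single zero target keeps M from oscillating on every small fluctuation of a noisy bias estimate.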
Findings
AdaEQ reduces estimation bias effectively.
AdaEQ outperforms existing methods on MuJoCo benchmarks.
Dynamic ensemble sizing improves learning stability.
Abstract
The ensemble method is a promising way to mitigate the overestimation issue in Q-learning, where multiple function approximators are used to estimate the action values. It is known that the estimation bias hinges heavily on the ensemble size (i.e., the number of Q-function approximators used in the target), and that determining the 'right' ensemble size is highly nontrivial, because of the time-varying nature of the function approximation errors during the learning process. To tackle this challenge, we first derive an upper bound and a lower bound on the estimation bias, based on which the ensemble size is adapted to drive the bias to be nearly zero, thereby coping with the impact of the time-varying approximation errors accordingly. Motivated by the theoretic findings, we advocate that the ensemble method can be combined with Model Identification Adaptive Control (MIAC) for effective…
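How the estimation bias depends on the ensemble size can be illustrated with a small Monte Carlo experiment. This is a minimal sketch under assumptions that are ours, not the paper's. It uses a discrete action set whose true values are all zero and i.i.d. Gaussian approximation errors. The target takes the minimum over M approximators for each action, then the maximum over actions (a Maxmin-style in-target minimization). The paper itself targets continuous control. The function `target_bias` and all of its parameters are hypothetical.

```python
import numpy as np


def target_bias(m: int, n_actions: int = 10, noise_std: float = 1.0,
                trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[max_a min_{i<=m} Q_i(a)] - max_a Q(a).

    Assumptions (illustrative only): every action has true value 0, and each
    of the m approximators adds independent N(0, noise_std^2) error per action.
    """
    rng = np.random.default_rng(seed)
    true_q = np.zeros(n_actions)
    # errors[t, i, a]: error of approximator i on action a in trial t
    errors = rng.normal(0.0, noise_std, size=(trials, m, n_actions))
    q_hat = true_q + errors
    # in-target minimization over the m approximators, then greedy max over actions
    target = q_hat.min(axis=1).max(axis=1)
    return float(target.mean() - true_q.max())


for m in (1, 2, 4, 8, 10):
    print(f"M={m:2d}  bias={target_bias(m):+.3f}")
```

With a single approximator, the max over noisy estimates overestimates. As M grows, the min becomes increasingly pessimistic and the bias eventually turns negative. Raising `n_actions` or changing the error model moves the value of M at which the bias crosses zero. This is one way to see why a fixed ensemble size is hard to choose when the approximation errors change over training.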
Taxonomy
Topics: Image and Signal Denoising Methods · Air Quality Monitoring and Forecasting · Energy Load and Power Forecasting
Methods: Q-Learning
