Adaptive Ensemble Aggregation for Actor-Critics
Nicklas Werge, Yi-Shan Wu, Manuel Haussmann, Bahareh Tasdighi, Melih Kandemir

TL;DR
This paper introduces Adaptive Ensemble Aggregation (AEA), a novel method for dynamically combining ensemble-based targets in off-policy actor-critic learning, leading to improved convergence, bias reduction, and performance across control tasks.
Contribution
AEA is the first adaptive ensemble aggregation algorithm that constructs targets from training dynamics, achieving optimal variance reduction and formal guarantees for policy improvement.
Findings
AEA converges to a unique equilibrium minimizing value estimation error.
AEA's bias vanishes as ensemble size increases, demonstrating a shrinkage property.
On most control tasks, AEA outperforms state-of-the-art baselines.
Abstract
Ensembles are ubiquitous in off-policy actor-critic learning, yet their efficacy depends critically on how they are aggregated. Current methods typically rely on static rules or task-specific hyperparameters to balance overestimation bias and variance, leaving the challenge of a truly adaptive approach open. We introduce Adaptive Ensemble Aggregation (AEA), an algorithm that dynamically constructs ensemble-based targets for both critic and actor updates directly from training dynamics. We prove that AEA converges to a unique equilibrium where the aggregation parameter minimizes value estimation error within a defined stability region. Theoretically, we establish that AEA achieves a shrinkage property where the estimation bias vanishes as the total ensemble size grows. Unlike subset-based methods like REDQ, which hit an information bottleneck determined by a fixed variance floor…
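To make the contrast between static and parametrized aggregation concrete, here is a minimal sketch of a few common ways to collapse an ensemble of critic estimates into a single target. It is not the AEA algorithm: the abstract does not state AEA's aggregation rule. The function names, the `(N, batch)` array layout, and the `beta` parameter in `aggregate_parametric` are assumptions made for illustration only. The subset-minimum rule follows the REDQ idea of minimizing over a random subset of the critics.

```python
import numpy as np

def aggregate_mean(q):
    """Static rule: average over ensemble members. q has shape (N, batch)."""
    return q.mean(axis=0)

def aggregate_min(q):
    """Static rule: pessimistic minimum over all N ensemble members."""
    return q.min(axis=0)

def aggregate_subset_min(q, m, rng):
    """REDQ-style static rule: minimum over a random subset of m critics."""
    idx = rng.choice(q.shape[0], size=m, replace=False)
    return q[idx].min(axis=0)

def aggregate_parametric(q, beta):
    """Hypothetical parametrized rule (NOT taken from the paper):
    ensemble mean minus beta times the ensemble standard deviation.
    beta = 0 recovers the mean; larger beta is more pessimistic."""
    return q.mean(axis=0) - beta * q.std(axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy ensemble: 10 noisy critic estimates for a batch of 4 state-action pairs.
    q = 1.0 + rng.normal(0.0, 0.5, size=(10, 4))
    print("mean      :", aggregate_mean(q))
    print("min       :", aggregate_min(q))
    print("subset min:", aggregate_subset_min(q, m=2, rng=rng))
    print("parametric:", aggregate_parametric(q, beta=0.5))
```

In this sketch a static method fixes `m` or `beta` ahead of time as a hyperparameter. An adaptive method in the sense of the abstract would instead adjust such a parameter from training dynamics, and how AEA does so is not shown here.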
