Sat-EnQ: Satisficing Ensembles of Weak Q-Learners for Reliable and Compute-Efficient Reinforcement Learning

Ünver Çiftçi

arXiv:2512.22910·cs.LG·December 30, 2025

Open Access

TL;DR

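Sat-EnQ is a two-phase reinforcement learning framework. It first trains an ensemble of lightweight Q-networks under a satisficing objective to obtain stable, low-variance value estimates, then distills the ensemble into a larger network that is fine-tuned with standard Double DQN. The reported results are lower variance than DQN, fewer catastrophic failures, and less compute than bootstrapped ensembles.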

Contribution

The paper presents a satisficing-based ensemble approach, proves that satisficing induces bounded updates and cannot increase target variance, and reports empirical gains in reliability and compute efficiency.

Findings

01

3.8x variance reduction compared to DQN

02

Eliminates catastrophic failures (0% vs 50%)

03

Requires 2.5x less compute than bootstrapped ensembles

Abstract

Deep Q-learning algorithms remain notoriously unstable, especially during early training when the maximization operator amplifies estimation errors. Inspired by bounded rationality theory and developmental learning, we introduce Sat-EnQ, a two-phase framework that first learns to be "good enough" before optimizing aggressively. In Phase 1, we train an ensemble of lightweight Q-networks under a satisficing objective that limits early value growth using a dynamic baseline, producing diverse, low-variance estimates while avoiding catastrophic overestimation. In Phase 2, the ensemble is distilled into a larger network and fine-tuned with standard Double DQN. We prove theoretically that satisficing induces bounded updates and cannot increase target variance, with a corollary quantifying conditions for substantial reduction. Empirically, Sat-EnQ achieves 3.8x variance reduction, eliminates…
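
The abstract does not spell out the satisficing objective or how the dynamic baseline is computed, so the Python sketch below is only an illustration under stated assumptions, not the paper's implementation. It assumes the satisficing target is the usual TD target capped at `baseline + margin`; the names `satisficing_target`, `baseline`, and `margin` are hypothetical, and the baseline is held fixed here rather than updated dynamically. The sketch compares the variance of capped and uncapped targets when the next-state Q-values are noisy.

```python
import random
import statistics


def td_target(reward, next_q_values, gamma=0.99):
    """Standard Q-learning target: r + gamma * max over next-state Q-values."""
    return reward + gamma * max(next_q_values)


def satisficing_target(reward, next_q_values, baseline, margin=1.0, gamma=0.99):
    """Hypothetical satisficing target (an assumption, not the paper's formula):
    the TD target capped at baseline + margin to limit early value growth."""
    return min(td_target(reward, next_q_values, gamma), baseline + margin)


def compare_target_variance(num_samples=10_000, num_actions=4, seed=0):
    """Return (variance of plain targets, variance of capped targets)
    for Q-value estimates corrupted by unit Gaussian noise."""
    rng = random.Random(seed)
    baseline = 0.0  # assumed fixed here; the paper describes a dynamic baseline
    plain, capped = [], []
    for _ in range(num_samples):
        noisy_q = [rng.gauss(0.0, 1.0) for _ in range(num_actions)]
        plain.append(td_target(0.0, noisy_q))
        capped.append(satisficing_target(0.0, noisy_q, baseline))
    return statistics.pvariance(plain), statistics.pvariance(capped)


var_plain, var_capped = compare_target_variance()
print(f"plain target variance:  {var_plain:.3f}")
print(f"capped target variance: {var_capped:.3f}")
```

Capping by a constant never moves two target values further apart, so in this sketch the capped targets cannot have higher variance than the plain ones. That is consistent with the abstract's claim that satisficing cannot increase target variance, though the paper's own proof and objective may differ from this simplification.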

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications