SPEQ: Offline Stabilization Phases for Efficient Q-Learning in High Update-To-Data Ratio Reinforcement Learning

Carlo Romeo; Girolamo Macaluso; Alessandro Sestini; Andrew D. Bagdanov

arXiv:2501.08669·cs.LG·March 19, 2025

TL;DR

SPEQ introduces a hybrid RL training approach combining online low-UTD updates with offline high-UTD stabilization phases, significantly reducing computational costs while maintaining high performance.

Contribution

The paper presents SPEQ, a novel RL algorithm that efficiently balances online and offline training phases to improve scalability and reduce computational overhead.

Findings

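01. SPEQ requires 40% to 99% fewer gradient updates than state-of-the-art high-UTD methods.

02. Training time decreases by 27% to 78% relative to the same baselines.

03. Performance on the MuJoCo continuous control benchmarks is maintained or improved.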

Abstract

High update-to-data (UTD) ratio algorithms in reinforcement learning (RL) improve sample efficiency but incur high computational costs, limiting real-world scalability. We propose Offline Stabilization Phases for Efficient Q-Learning (SPEQ), an RL algorithm that combines low-UTD online training with periodic offline stabilization phases. During these phases, Q-functions are fine-tuned with high UTD ratios on a fixed replay buffer, reducing redundant updates on suboptimal data. This structured training schedule optimally balances computational and sample efficiency, addressing the limitations of both high and low UTD ratio approaches. We empirically demonstrate that SPEQ requires from 40% to 99% fewer gradient updates and 27% to 78% less training time compared to state-of-the-art high UTD ratio methods while maintaining or surpassing their performance on the MuJoCo continuous control…
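
The training schedule described in the abstract (low-UTD online updates interleaved with periodic high-UTD phases on a frozen replay buffer) can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `run_hybrid_schedule` and `constant_utd_updates`, the `collect` and `update` callbacks, and every numeric setting are hypothetical and are not taken from the paper.

```python
from typing import Any, Callable, List, Sequence


def run_hybrid_schedule(
    total_env_steps: int,
    online_utd: int,
    phase_period: int,
    phase_updates: int,
    collect: Callable[[int], Any],
    update: Callable[[Sequence[Any]], None],
) -> int:
    """Run a hybrid online/offline schedule; return the gradient-update count.

    All parameters are hypothetical placeholders:
      online_utd    - updates per environment step during online training
      phase_period  - environment steps between offline stabilization phases
      phase_updates - updates performed inside each offline phase
      collect       - stands in for one environment interaction
      update        - stands in for one gradient step on sampled data
    """
    buffer: List[Any] = []
    n_updates = 0
    for step in range(1, total_env_steps + 1):
        # Online part: gather one transition, then do a few cheap updates.
        buffer.append(collect(step))
        for _ in range(online_utd):
            update(buffer)
            n_updates += 1
        # Offline part: no new data is collected; the buffer is held fixed
        # while many updates are applied to it.
        if step % phase_period == 0:
            frozen = tuple(buffer)
            for _ in range(phase_updates):
                update(frozen)
                n_updates += 1
    return n_updates


def constant_utd_updates(total_env_steps: int, utd: int) -> int:
    """Update count for a baseline that applies `utd` updates at every step."""
    return total_env_steps * utd


if __name__ == "__main__":
    hybrid = run_hybrid_schedule(
        total_env_steps=100, online_utd=1, phase_period=50, phase_updates=20,
        collect=lambda s: s, update=lambda batch: None,
    )
    baseline = constant_utd_updates(total_env_steps=100, utd=20)
    print(hybrid, baseline, 1 - hybrid / baseline)
```

With these toy settings the hybrid schedule performs 140 updates against 2000 for the constant-UTD baseline. The numbers only show how the update count depends on the schedule and say nothing about the settings or results reported in the paper.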

Taxonomy

Topics: Age of Information Optimization

Methods: Dropout