When Are RL Hyperparameters Benign? A Study in Offline Goal-Conditioned RL
Jan Malte Töpperwien, Aditya Mohan, Marius Lindauer

TL;DR
This paper investigates hyperparameter sensitivity in offline goal-conditioned RL, finding that robustness varies by algorithm and data quality, with insights into gradient interference affecting stability and sensitivity.
Contribution
It shows that hyperparameter sensitivity in offline goal-conditioned RL is less severe than commonly reported for online RL and links the remaining sensitivity to gradient interference, which can guide more robust algorithm design.
Findings
Offline RL shows greater hyperparameter robustness than online RL.
QRL maintains stability with modest expert data, unlike HIQL.
Gradient interference correlates with hyperparameter sensitivity.
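To make the last finding concrete, here is a minimal sketch of one common proxy for gradient interference: the cosine similarity between two gradient vectors, where a negative value means the updates pull the parameters in conflicting directions. This summary does not state how the paper measures interference, so the proxy, the function names `cosine_similarity` and `interference_rate`, and the toy gradients are all illustrative assumptions rather than the authors' method.

```python
import math


def cosine_similarity(g1, g2):
    """Cosine of the angle between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(g1, g2))
    norm1 = math.sqrt(sum(a * a for a in g1))
    norm2 = math.sqrt(sum(b * b for b in g2))
    if norm1 == 0.0 or norm2 == 0.0:
        # Assumed convention: a zero gradient neither helps nor conflicts.
        return 0.0
    return dot / (norm1 * norm2)


def interference_rate(gradient_pairs):
    """Fraction of gradient pairs whose cosine similarity is negative."""
    if not gradient_pairs:
        return 0.0
    conflicts = sum(
        1 for g1, g2 in gradient_pairs if cosine_similarity(g1, g2) < 0.0
    )
    return conflicts / len(gradient_pairs)


# Hypothetical toy gradients (not data from the paper).
pairs = [
    ([1.0, 0.0], [1.0, 1.0]),    # aligned: positive cosine
    ([1.0, 0.0], [-1.0, 0.5]),   # conflicting: negative cosine
    ([0.0, 2.0], [3.0, 0.0]),    # orthogonal: zero cosine
    ([1.0, 1.0], [-1.0, -1.0]),  # directly opposed: cosine of -1
]
rate = interference_rate(pairs)
```

In practice the pairs would come from gradients of the same network computed on different mini-batches or loss terms; tracking the conflict rate across hyperparameter configurations is one way such a correlation with sensitivity could be examined.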
Abstract
Hyperparameter sensitivity in Deep Reinforcement Learning (RL) is often accepted as unavoidable. However, it remains unclear whether it is intrinsic to the RL problem or exacerbated by specific training mechanisms. We investigate this question in offline goal-conditioned RL, where data distributions are fixed, and non-stationarity can be explicitly controlled via scheduled shifts in data quality. Additionally, we study varying data qualities under both stationary and non-stationary regimes, and cover two representative algorithms: HIQL (bootstrapped TD-learning) and QRL (quasimetric representation learning). Overall, we observe substantially greater robustness to changes in hyperparameter configurations than commonly reported for online RL, even under controlled non-stationarity. Once modest expert data is present (20%), QRL maintains broad, stable near-optimal regions, while…
Taxonomy
Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Machine Learning and Data Classification
