Is Q-learning an Ill-posed Problem?

Philipp Wissmann; Daniel Hein; Steffen Udluft; Thomas Runkler

arXiv:2502.14365 · cs.LG · February 24, 2025

TL;DR

This paper critically examines the instability of Q-learning in continuous environments and shows that the task can be inherently ill-posed and unreliable, which challenges its widespread use in reinforcement learning.

Contribution

It systematically analyzes the causes of Q-learning instability and demonstrates that the core task can be fundamentally ill-posed even when common error sources, such as bootstrapping and regression-model inaccuracies, are eliminated.

Findings

01. Q-learning can be inherently ill-posed, even in simple benchmarks.

02. Bootstrapping and model inaccuracies are not the sole causes of instability.

03. Q-learning's fundamental task may be unreliable for reinforcement learning.

Abstract

This paper investigates the instability of Q-learning in continuous environments, a challenge frequently encountered by practitioners. Traditionally, this instability is attributed to bootstrapping and regression model errors. Using a representative reinforcement learning benchmark, we systematically examine the effects of bootstrapping and model inaccuracies by incrementally eliminating these potential error sources. Our findings reveal that even in relatively simple benchmarks, the fundamental task of Q-learning (iteratively learning a Q-function from policy-specific target values) can be inherently ill-posed and prone to failure. These insights cast doubt on the reliability of Q-learning as a universal solution for reinforcement learning problems.
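To make the "fundamental task" named in the abstract concrete, here is a minimal sketch of generic tabular Q-value iteration. It is not the paper's benchmark or code: the three-state chain, its rewards, the discount factor, and the names `step` and `q_iteration` are all hypothetical choices for illustration. Each iteration builds bootstrapped, policy-specific targets r + γ·Q(s', π(s')), where π is the greedy policy implied by the current estimate, and then fits the Q-table to those targets. In a continuous environment, this exact table fit would be replaced by a regression model, which is one of the error sources the abstract names.

```python
import numpy as np

# Hypothetical toy MDP (NOT the paper's benchmark): a 3-state chain.
# Action 0 moves left, action 1 moves right; reaching state 2 gives reward 1 and ends the episode.
N_STATES, N_ACTIONS, GAMMA = 3, 2, 0.9


def step(s, a):
    """Deterministic transition: returns (next_state, reward, done)."""
    s_next = max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, reward, s_next == N_STATES - 1


def q_iteration(n_iters=50):
    """Tabular Q-value iteration with bootstrapped, greedy-policy targets."""
    q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(n_iters):
        # Policy implied by the current Q estimate; the targets depend on it.
        greedy = q.argmax(axis=1)
        targets = np.zeros_like(q)
        for s in range(N_STATES - 1):  # terminal state keeps Q = 0
            for a in range(N_ACTIONS):
                s_next, r, done = step(s, a)
                # Bootstrapped target: r + gamma * Q(s', pi(s')), or just r at termination.
                targets[s, a] = r if done else r + GAMMA * q[s_next, greedy[s_next]]
        # "Regression" step: in this tabular sketch, an exact fit to the targets.
        q = targets
    return q


if __name__ == "__main__":
    print(q_iteration())
```

On this deterministic toy chain, the iteration settles at Q(0,·) ≈ (0.81, 0.9) and Q(1,·) ≈ (0.81, 1.0) within a few steps. The abstract's point is that once the exact fit becomes an approximate regression over continuous states, this same loop can become ill-posed.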

Taxonomy

Topics: Online and Blended Learning

Methods: Q-Learning