NFQ2.0: The CartPole Benchmark Revisited

Sascha Lange; Roland Hafner; Martin Riedmiller

arXiv:2511.12644 · cs.LG · November 18, 2025

Open Access

TL;DR

This paper revisits the classic NFQ algorithm on the CartPole benchmark, introduces NFQ2.0 to improve stability and reproducibility, and demonstrates its effectiveness on a real-world industrial control system.

Contribution

The paper presents NFQ2.0, a modernized variant of NFQ, with ablation studies and practical insights for applying deep RL in industrial settings.

Findings

1. NFQ2.0 improves stability over the original NFQ.

2. Key hyperparameters significantly affect performance.

3. Reproducibility is enhanced with the proposed modifications.

Abstract

This article revisits the 20-year-old neural fitted Q-iteration (NFQ) algorithm on its classical CartPole benchmark. NFQ was a pioneering step towards modern Deep Reinforcement Learning (Deep RL), applying multi-layer neural networks to reinforcement learning for real-world control problems. We explore the algorithm's conceptual simplicity and its transition from online to batch learning, which contributed to its stability. Despite its initial success, NFQ required extensive tuning and was not easily reproducible on real-world control problems. We propose a modernized variant, NFQ2.0, and apply it to the CartPole task, concentrating on a real-world system built from standard industrial components, to investigate and improve the repeatability and robustness of the learning process. Through ablation studies, we highlight key design decisions and hyperparameters that enhance performance and…
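The batch-learning idea mentioned in the abstract can be illustrated with a minimal sketch of fitted Q-iteration. This is not the paper's NFQ2.0 and not a CartPole controller: a lookup table stands in for the neural network, and the four-state chain, the cost values, and every name (`step`, `collect_batch`, `fitted_q_iteration`, `greedy_action`) are hypothetical choices made only for illustration. It follows the cost-minimizing formulation of the original NFQ, where the target for a transition is the immediate cost plus the discounted minimum Q-value of the successor state.

```python
# Sketch of batch fitted Q-iteration with cost-minimizing targets.
# Assumption: a lookup table replaces the neural network regressor,
# and the 4-state chain below is a made-up toy problem.

GAMMA = 0.9
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical chain 0-1-2-3; state 3 is the terminal goal."""
    next_state = max(0, state - 1) if action == 0 else min(3, state + 1)
    done = next_state == 3
    cost = 0.0 if done else 1.0  # every step that misses the goal costs 1
    return next_state, cost, done


def collect_batch():
    """Record each (s, a, cost, s', done) transition once; the batch is then fixed."""
    batch = []
    for s in range(3):
        for a in ACTIONS:
            s2, cost, done = step(s, a)
            batch.append((s, a, cost, s2, done))
    return batch


def fitted_q_iteration(batch, iterations=20):
    q = {(s, a): 0.0 for s in range(4) for a in ACTIONS}
    for _ in range(iterations):
        # 1) Compute targets for the whole batch from the current, frozen Q.
        targets = {}
        for s, a, cost, s2, done in batch:
            future = 0.0 if done else min(q[(s2, b)] for b in ACTIONS)
            targets[(s, a)] = cost + GAMMA * future
        # 2) "Fit" Q to the targets. With a table this is a plain copy;
        #    with a network it would be a supervised regression step.
        q.update(targets)
    return q


def greedy_action(q, state):
    """Pick the action with the lowest estimated cost-to-go."""
    return min(ACTIONS, key=lambda a: q[(state, a)])


q = fitted_q_iteration(collect_batch())
```

Separating target computation from the fitting step is what makes this a batch method: all targets in one iteration come from the same frozen Q estimate rather than from a function that changes after every sample.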

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Robot Manipulation and Learning