2048: Reinforcement Learning in a Delayed Reward Environment

Prady Saligram; Tanvir Bhathal; Robby Manihani

arXiv:2507.05465 · cs.LG · July 28, 2025

Open Access

TL;DR

This paper introduces a distributional multi-step reinforcement learning framework for the game 2048, where frequent small score changes can mislead agents and the real payoff is delayed. The resulting agent, H-DQN, outscores DQN, PPO, and QR-DQN baselines and, when scaled, reaches a score of 41.828K and the 4096 tile.

Contribution

The work presents Horizon-DQN (H-DQN), a novel algorithm that combines distributional learning, dueling architectures, noisy networks, prioritized replay, and other techniques to improve RL in sparse, delayed-reward environments.
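As a rough illustration of one ingredient named above, the sketch below shows the standard dueling aggregation, in which a scalar state value and per-action advantages are combined into Q-values by subtracting the mean advantage. This is the textbook form, not necessarily how H-DQN implements it; the function name `dueling_q_values`, the example numbers, and the action ordering are all assumptions made for illustration.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).

    Hypothetical helper for illustration; a real dueling network would
    produce `state_value` and `advantages` from two separate output heads.
    """
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


# 2048 has four moves; the ordering (up, right, down, left) is an assumption.
q_values = dueling_q_values(2.0, [1.0, -1.0, 0.5, -0.5])
print(q_values)
```

Subtracting the mean advantage keeps the value and advantage streams identifiable: without it, a constant could be shifted between the two without changing the resulting Q-values.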

Findings

01. H-DQN outperforms standard DQN, PPO, and QR-DQN in 2048.

02. H-DQN reaches the 2048 tile with a maximum episode score of 18.21K.

03. Scaled-up H-DQN reaches a maximum score of 41.828K and the 4096 tile.

Abstract

Delayed and sparse rewards present a fundamental obstacle for reinforcement-learning (RL) agents, which struggle to assign credit for actions whose benefits emerge many steps later. The sliding-tile game 2048 epitomizes this challenge: although frequent small score changes yield immediate feedback, they often mislead agents into locally optimal but globally suboptimal strategies. In this work, we introduce a unified, distributional multi-step RL framework designed to directly optimize long-horizon performance. Using the open-source Gym-2048 environment, we develop and compare four agent variants: standard DQN, PPO, QR-DQN (Quantile Regression DQN), and a novel Horizon-DQN (H-DQN) that integrates distributional learning, dueling architectures, noisy networks, prioritized replay, and more. Empirical evaluation reveals a clear hierarchy in effectiveness: max episode scores improve from…
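To make the "multi-step" idea concrete, here is a minimal sketch of a generic n-step return: several discounted rewards are summed before bootstrapping from a value estimate, so credit reaches earlier actions faster than with a one-step target. The function name `n_step_return` and the example rewards, discount, and bootstrap value are hypothetical; the horizon and discount actually used in the paper are not stated in this summary.

```python
def n_step_return(rewards, bootstrap_value, gamma):
    """Return r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1) + gamma^n * bootstrap_value.

    `rewards` holds the n rewards observed after the action being credited;
    `bootstrap_value` is an estimate of the value of the state reached after
    those n steps (for example, a max over target-network Q-values).
    """
    g = bootstrap_value
    # Fold from the last reward backwards so each step applies one discount.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Illustrative numbers only: three merge rewards, then a bootstrapped estimate.
target = n_step_return([4.0, 0.0, 8.0], bootstrap_value=10.0, gamma=0.5)
print(target)
```

With a single reward in the list this reduces to the usual one-step target, `r + gamma * bootstrap_value`.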

Taxonomy

Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Adversarial Robustness in Machine Learning

Methods: Q-Learning · Convolution · Dense Connections · Deep Q-Network · Proximal Policy Optimization