Higher Resolution, Better Generalization: Unlocking Visual Scaling in Deep Reinforcement Learning

Raphael Trumpp; Ömer Veysel Çağatan; Barış Akgün; Marco Caccamo

arXiv:2605.10546 · cs.LG · May 12, 2026

TL;DR

Higher-resolution visual inputs significantly enhance deep reinforcement learning performance and generalization, especially with architectures that effectively process detailed visual information.

Contribution

This work demonstrates the importance of input resolution and proposes architecture modifications that decouple parameter growth from resolution, enabling better scaling.

Findings

01. Higher-resolution inputs improve performance and generalization in deep RL.

02. Replacing flattening with global average pooling enables resolution scaling without parameter explosion.

03. Visual scaling yields a 28% performance increase over traditional architectures.
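
The second finding rests on a simple property of global average pooling: it averages each channel over its spatial positions, so the resulting feature vector has one entry per channel regardless of input resolution, whereas flattening produces a vector that grows with the spatial size. The following is a minimal pure-Python sketch of that property. The helper names (`flatten`, `global_average_pool`, `make_feature_map`) and the toy feature-map sizes are illustrative assumptions, not code from the paper's repository.

```python
# Illustrative sketch only: helper names and sizes are assumptions,
# not taken from the paper or its repository.

def flatten(feature_map):
    """Concatenate all C x H x W activations into one vector."""
    return [v for channel in feature_map for row in channel for v in row]


def global_average_pool(feature_map):
    """Average each channel over its H x W positions -> vector of length C."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def make_feature_map(channels, height, width, fill=1.0):
    """Build a toy C x H x W feature map as nested lists."""
    return [[[fill] * width for _ in range(height)] for _ in range(channels)]


# Two hypothetical encoder outputs: same channel count, different spatial size.
small = make_feature_map(32, 8, 8)
large = make_feature_map(32, 16, 16)

flat_lengths = (len(flatten(small)), len(flatten(large)))
gap_lengths = (len(global_average_pool(small)), len(global_average_pool(large)))

print("flatten:", flat_lengths)  # grows with H * W
print("GAP:    ", gap_lengths)   # stays equal to the channel count
```

Because the pooled vector length depends only on the channel count, any layer that consumes it keeps the same shape when the observation resolution changes.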

Abstract

Pixel-based deep reinforcement learning agents are typically trained on heavily downsampled visual observations, a convention inherited from early benchmarks rather than grounded in principled design. In this work, we show that observation resolution is a critical yet overlooked variable for policy learning: higher-resolution inputs can substantially improve both performance and generalization, provided the network architecture can process them effectively. We find that the widely used Impala encoder, which flattens spatial features into a vector, suffers from quadratic parameter growth as resolution increases and fails to leverage the additional visual detail. Replacing this operation with global average pooling, as in the Impoola architecture, decouples parameter count from resolution and yields consistent improvements across resolutions and network widths - at their respective best…
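
To make the quadratic growth mentioned in the abstract concrete, the sketch below counts the parameters of the first dense layer after the encoder under each design. It assumes a configuration that is typical of Impala-style encoders but is not confirmed by the text above: three downsampling stages that each halve the spatial side (rounding up), 32 output channels, and a 256-unit dense layer. All function names are hypothetical.

```python
import math

# Assumed configuration (not stated in the abstract): 3 stride-2 stages,
# 32 final channels, 256 hidden units in the first dense layer.


def final_spatial_size(side, num_stages=3):
    """Spatial side length after num_stages halvings (rounded up)."""
    for _ in range(num_stages):
        side = math.ceil(side / 2)
    return side


def dense_params(in_features, hidden=256):
    """Weights plus biases of a fully connected layer."""
    return in_features * hidden + hidden


def head_params(side, channels=32, hidden=256):
    """Dense-layer parameter count after flattening vs. after GAP."""
    s = final_spatial_size(side)
    return {
        "flatten": dense_params(channels * s * s, hidden),  # scales with s^2
        "gap": dense_params(channels, hidden),              # independent of s
    }


for side in (64, 128, 256):
    counts = head_params(side)
    print(f"{side}x{side}: flatten={counts['flatten']:,} gap={counts['gap']:,}")
```

Under these assumptions, each doubling of the input side length roughly quadruples the dense-layer parameter count with flattening, while the count with global average pooling stays fixed.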

Code & Models

Repositories

raphajaner/procgen-hd
github
