Fast and Data Efficient Reinforcement Learning from Pixels via Non-Parametric Value Approximation

Alexander Long; Alan Blair; Herke van Hoof

arXiv:2203.03078·cs.LG·March 8, 2022

TL;DR

NAIT is a reinforcement learning algorithm for discrete-action, pixel-based environments that combines non-parametric value approximation with simple distance-based exploration, achieving competitive ATARI100k results with a greater than 100x speedup in wall-time.

Contribution

The paper introduces NAIT, a novel non-parametric RL method that enables fast, data-efficient learning from pixels with stable reward incorporation during episodes.

Findings

01. NAIT achieves over 100x speedup in wall-time compared to existing methods.

02. NAIT performs competitively in the online setting on both the 26 and 57 game variants of ATARI100k.

03. The approach is simple yet effective for discrete-action, pixel-based RL tasks.

Abstract

We present Nonparametric Approximation of Inter-Trace returns (NAIT), a Reinforcement Learning algorithm for discrete-action, pixel-based environments that is both highly sample-efficient and computation-efficient. NAIT is a lazy-learning approach with an update that is equivalent to episodic Monte-Carlo on episode completion, but that allows the stable incorporation of rewards while an episode is ongoing. We make use of a fixed domain-agnostic representation, simple distance-based exploration and a proximity graph-based lookup to facilitate extremely fast execution. We empirically evaluate NAIT on both the 26 and 57 game variants of ATARI100k where, despite its simplicity, it achieves competitive performance in the online setting with greater than 100x speedup in wall-time.
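
To make the abstract's ingredients concrete, here is a minimal sketch of lazy, non-parametric value estimation with an episodic Monte-Carlo update. It is an illustration under stated assumptions, not the authors' implementation: the names `discounted_returns` and `KNNValueMemory` are hypothetical, embeddings are assumed to come from some fixed representation that is not shown, a brute-force nearest-neighbour scan stands in for the proximity graph-based lookup, and the exploration rule is one simple distance-based heuristic chosen for illustration. NAIT's incorporation of rewards while an episode is still ongoing is not reproduced here.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Monte-Carlo return G_t = r_t + gamma * G_{t+1} for a finished episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


class KNNValueMemory:
    """Hypothetical per-action store of (embedding, return) pairs.

    Values are read out lazily by averaging the returns of the k nearest
    stored embeddings. The linear scan is an assumption made for brevity;
    the paper uses a proximity graph-based lookup instead.
    """

    def __init__(self, num_actions, k=3):
        self.k = k
        self.memory = [[] for _ in range(num_actions)]

    def add_episode(self, embeddings, actions, rewards, gamma=0.99):
        # Episodic Monte-Carlo update: store each visited point with its return.
        returns = discounted_returns(rewards, gamma)
        for z, a, g in zip(embeddings, actions, returns):
            self.memory[a].append((tuple(z), g))

    def neighbours(self, z, action):
        """Up to k (distance, return) pairs nearest to z, closest first."""
        scored = [(math.dist(z, zi), g) for zi, g in self.memory[action]]
        scored.sort(key=lambda pair: pair[0])
        return scored[: self.k]

    def value(self, z, action, default=0.0):
        near = self.neighbours(z, action)
        if not near:
            return default
        return sum(g for _, g in near) / len(near)

    def act(self, z, novelty_threshold=1.0):
        """Try any action with no nearby experience, otherwise act greedily."""
        num_actions = len(self.memory)
        for a in range(num_actions):
            near = self.neighbours(z, a)
            if not near or near[0][0] > novelty_threshold:
                return a  # distance-based exploration (illustrative rule)
        values = [self.value(z, a) for a in range(num_actions)]
        return max(range(num_actions), key=values.__getitem__)


# Usage: one two-step episode, then a value query near the first state.
memory = KNNValueMemory(num_actions=2, k=1)
memory.add_episode(
    embeddings=[(0.0, 0.0), (1.0, 0.0)],
    actions=[0, 1],
    rewards=[0.0, 1.0],
    gamma=0.5,
)
estimate = memory.value((0.1, 0.0), action=0)  # return stored for the first step
```

Because nothing is trained, "learning" here is just appending to the memory, which is what makes a lazy approach cheap per step; the cost moves to the neighbour lookup, which is why a fast approximate index matters in practice.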

Taxonomy

Topics: Sports Analytics and Performance · Reinforcement Learning in Robotics · Mobile Crowdsensing and Crowdsourcing