Planning with Pixels in (Almost) Real Time

Wilmer Bandres; Blai Bonet; Hector Geffner

arXiv:1801.03354 · cs.AI · January 11, 2018

TL;DR

This paper shows that width-based planning directly from screen pixels, with no training, yields Atari 2600 scores that compare well with those of humans and learning methods, and that an episodic, rollout version of the IW(k) algorithm obtains such scores in almost real time.

Contribution

It introduces a pixel-based planning approach whose Atari scores compare well with those of humans and learning methods, and an episodic, rollout version of IW(k) that obtains such scores in almost real time.

Findings

01. Planning from pixels yields competitive Atari scores.

02. The episodic IW(k) algorithm enables near real-time performance.

03. No training required for high-quality planning results.

Abstract

Recently, width-based planning methods have been shown to yield state-of-the-art results in the Atari 2600 video games. For this, the states were associated with the (RAM) memory states of the simulator. In this work, we consider the same planning problem but using the screen instead. By using the same visual inputs, the planning results can be compared with those of humans and learning methods. We show that the planning approach, out of the box and without training, results in scores that compare well with those obtained by humans and learning methods, and moreover, by developing an episodic, rollout version of the IW(k) algorithm, we show that such scores can be obtained in almost real time.
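To make the width-based idea concrete, here is a minimal sketch of the classical breadth-first IW(1) procedure on a toy grid, not the episodic, rollout variant developed in the paper and not its pixel-based features. IW(1) keeps a generated state only if it makes some atom (boolean feature) true for the first time in the search. The function `iw1`, the grid domain, and the helpers `grid_successors` and `grid_atoms` are all hypothetical names introduced for illustration; the goal test is also an assumption of the toy, since Atari play is driven by rewards rather than goals.

```python
from collections import deque


def iw1(initial_state, successors, atoms, is_goal, max_nodes=10_000):
    """Breadth-first search with width-1 novelty pruning (illustrative sketch).

    A generated state is kept only if at least one of its atoms has not
    been seen in any previously generated state.
    """
    seen_atoms = set(atoms(initial_state))
    queue = deque([(initial_state, [])])
    expanded = 0
    while queue and expanded < max_nodes:
        state, plan = queue.popleft()
        expanded += 1
        if is_goal(state):
            return plan, expanded
        for action, child in successors(state):
            new_atoms = set(atoms(child)) - seen_atoms
            if not new_atoms:
                continue  # not novel: prune this state
            seen_atoms |= new_atoms
            queue.append((child, plan + [action]))
    return None, expanded


# Hypothetical toy domain: an agent on a 5x5 grid that can move right or up.
SIZE = 5


def grid_successors(state):
    x, y = state
    if x + 1 < SIZE:
        yield "right", (x + 1, y)
    if y + 1 < SIZE:
        yield "up", (x, y + 1)


def grid_atoms(state):
    # One atom per coordinate value, e.g. ("x", 3) and ("y", 0).
    x, y = state
    return [("x", x), ("y", y)]


if __name__ == "__main__":
    plan, expanded = iw1((0, 0), grid_successors, grid_atoms,
                         is_goal=lambda s: s[0] == SIZE - 1)
    print(plan, expanded)
```

In this toy, the goal depends on a single atom, so the width-1 search reaches it while expanding far fewer than the 25 grid states. A goal that requires a specific pair of atoms, such as reaching the corner (4, 4), is pruned away by IW(1), which is the trade-off that the parameter k in IW(k) controls.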
