Puzzle Solving without Search or Human Knowledge: An Unnatural Language Approach

David Noever; Ryerson Burdick

arXiv:2109.02797·cs.LG·September 8, 2021

TL;DR

This paper demonstrates that GPT-2 can learn to solve complex puzzles like mazes, Rubik's Cube, and Sudoku solely from text archives, without search or human heuristics, by fine-tuning on solved game data.

Contribution

The paper fine-tunes transformer models on text archives of solved games so that the models generate puzzle solutions without search or domain-specific heuristics.
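To make "text-archived game solutions" concrete, the following is a minimal sketch of how one solved maze might be flattened into a single line of a fine-tuning corpus. The record layout, the `[Maze]` and `[Solution]` tags, the `/` row separator, the delimiter strings, and the function name `encode_maze_record` are all illustrative assumptions; this summary does not specify the notation the authors actually used.

```python
from typing import List

# Hypothetical record delimiters chosen for illustration only.
START = "<|startoftext|>"
END = "<|endoftext|>"


def encode_maze_record(grid: List[str], moves: List[str]) -> str:
    """Flatten a solved maze into one plain-text training line.

    grid  : maze rows, e.g. 'S' start, 'G' goal, '#' wall, '.' open cell
    moves : solution path as single-letter moves U, D, L, R
    """
    allowed = {"U", "D", "L", "R"}
    if not set(moves) <= allowed:
        raise ValueError("moves must be drawn from U, D, L, R")
    board = "/".join(grid)   # join rows with '/' so the board fits on one line
    path = " ".join(moves)   # space-separated move sequence
    return f"{START} [Maze] {board} [Solution] {path} {END}"


# A 3x3 maze whose solution walks down twice, then right twice.
grid = ["S.#", ".##", "..G"]
moves = ["D", "D", "R", "R"]
record = encode_maze_record(grid, moves)
print(record)
```

A corpus of such lines, one per solved puzzle, is the kind of input a language model can be fine-tuned on; at generation time the model would be prompted with everything up to `[Solution]` and asked to continue the line.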

Findings

01  Transformer models can learn puzzle-solving strategies from text archives.

02  The method achieves solutions in environments with sparse rewards.

03  It bypasses traditional search and heuristic methods for puzzle solving.

Abstract

The application of Generative Pre-trained Transformer (GPT-2) to learn text-archived game notation provides a model environment for exploring sparse reward gameplay. The transformer architecture proves amenable to training on solved text archives describing mazes, Rubik's Cube, and Sudoku solvers. The method benefits from fine-tuning the transformer architecture to visualize plausible strategies derived outside any guidance from human heuristics or domain expertise. The large search space ($> 10^{19}$) for the games provides a puzzle environment in which the solution has few intermediate rewards and a final move that solves the challenge.
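As a sanity check on the $> 10^{19}$ figure, the sketch below counts the reachable states of a standard 3x3x3 Rubik's Cube using the usual combinatorial argument. Whether the authors derive their bound this way, or have this puzzle in mind for it, is an assumption; the function name `rubiks_cube_states` is illustrative.

```python
from math import factorial


def rubiks_cube_states() -> int:
    """Count reachable configurations of a standard 3x3x3 Rubik's Cube."""
    corner_perms = factorial(8)    # 8 corner pieces can be permuted
    corner_orients = 3 ** 7        # 7 corners twist freely; the 8th is forced
    edge_perms = factorial(12)     # 12 edge pieces can be permuted
    edge_orients = 2 ** 11         # 11 edges flip freely; the 12th is forced
    # Corner and edge permutations must share parity, which halves the count.
    return corner_perms * corner_orients * edge_perms * edge_orients // 2


states = rubiks_cube_states()
print(states)            # 43252003274489856000
print(states > 10**19)   # True
```

The count is roughly $4.3 \times 10^{19}$, which is consistent with the bound quoted in the abstract.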

Taxonomy

Topics: Artificial Intelligence in Games · Evolutionary Algorithms and Applications

Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dropout · Softmax · Multi-Head Attention · Label Smoothing · Byte Pair Encoding · Layer Normalization