Strategies for Using Proximal Policy Optimization in Mobile Puzzle Games

Jeppe Theiss Kristensen; Paolo Burelli

arXiv:2007.01542·cs.AI·July 9, 2021

TL;DR

This paper explores strategies to enhance the reliability and generalization of Proximal Policy Optimization (PPO) in training AI agents for casual mobile puzzle games, demonstrating improved stability in real-world testing.

Contribution

It evaluates and identifies strategies to improve PPO's stability and generalization specifically for mobile puzzle game AI training.

Findings

01 Identified conditions causing PPO failure in training and testing.

02 Developed strategies that improve PPO stability in game environments.

03 Validated strategies on Lily's Garden with positive results.

Abstract

While traditionally a labour-intensive task, the testing of game content is progressively becoming more automated. Among the many directions in which this automation is taking shape, automatic play-testing is one of the most promising, thanks in part to advances in many supervised and reinforcement learning (RL) algorithms. However, these types of algorithms, while extremely powerful, often struggle in production environments due to issues with reliability and transparency in their training and usage. In this work we investigate and evaluate strategies for applying the popular RL method Proximal Policy Optimization (PPO) to a casual mobile puzzle game, with a specific focus on improving its reliability in training and its generalization during game playing. We have implemented and tested a number of different strategies against a real-world mobile puzzle game (Lily's Garden from…
