Style-Preserving Policy Optimization for Game Agents

Lingfeng Li; Yunlong Lu; Yongyi Wang; Wenxin Li

arXiv:2506.16995 · cs.AI · September 23, 2025


TL;DR

This paper introduces MPPO, a reinforcement learning method that enhances game agents' proficiency while maintaining their diverse play styles, leading to more engaging and replayable gaming experiences.

Contribution

The paper proposes MPPO, a novel policy optimization technique that improves agent proficiency without sacrificing style diversity, bridging the gap between RL and evolution-based methods.

Findings

01

MPPO achieves proficiency comparable or superior to that of pure online algorithms.

02

MPPO preserves the diverse play styles of demonstrator agents.

03

Empirical results across environments of varying scales validate MPPO's effectiveness.

Abstract

Proficient game agents with diverse play styles enrich the gaming experience and enhance the replay value of games. However, recent advances in game AI based on reinforcement learning (RL) have focused predominantly on improving proficiency, whereas methods based on evolutionary algorithms generate agents with diverse play styles but exhibit subpar performance compared with RL methods. To address this gap, this paper proposes Mixed Proximal Policy Optimization (MPPO), a method designed to improve the proficiency of existing suboptimal agents while retaining their distinct styles. MPPO unifies the loss objectives for online and offline samples and introduces an implicit constraint that approximates the demonstrator policies by adjusting the empirical distribution of samples. Empirical results across environments of varying scales demonstrate that MPPO achieves proficiency levels comparable to, or…
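The abstract describes MPPO only at a high level, so the following is a minimal sketch and not the authors' implementation. It assumes that the "unified" objective is the standard PPO clipped surrogate applied to both online rollouts and offline demonstrator samples, with the offline samples' behavior log-probabilities taken from the demonstrator. The names `clipped_surrogate`, `mixed_objective`, and `offline_weight` are hypothetical. The weight is one illustrative way to "adjust the empirical distribution of samples"; the abstract does not specify the actual mechanism.

```python
import numpy as np


def clipped_surrogate(logp_new, logp_behavior, advantages, clip_eps=0.2):
    """Per-sample PPO clipped surrogate term (to be maximized)."""
    ratio = np.exp(logp_new - logp_behavior)
    clipped_ratio = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Pessimistic (lower) bound of the clipped and unclipped objectives.
    return np.minimum(ratio * advantages, clipped_ratio * advantages)


def mixed_objective(online, offline, offline_weight, clip_eps=0.2):
    """Hypothetical unified objective: one surrogate for both sample sources.

    `online` and `offline` are dicts of arrays with keys `logp_new`,
    `logp_behavior`, and `adv`. For offline samples, `logp_behavior` is
    ASSUMED to be the demonstrator's log-probability of the logged action.
    `offline_weight` in [0, 1] is the share of the empirical mixture
    assigned to offline samples (an illustrative assumption).
    """
    if not 0.0 <= offline_weight <= 1.0:
        raise ValueError("offline_weight must lie in [0, 1]")
    on_terms = clipped_surrogate(
        online["logp_new"], online["logp_behavior"], online["adv"], clip_eps)
    off_terms = clipped_surrogate(
        offline["logp_new"], offline["logp_behavior"], offline["adv"], clip_eps)
    # Reweight the empirical distribution: each source contributes its mean
    # surrogate term, scaled by its share of the mixture.
    return (1.0 - offline_weight) * on_terms.mean() + offline_weight * off_terms.mean()


if __name__ == "__main__":
    online = {"logp_new": np.log([0.5, 0.5]),
              "logp_behavior": np.log([0.5, 0.5]),
              "adv": np.array([1.0, 1.0])}
    offline = {"logp_new": np.log([0.6]),
               "logp_behavior": np.log([0.3]),
               "adv": np.array([1.0])}
    # Online terms equal 1.0 (ratio 1); the offline ratio 2.0 is clipped to 1.2.
    print(mixed_objective(online, offline, offline_weight=0.5))  # ≈ 1.1
```

With `offline_weight=0.0` the sketch reduces to plain PPO on the online batch. Raising the weight shifts the empirical mixture toward demonstrator samples, which is one plausible way such a reweighting could act as an implicit pull toward the demonstrator's style without an explicit divergence penalty.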
