Reflection of Episodes: Learning to Play Game from Expert and Self Experiences

Xiaojie Xu; Zongyuan Li; Chang Lu; Runnan Qi; Yanan Ni; Lumin Jiang; Xiangbei Liu; Xuebo Zhang; Yongchun Fang; Kuihua Huang; Xian Guo; Zhanghua Wu; Zhenya Li

arXiv:2502.13388·cs.AI·April 13, 2026

TL;DR

This paper introduces the Reflection of Episodes (ROE) framework, in which an LLM plays StarCraft II by combining expert experience with self-experience gained from post-game reflection, and beats the Very Hard bot in TextStarCraft II.

Contribution

The paper presents the ROE framework, which combines expert experience with self-experience obtained through reflection so that an LLM can learn to make decisions in a complex RTS environment.

Findings

01

ROE beats the bot at Very Hard difficulty in TextStarCraft II.

02

A keyframe selection method extracts the key game information that the framework uses for decision-making.

03

A detailed analysis of the LLM's in-game data verifies the method's effectiveness.

Abstract

StarCraft II is a complex and dynamic real-time strategy (RTS) game environment that is well suited to artificial intelligence and reinforcement learning research. To address the problem of Large Language Model (LLM) learning in complex environments through self-reflection, we propose a Reflection of Episodes (ROE) framework based on expert experience and self-experience. The framework first obtains key information in the game through a keyframe selection method, then makes decisions based on expert experience and self-experience. After a game is completed, it reflects on the previous experience to obtain new self-experience. In our experiments, the method beat the bot at Very Hard difficulty in TextStarCraft II. We analyze the LLM's in-game data in detail and verify the method's effectiveness.
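
The loop described in the abstract (select keyframes, decide from expert and self-experience, reflect after the game) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `Frame`, `select_keyframes`, `ROEAgent`, and `stub_llm` are hypothetical, the "every third frame" rule is only a placeholder because the abstract does not describe the actual keyframe criterion, and the prompts are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Frame:
    step: int
    observation: str  # text description of the game state at this step


def select_keyframes(frames: List[Frame], every: int = 3) -> List[Frame]:
    """Placeholder rule (assumption): keep every `every`-th frame plus the last."""
    if not frames:
        return []
    picked = frames[::every]
    if picked[-1] is not frames[-1]:
        picked.append(frames[-1])
    return picked


@dataclass
class ROEAgent:
    llm: Callable[[str], str]  # any text-in, text-out model
    expert_experience: str     # fixed guidance written by an expert
    self_experience: List[str] = field(default_factory=list)

    def _context(self, keyframes: List[Frame]) -> str:
        # Build the shared prompt prefix from both experience sources.
        observations = "\n".join(f"[{f.step}] {f.observation}" for f in keyframes)
        lessons = "\n".join(self.self_experience) or "(none yet)"
        return (
            f"Expert experience:\n{self.expert_experience}\n"
            f"Self-experience:\n{lessons}\n"
            f"Key observations:\n{observations}\n"
        )

    def decide(self, frames: List[Frame]) -> str:
        # In-game step: decide from keyframes plus both kinds of experience.
        context = self._context(select_keyframes(frames))
        return self.llm(context + "Decide the next action.")

    def reflect(self, frames: List[Frame], outcome: str) -> str:
        # Post-game step: turn the finished episode into new self-experience.
        context = self._context(select_keyframes(frames))
        lesson = self.llm(
            context + f"The game ended in a {outcome}. Reflect and state one lesson."
        )
        self.self_experience.append(lesson)
        return lesson


def stub_llm(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    return "Scout earlier." if "Reflect" in prompt else "Build more workers."


frames = [Frame(i, f"minerals={50 * i}") for i in range(8)]
agent = ROEAgent(llm=stub_llm, expert_experience="Expand before attacking.")
action = agent.decide(frames)
lesson = agent.reflect(frames, outcome="loss")
```

In this sketch the expert experience stays fixed while the self-experience list grows by one lesson per finished game, so later calls to `decide` see both. Swapping `stub_llm` for a real model client and replacing the placeholder keyframe rule would be the first steps toward anything closer to the paper's setup.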
