World Models for Policy Refinement in StarCraft II

Yixin Zhang; Ziyi Wang; Yiming Rong; Haoxi Wang; Jinling Jiang; Shuang Xu; Haoran Wu; Shiyu Zhou; Bo Xu

arXiv:2602.14857·cs.AI·February 17, 2026

Open Access · 1 model · 1 dataset

TL;DR

This paper introduces StarWM, a world model for StarCraft II that predicts future observations under partial observability. Built into a decision system (StarWM-Agent), it raises win rates by up to 30% against advanced AI levels.

Contribution

We develop StarWM, the first world model for SC2, together with a structured textual representation of observations and a new dataset, SC2-Dynamics-50k. StarWM supports policy refinement through a generate-simulate-refine loop.
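The generate-simulate-refine loop named above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`propose`, `simulate`, `refine`), the string-typed observations and actions, and the toy stand-ins are all assumptions introduced here. The only idea taken from the source is that a policy proposes an action, an action-conditioned world model predicts the resulting observation, and the policy then revises its action.

```python
from typing import Callable

# Hypothetical placeholder types; the paper's actual formats are not shown here.
Observation = str  # stands in for a structured textual observation
Action = str       # stands in for a textual action proposal


def generate_simulate_refine(
    obs: Observation,
    propose: Callable[[Observation], Action],
    simulate: Callable[[Observation, Action], Observation],
    refine: Callable[[Observation, Action, Observation], Action],
    rounds: int = 1,
) -> Action:
    """Propose an action, predict its outcome, then revise the action."""
    action = propose(obs)                        # generate
    for _ in range(rounds):
        predicted = simulate(obs, action)        # simulate with the world model
        action = refine(obs, action, predicted)  # refine using the prediction
    return action


# Toy stand-ins (assumptions) so the sketch runs end to end.
def toy_propose(obs: Observation) -> Action:
    return "build barracks"


def toy_simulate(obs: Observation, action: Action) -> Observation:
    # Pretend the world model predicts a mineral shortfall for this action.
    return "minerals: insufficient" if "barracks" in action else "minerals: ok"


def toy_refine(obs: Observation, action: Action, predicted: Observation) -> Action:
    # Swap to a cheaper action when the prediction looks bad.
    return "train worker" if "insufficient" in predicted else action


result = generate_simulate_refine(
    "minerals: 50", toy_propose, toy_simulate, toy_refine
)
print(result)
```

Passing the three stages as callables keeps the loop independent of any particular policy or world model; in practice `propose` and `refine` might wrap an LLM policy and `simulate` a learned transition model, but that wiring is not specified in the text above.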

Findings

01  StarWM achieves nearly 60% better resource prediction accuracy.

02  StarWM improves self-side macro-situation consistency.

03  StarWM-Agent increases win rates by up to 30% against advanced AI levels.

Abstract

Large Language Models (LLMs) have recently shown strong reasoning and generalization capabilities, motivating their use as decision-making policies in complex environments. StarCraft II (SC2), with its massive state-action space and partial observability, is a challenging testbed. However, existing LLM-based SC2 agents primarily focus on improving the policy itself and overlook integrating a learnable, action-conditioned transition model into the decision loop. To bridge this gap, we propose StarWM, the first world model for SC2 that predicts future observations under partial observability. To facilitate learning SC2's hybrid dynamics, we introduce a structured textual representation that factorizes observations into five semantic modules, and construct SC2-Dynamics-50k, the first instruction-tuning dataset for SC2 dynamics prediction. We further develop a multi-dimensional offline…

Code & Models

Models

yxzhang2024/StarWM (model, 8 downloads)

Datasets

yxzhang2024/SC2-Dynamics-50K (dataset, 39 downloads)

Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning