SMAC-R1: The Emergence of Intelligence in Decision-Making Tasks
Yue Deng, Weiyu Ma, Yuxin Fan, Ruyi Song, Yin Zhang, Haifeng Zhang, Jian Zhao

TL;DR
This paper introduces SMAC-R1, an approach that uses large language models to generate interpretable decision trees for multi-agent reinforcement learning in StarCraft environments, achieving high transferability while requiring minimal environment exploration.
Contribution
The paper presents a method that combines LLMs and decision trees for MARL, using a pipeline that includes self-reflection and fine-tuning to improve the interpretability and transferability of the resulting policies.
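Because the policies are emitted as ordinary code, they can be read and audited rule by rule. The sketch below is a hypothetical illustration of that style of policy, not code from the paper: the Unit fields, the action names, and the thresholds attack_range and retreat_health are all assumptions, and real SMAC observations and actions are encoded differently.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Unit:
    # Hypothetical per-unit observation (not SMAC's actual feature encoding).
    uid: int
    x: float
    y: float
    health: float  # fraction of max health in [0, 1]


def distance(a: Unit, b: Unit) -> float:
    """Euclidean distance between two units."""
    return ((a.x - b.x) ** 2 + (a.y - b.y) ** 2) ** 0.5


def decide(
    ally: Unit,
    enemies: List[Unit],
    attack_range: float = 6.0,     # assumed value
    retreat_health: float = 0.2,   # assumed value
) -> Tuple[str, Optional[int]]:
    """Return (action, target_id) from a fixed, human-readable rule hierarchy."""
    alive = [e for e in enemies if e.health > 0]
    if not alive:
        return ("stop", None)
    in_range = [e for e in alive if distance(ally, e) <= attack_range]
    # Rule 1: a badly damaged unit backs off when enemies can reach it.
    if ally.health < retreat_health and in_range:
        return ("retreat", None)
    # Rule 2: focus fire on the weakest enemy already in range.
    if in_range:
        target = min(in_range, key=lambda e: e.health)
        return ("attack", target.uid)
    # Rule 3: otherwise close the distance to the nearest living enemy.
    nearest = min(alive, key=lambda e: distance(ally, e))
    return ("move_toward", nearest.uid)
```

Each branch is a single inspectable rule, which is what makes this form of policy easier to interpret, and potentially to carry over to a new scenario, than a neural network's weights.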
Findings
High-quality, interpretable decision trees generated
Strong transferability to new environments demonstrated
Minimal environmental exploration required
Abstract
The StarCraft Multi-Agent Challenge (SMAC) has been one of the most commonly used experimental environments in multi-agent reinforcement learning (MARL), where the task is to control a set number of allied units to defeat enemy forces. Traditional MARL algorithms often require millions of environment interaction steps to train a parametric model, and the resulting policies are typically non-interpretable and transfer poorly. In this paper, we introduce SMAC-R1, which is based on the Qwen2.5-7B-Base LLM distilled from DeepSeek-Coder-v2.5-236B. Similar to online reinforcement learning following behavior cloning in an offline learning process, in our pipeline agents leverage the DeepSeek LLM to generate decision tree code from task descriptions, and then self-reflect using feedback from the rewards provided by the environment. Based on…
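The generate, evaluate, and self-reflect loop described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: `generate` and `evaluate` are hypothetical callables standing in for the LLM and for SMAC rollouts, and the feedback prompt format and number of rounds are illustrative choices. The toy stand-ins at the end only exist to make the loop runnable.

```python
from typing import Callable, List, Tuple


def refine_policy(
    generate: Callable[[str], str],    # hypothetical LLM call: prompt -> policy source code
    evaluate: Callable[[str], float],  # hypothetical rollout: policy source -> reward
    task_description: str,
    rounds: int = 3,
) -> Tuple[str, float, List[Tuple[str, float]]]:
    """Generate decision-tree code, score it, and feed the score back into the prompt."""
    history: List[Tuple[str, float]] = []
    prompt = task_description
    best_code, best_score = "", float("-inf")
    for _ in range(rounds):
        code = generate(prompt)
        score = evaluate(code)
        history.append((code, score))
        if score > best_score:
            best_code, best_score = code, score
        # Self-reflection: the next prompt carries the last attempt and its reward.
        prompt = (
            f"{task_description}\n\nPrevious policy:\n{code}\n"
            f"Environment reward: {score:.2f}\n"
            "Revise the policy to improve the reward."
        )
    return best_code, best_score, history


# Toy stand-ins (hypothetical): the "LLM" writes one more rule than the
# previous attempt in its prompt contained, and the "environment" rewards
# each rule with one point.
def toy_generate(prompt: str) -> str:
    return "rule\n" * (prompt.count("rule") + 1)


def toy_evaluate(code: str) -> float:
    return float(code.count("rule"))
```

Keeping the best-scoring program rather than the last one guards against a reflection round that makes the policy worse.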
Taxonomy
Topics: Fuzzy Logic and Control Systems · Machine Learning and Data Classification · Data Mining Algorithms and Applications
Methods: Sparse Evolutionary Training
