PlayCoder: Making LLM-Generated GUI Code Playable

Zhiyuan Peng; Wei Tao; Xin Yin; Chenhao Ying; Yuan Luo; Yiwen Guo

arXiv:2604.19742·cs.SE·April 22, 2026

TL;DR

PlayCoder is a multi-agent framework that generates GUI applications and iteratively repairs the code, targeting the logic inconsistencies that current LLMs leave in GUI programs.

Contribution

The paper presents PlayCoder, a framework that improves LLM-generated GUI code through iterative repair and evaluation, together with a new benchmark (PlayEval) and a new evaluation metric (Play@k).

Findings

01

LLMs achieve near-zero Play@3 on GUI tasks without repair.

02

PlayCoder significantly improves correctness, reaching up to 38.1% Exec@3.

03

Traditional metrics miss silent logic bugs that PlayCoder can detect and fix.
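
To make the third finding concrete, here is a minimal sketch of a silent logic bug: the program runs without crashing, so an execution-only check passes, yet replaying a sequence of user actions shows a wrong state transition. Everything in it is hypothetical and chosen for illustration (`CounterApp`, `BuggyCounterApp`, `runs_without_error`, `follows_expected_flow`, the event names); it is not PlayCoder's implementation and does not reproduce the paper's Exec@k or Play@k definitions.

```python
from dataclasses import dataclass


@dataclass
class CounterApp:
    """Toy event-driven app (hypothetical): a score counter with a reset button."""
    score: int = 0

    def handle(self, event: str) -> None:
        if event == "click_plus":
            self.score += 1
        elif event == "click_reset":
            self.score = 0


class BuggyCounterApp(CounterApp):
    """Same app with a silent logic bug: reset is accepted but does nothing."""

    def handle(self, event: str) -> None:
        if event == "click_plus":
            self.score += 1
        elif event == "click_reset":
            pass  # no exception is raised, so a crash-based check cannot see this


def runs_without_error(app_cls, events) -> bool:
    """Execution-style check (assumption): did the app survive the events?"""
    try:
        app = app_cls()
        for event in events:
            app.handle(event)
        return True
    except Exception:
        return False


def follows_expected_flow(app_cls, steps) -> bool:
    """Interaction-flow check (assumption): compare state after every action."""
    app = app_cls()
    for event, expected_score in steps:
        app.handle(event)
        if app.score != expected_score:
            return False
    return True


# A scripted interaction: each entry is (user action, expected score afterwards).
FLOW = [("click_plus", 1), ("click_plus", 2), ("click_reset", 0), ("click_plus", 1)]
EVENTS = [event for event, _ in FLOW]

if __name__ == "__main__":
    for cls in (CounterApp, BuggyCounterApp):
        print(cls.__name__, runs_without_error(cls, EVENTS), follows_expected_flow(cls, FLOW))
```

Both classes pass the execution-style check, but only `CounterApp` passes the flow check. That gap is the kind of failure a pass/fail outcome on "does it run" would not surface.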

Abstract

Large language models (LLMs) have achieved strong results in code generation, but their ability to generate GUI applications, especially games, remains insufficiently studied. Existing benchmarks mainly evaluate correctness through test cases, which are inadequate for GUI applications because these systems are interactive, event-driven, and require correct state transitions across sequences of user actions. Their evaluation therefore should consider interaction flows and UI logic rather than only pass/fail outcomes. To study this problem, we introduce PlayEval, a repository-aware benchmark built from 43 multilingual GUI applications in Python, TypeScript, and JavaScript. Unlike prior GUI benchmarks that are difficult to adapt to desktop environments, PlayEval covers six major GUI application categories and directly supports code-generation evaluation. We further propose Play@k, a metric…

Code & Models

Repositories

tencent/PlayCoder (GitHub)
