React-ing to Grace Hopper 200: Five Open-Weights Coding Models, One React Native App, One GH200, One Weekend

Alex Potanin

arXiv:2604.17187·cs.SE·April 21, 2026

React-ing to Grace Hopper 200: Five Open-Weights Coding Models, One React Native App, One GH200, One Weekend

Alex Potanin

PDF

TL;DR

This paper evaluates five open-weights coding models on a React Native app generation task, revealing that lower-ranked models can outperform higher-ranked ones and uncovering deployment insights and hardware efficiency trends.

Contribution

It provides a comprehensive evaluation of open-weights coding models on a practical app generation task and uncovers novel deployment and hardware efficiency insights.

Findings

01

Kimi-K2.5 with aggressive quantization outperforms higher SWE-Bench models.

02

Default temperature=0 causes sampling hangs in coding tools.

03

Web-platform adaptation of mobile APIs is a universal training-data gap.

Abstract

We evaluate five state-of-the-art open-weights coding language models -- Kimi-K2.5 (at Q3 and Q4 quantizations), GLM-5.1, Qwen3-Coder-480B, and DeepSeek-V3.2 -- on a single multi-file React Native application generation task on NVIDIA GH200 576 GB hardware. The task specifies authentication, per-user per-day counting, and web compatibility, and is evaluated on whether the generated project runs out-of-the-box and on feature-level correctness. We find that SWE-Bench rankings do not predict task performance: Kimi-K2.5 at aggressive 3-bit quantization (UD-Q3_K_XL, 480 GB) produces the most complete and specification-compliant output, outranking models with substantially higher SWE-Bench Pro scores. We document three novel deployment findings: (1) default temperature=0 in coding tools causes sampling hangs with reasoning-model architectures, (2) reasoning-model thinking traces can leak…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.