When Diffusion Breaks Constraints: Sequential Autoregressive Generation with RL and MCTS

Zirui Zhao; Boye Niu; Harold Soh; David Hsu; Wee Sun Lee

arXiv:2512.01242 · cs.CV · May 14, 2026

TL;DR

This paper shows that diffusion models often violate hard constraints in constrained generation tasks, and proposes a discrete autoregressive sequential approach, combined with reinforcement learning (RL) and Monte Carlo tree search (MCTS), to improve feasibility.

Contribution

The paper reformulates constrained generation as discrete autoregressive sequential generation in order to address the failure modes of diffusion models.
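
To make the sequential formulation concrete, the sketch below places rectangles one at a time on a small integer grid, restricting each step to positions that keep the partial layout feasible. It is only an illustration under stated assumptions: the grid, the `(x, y, w, h)` encoding, and the helper names `overlaps`, `feasible_actions`, and `generate` are hypothetical, and the uniform random choice stands in for a learned policy. It is not the paper's model or training procedure.

```python
import random

def overlaps(a, b):
    """True if two axis-aligned rectangles (x, y, w, h) share interior area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def feasible_actions(placed, size, grid):
    """All integer positions where a rectangle of `size` fits inside a
    grid x grid square without overlapping any already placed rectangle."""
    w, h = size
    actions = []
    for x in range(grid - w + 1):
        for y in range(grid - h + 1):
            cand = (x, y, w, h)
            if not any(overlaps(cand, p) for p in placed):
                actions.append(cand)
    return actions

def generate(sizes, grid, rng):
    """Place rectangles sequentially; return the layout, or None at a dead end."""
    placed = []
    for size in sizes:
        actions = feasible_actions(placed, size, grid)
        if not actions:
            return None  # the feasible region shrank to empty
        # Assumption: a uniform random pick stands in for a learned policy.
        placed.append(rng.choice(actions))
    return placed

layout = generate([(2, 2), (2, 2), (1, 3)], grid=4, rng=random.Random(0))
```

Every completed layout is non-overlapping by construction, but an early placement can still leave no room for a later piece, in which case `generate` returns `None`. That dead-end behavior is the kind of situation where look-ahead could plausibly help.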

Findings

01. Diffusion models struggle with low-dimensional, constrained solution spaces.

02. Reinforcement learning improves the feasibility and success rate of constrained generation.

03. Monte Carlo tree search helps evaluate the value of look-ahead in shrinking feasible regions.

Abstract

Data-driven generative models excel in language and vision, but diffusion models often fail in constrained planning and design tasks, exhibiting severe constraint violations in engineering inverse design, molecular generation, multi-robot planning, and floorplan/scene synthesis even with projection or guidance. Such tasks combine hard-to-specify semantic goals with strict geometric or physical constraints (e.g., non-overlap, connectivity), yielding feasible solutions that lie on low-dimensional, small, and sometimes disconnected regions of the output space. This paper studies the failure mode through tangram generation from language, where seven fixed shapes must form a text-described silhouette while remaining connected and non-overlapping, and a simplified rectangle composition task with a learned bounding-box constraint. We find diffusion models struggle to satisfy constraints,…
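
As a rough illustration of the kind of hard constraints named in the abstract (non-overlap, connectivity, and a bounding box), the sketch below checks a layout of axis-aligned rectangles. It makes several simplifying assumptions: pieces are rectangles rather than tangram polygons, the bounding box is a fixed `(width, height)` pair rather than a learned constraint, and "connected" means pieces are linked through shared edge segments of positive length. The function names are hypothetical.

```python
def overlaps(a, b):
    """True if two axis-aligned rectangles (x, y, w, h) share interior area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def touches(a, b):
    """True if the rectangles share a boundary segment of positive length."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    x_overlap = min(ax + aw, bx + bw) - max(ax, bx)
    y_overlap = min(ay + ah, by + bh) - max(ay, by)
    return (x_overlap == 0 and y_overlap > 0) or (y_overlap == 0 and x_overlap > 0)

def is_feasible(rects, box):
    """Check the bounding box, pairwise non-overlap, and connectivity."""
    width, height = box
    # 1. Every rectangle must lie inside the bounding box.
    for x, y, w, h in rects:
        if x < 0 or y < 0 or x + w > width or y + h > height:
            return False
    # 2. No two rectangles may overlap.
    n = len(rects)
    for i in range(n):
        for j in range(i + 1, n):
            if overlaps(rects[i], rects[j]):
                return False
    # 3. The touch graph must be connected (depth-first search from piece 0).
    if n == 0:
        return True
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in range(n):
            if j not in seen and touches(rects[i], rects[j]):
                seen.add(j)
                stack.append(j)
    return len(seen) == n
```

Under this definition, two pieces that meet only at a corner do not count as connected. Because each check is a hard pass/fail test, a small perturbation of a valid layout can easily break it, which gives some intuition for why feasible solutions occupy small regions of the output space.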
