Learn2Fold: Structured Origami Generation with World Model Planning
Yanjia Huang, Yunuo Chen, Ying Jiang, Jinru Han, Zhengzhong Tu, Yin Yang, Chenfanfu Jiang

TL;DR
Learn2Fold is a neuro-symbolic framework that generates valid origami folding sequences from text by pairing a language model with a learned graph-structured world model, used as a differentiable simulator, inside a planning loop.
Contribution
It introduces an approach to generating origami folding sequences from natural language descriptions that decouples semantic generation from physical verification.
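To illustrate the idea of separating proposal from verification, here is a minimal Python sketch of a propose-then-verify loop. It is not the authors' implementation: every name in it (Fold, plan, toy_propose, toy_score) is hypothetical, the proposer stands in for a language model, the scorer is a hand-written stand-in for a learned world model, and the greedy one-step selection is an assumption made for brevity rather than the planning procedure used in the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Fold:
    crease: int        # hypothetical crease identifier
    angle_deg: float   # signed fold angle in degrees


State = Tuple[Fold, ...]  # the folds applied so far, in order


def plan(
    propose: Callable[[State], List[Fold]],
    score: Callable[[State, Fold], float],
    horizon: int,
    threshold: float = 0.5,
) -> State:
    """Greedily build a fold sequence, keeping only verified steps."""
    state: State = ()
    for _ in range(horizon):
        # Semantic side: ask the proposer for candidate next folds.
        candidates = propose(state)
        # Physical side: score each candidate before committing to it.
        scored = [(score(state, f), f) for f in candidates]
        feasible = [(s, f) for s, f in scored if s >= threshold]
        if not feasible:
            break  # no candidate passed verification; stop early
        best = max(feasible, key=lambda sf: sf[0])[1]
        state = state + (best,)
    return state


def toy_propose(state: State) -> List[Fold]:
    # Stand-in for a language model: two folds on a fresh crease,
    # plus one that reuses crease 0.
    nxt = len(state)
    return [Fold(nxt, 180.0), Fold(nxt, 90.0), Fold(0, 45.0)]


def toy_score(state: State, fold: Fold) -> float:
    # Stand-in for a world model: reject reused creases, and
    # prefer shallower folds among the rest.
    if any(f.crease == fold.crease for f in state):
        return 0.0
    return 1.0 - abs(fold.angle_deg) / 360.0


sequence = plan(toy_propose, toy_score, horizon=3)
```

Because the proposer never sees the scoring rule, either side can be swapped out independently, which is the point of the decoupling. A stricter threshold makes the loop stop early instead of committing to a low-confidence fold.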
Findings
Successfully generates complex origami folds from text prompts.
Ensures physical validity through a learned graph-structured world model.
Outperforms existing methods in producing feasible folding sequences.
Abstract
The ability to transform a flat sheet into a complex three-dimensional structure is a fundamental test of physical intelligence. Unlike cloth manipulation, origami is governed by strict geometric axioms and hard kinematic constraints, where a single invalid crease or collision can invalidate the entire folding sequence. As a result, origami demands long-horizon constructive reasoning that jointly satisfies precise physical laws and high-level semantic intent. Existing approaches fall into two disjoint paradigms: optimization-based methods enforce physical validity but require dense, precisely specified inputs, making them unsuitable for sparse natural language descriptions, while generative foundation models excel at semantic and perceptual synthesis yet fail to produce long-horizon, physics-consistent folding processes. Consequently, generating valid origami folding sequences directly…
