Img2CADSeq: Image-to-CAD Generation via Sequence-Based Diffusion

Shiyu Tan; Zixuan Zhao; Hao Gao; Zhiheng Chen; Xiaolong Yin; Enya Shen

arXiv:2605.13293·cs.CV·May 14, 2026

TL;DR

Img2CADSeq is a multi-stage pipeline that converts single-view images into CAD boundary representations (BReps) using a hierarchical codebook and contrastive learning. Its output can be used directly in CAD software.

Contribution

The paper introduces a three-level hierarchical codebook and a contrastive learning framework for image-to-CAD sequence generation. It also introduces two datasets, CAD-220K and PrintCAD, and reports that the approach outperforms existing methods.

Findings

01 Outperforms state-of-the-art methods in CAD sequence generation.

02 Produces standard STEP files compatible with commercial CAD software.

03 Demonstrates robust adaptation to industrial domains.

Abstract

Boundary Representation (BRep) is the standard format for Computer-Aided Design (CAD), yet reconstructing high-quality BReps from single-view images remains challenging due to the complexity of topological constraints and operation sequences. We present Img2CADSeq, a multi-stage pipeline that overcomes these limitations by encoding CAD sequences into a three-level hierarchical codebook. Guided by an importance prioritization, this strategy values profiles over details, compressing long sequences into a stable discrete latent space. To bridge the modality gap, we leverage a coarse-to-fine point cloud intermediate, aligning 2D visual features with 3D CAD sequences via contrastive learning to condition a VQ-Diffusion model. Supported by newly introduced CAD-220K and PrintCAD datasets, our approach ensures robust industrial domain adaptation. Extensive experiments demonstrate that…
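Two of the mechanisms named in the abstract, codebook quantization and contrastive alignment of image and CAD-sequence features, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `quantize` and `info_nce`, the single flat codebook (the paper describes three levels), the symmetric InfoNCE form, and the temperature of 0.07 are all assumptions chosen for illustration, and the VQ-Diffusion stage is not shown.

```python
import math


def quantize(vector, codebook):
    """Return the index of the codebook entry nearest to `vector` (squared L2).

    Hypothetical helper: a single flat codebook, not the paper's
    three-level hierarchy.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


def _normalize(v):
    # Scale to unit length so dot products become cosine similarities.
    # Assumes v is not the zero vector.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(image_feats, cad_feats, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired (image, CAD) features.

    Row i of `image_feats` is assumed to be paired with row i of `cad_feats`.
    The temperature value is an assumption, not taken from the paper.
    """
    img = [_normalize(v) for v in image_feats]
    cad = [_normalize(v) for v in cad_feats]
    n = len(img)

    # logits[i][j]: temperature-scaled cosine similarity of image i and CAD j.
    logits = [
        [sum(a * b for a, b in zip(img[i], cad[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy(rows):
        # Mean cross-entropy where the correct class for row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    columns = [list(col) for col in zip(*logits)]
    # Average the image-to-CAD and CAD-to-image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(columns))


# Usage with toy 2D features.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
index = quantize([0.9, 1.2], codebook)

aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy example the loss is lower when each image feature matches its paired CAD feature (`aligned`) than when the pairs are swapped (`shuffled`), which is the property a contrastive objective relies on to pull the two modalities together.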
