GACO-CAD: Geometry-Augmented and Conciseness-Optimized CAD Model Generation from Single Image

Yinghui Wang; Xinyu Zhang; Peng Du

arXiv:2510.17157·cs.CV·October 21, 2025

Open Access

TL;DR

GACO-CAD is a two-stage framework that enhances 3D geometry inference from a single image and produces more concise CAD models by leveraging geometric priors and reinforcement learning.

Contribution

It introduces a novel multi-modal fine-tuning and reinforcement learning approach to improve geometric accuracy and conciseness in CAD model generation from a single image.

Findings

01. Outperforms existing methods on the DeepCAD and Fusion360 datasets.

02. Achieves higher code validity and geometric accuracy.

03. Produces more compact and less redundant models.

Abstract

Generating editable, parametric CAD models from a single image holds great potential to lower the barriers of industrial concept design. However, current multi-modal large language models (MLLMs) still struggle with accurately inferring 3D geometry from 2D images due to limited spatial reasoning capabilities. We address this limitation by introducing GACO-CAD, a novel two-stage post-training framework. It is designed to achieve a joint objective: simultaneously improving the geometric accuracy of the generated CAD models and encouraging the use of more concise modeling procedures. First, during supervised fine-tuning, we leverage depth and surface normal maps as dense geometric priors, combining them with the RGB image to form a multi-channel input. In the context of single-view reconstruction, these priors provide complementary spatial cues that help the MLLM more reliably recover 3D…
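
The multi-channel input mentioned in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' pipeline: the function name `build_multichannel_input`, the channel order (RGB, depth, normals), and the normalization choices are all hypothetical, and the paper may feed these maps to the MLLM in a different way.

```python
import numpy as np


def build_multichannel_input(rgb, depth, normals):
    """Stack RGB (H, W, 3), depth (H, W), and normals (H, W, 3) into (H, W, 7).

    Hypothetical helper: channel order and scaling are assumptions,
    not details taken from the paper.
    """
    rgb = np.asarray(rgb, dtype=np.float32)
    depth = np.asarray(depth, dtype=np.float32)
    normals = np.asarray(normals, dtype=np.float32)

    if rgb.shape[:2] != depth.shape or rgb.shape[:2] != normals.shape[:2]:
        raise ValueError("all maps must share the same height and width")

    # Assumption: RGB arrives as 8-bit values and is scaled to [0, 1].
    rgb = rgb / 255.0

    # Assumption: depth is min-max normalized to [0, 1]; a flat map becomes zeros.
    span = depth.max() - depth.min()
    depth = (depth - depth.min()) / span if span > 0 else np.zeros_like(depth)

    # Assumption: normals are unit vectors in [-1, 1], remapped to [0, 1].
    normals = (normals + 1.0) / 2.0

    # Concatenate along the channel axis: 3 (RGB) + 1 (depth) + 3 (normals).
    return np.concatenate([rgb, depth[..., None], normals], axis=-1)
```

In practice the depth and normal maps would come from an off-the-shelf monocular estimator; which estimator is used, and how the stacked tensor is consumed by the model, is not specified in the text above.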

Taxonomy

Topics: 3D Shape Modeling and Analysis · Advanced Vision and Imaging · Robotics and Sensor-Based Localization