Text-to-CAD Generation Through Infusing Visual Feedback in Large Language Models
Ruiyu Wang, Yu Yuan, Shizhao Sun, Jiang Bian

TL;DR
This paper introduces CADFusion, a novel framework that enhances text-to-CAD model generation by integrating visual feedback into large language models through alternating training stages, improving both logical coherence and visual quality.
Contribution
The paper proposes a new training framework that combines parametric sequence learning with visual feedback, addressing multimodal aspects of CAD model generation.
Findings
CADFusion outperforms existing methods in qualitative assessments.
Quantitative results show significant accuracy improvements.
Alternating training stages effectively balance logical and visual learning.
Abstract
Creating Computer-Aided Design (CAD) models requires significant expertise and effort. Text-to-CAD, which converts textual descriptions into CAD parametric sequences, is crucial in streamlining this process. Recent studies have utilized ground-truth parametric sequences, known as sequential signals, as supervision to achieve this goal. However, CAD models are inherently multimodal, comprising parametric sequences and corresponding rendered visual objects. Besides, the rendering process from parametric sequences to visual objects is many-to-one. Therefore, both sequential and visual signals are critical for effective training. In this work, we introduce CADFusion, a framework that uses Large Language Models (LLMs) as the backbone and alternates between two training stages: the sequential learning (SL) stage and the visual feedback (VF) stage. In the SL stage, we train LLMs using…
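The many-to-one point in the abstract can be illustrated with a toy sketch. This is not the paper's CAD representation: the "parametric sequence" below is assumed to be a plain ordered list of 2D line segments, and `render` is a hypothetical stand-in that reduces a sequence to the set of edges it draws.

```python
# Toy illustration only. A "parametric sequence" here is an ordered list of
# 2D line segments; this is an assumption, not the paper's CAD format.

def render(sequence):
    """Hypothetical renderer: collapse a sequence into an order-independent
    set of undirected edges, ignoring drawing order and direction."""
    return frozenset(frozenset((start, end)) for start, end in sequence)

# Unit square traced counter-clockwise, starting at the origin.
seq_a = [((0, 0), (1, 0)), ((1, 0), (1, 1)), ((1, 1), (0, 1)), ((0, 1), (0, 0))]

# The same square traced clockwise, starting at the opposite corner.
seq_b = [((1, 1), (1, 0)), ((1, 0), (0, 0)), ((0, 0), (0, 1)), ((0, 1), (1, 1))]

same_sequence = seq_a == seq_b                 # the two sequences differ
same_shape = render(seq_a) == render(seq_b)    # but they draw the same square
```

Under these assumptions, a purely sequence-level comparison treats `seq_b` as different from `seq_a` even though both render to the same shape, which is one way to read the abstract's argument that visual signals are needed alongside sequential ones.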
Taxonomy
Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Handwritten Text Recognition Techniques
