Parametric-ControlNet: Multimodal Control in Foundation Models for Precise Engineering Design Synthesis
Rui Zhou, Yanxia Zhang, Chenyang Yuan, Frank Permenter, Nikos Arechiga, Matt Klenk, Faez Ahmed

TL;DR
This paper presents Parametric-ControlNet, a multimodal control framework for foundation models that improves engineering design synthesis by integrating parametric, image, and text inputs for precise and diverse generative outputs.
Contribution
It introduces a novel multimodal control method combining parametric, image, and text modalities with a diffusion-based model for engineering design synthesis.
Findings
Enhanced control over design generation using multimodal inputs
Improved precision and diversity in engineering design outputs
Effective integration of parametric, visual, and textual data
Abstract
This paper introduces a generative model designed for multimodal control over text-to-image foundation generative AI models such as Stable Diffusion, specifically tailored for engineering design synthesis. Our model proposes parametric, image, and text control modalities to enhance design precision and diversity. Firstly, it handles both partial and complete parametric inputs using a diffusion model that acts as a design autocomplete co-pilot, coupled with a parametric encoder to process the information. Secondly, the model utilizes assembly graphs to systematically assemble input component images, which are then processed through a component encoder to capture essential visual data. Thirdly, textual descriptions are integrated via CLIP encoding, ensuring a comprehensive interpretation of design intent. These diverse inputs are synthesized through a multimodal fusion technique, creating…
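The abstract names three control inputs (parametric values that may be partial, component images, and text) that are combined by a fusion step. The sketch below is only an illustration of that idea, not the paper's method: the mask convention for missing parameters, the function names `encode_partial_params` and `fuse`, the use of plain concatenation, and the toy embedding values are all assumptions made here, since the abstract does not specify how the encoders or the fusion are implemented.

```python
from typing import List, Optional, Sequence


def encode_partial_params(values: Sequence[Optional[float]]) -> List[float]:
    """Turn a possibly incomplete parameter vector into values plus a mask.

    Assumed convention (not from the paper): a missing entry (None) becomes
    0.0 with mask 0.0, and a known entry keeps its value with mask 1.0.
    """
    filled = [0.0 if v is None else float(v) for v in values]
    mask = [0.0 if v is None else 1.0 for v in values]
    return filled + mask


def fuse(param_vec: Sequence[float],
         image_vec: Sequence[float],
         text_vec: Sequence[float]) -> List[float]:
    """Hypothetical fusion: plain concatenation of the three modality vectors."""
    return list(param_vec) + list(image_vec) + list(text_vec)


# Toy inputs: three design parameters, the second one left unspecified.
params = encode_partial_params([1.5, None, 0.25])
image_embedding = [0.1, 0.2]  # stand-in for a component-encoder output
text_embedding = [0.3, 0.4]   # stand-in for a CLIP text embedding
fused = fuse(params, image_embedding, text_embedding)
```

In a real system each stand-in vector would come from a learned encoder, and the fused vector would condition the diffusion model; the mask simply lets a downstream network tell "parameter is zero" apart from "parameter not given".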
Taxonomy
Topics: BIM and Construction Integration
Methods: Diffusion · Contrastive Language-Image Pre-training
