Parametric-ControlNet: Multimodal Control in Foundation Models for Precise Engineering Design Synthesis

Rui Zhou; Yanxia Zhang; Chenyang Yuan; Frank Permenter; Nikos Arechiga; Matt Klenk; Faez Ahmed

arXiv:2412.04707·cs.AI·December 9, 2024

Open Access

TL;DR

This paper presents Parametric-ControlNet, a multimodal control framework for foundation models that improves engineering design synthesis by integrating parametric, image, and text inputs for precise and diverse generative outputs.

Contribution

It introduces a novel multimodal control method combining parametric, image, and text modalities with a diffusion-based model for engineering design synthesis.

Findings

01 Enhanced control over design generation using multimodal inputs

02 Improved precision and diversity in engineering design outputs

03 Effective integration of parametric, visual, and textual data

Abstract

This paper introduces a generative model designed for multimodal control over text-to-image foundation generative AI models such as Stable Diffusion, specifically tailored for engineering design synthesis. Our model proposes parametric, image, and text control modalities to enhance design precision and diversity. Firstly, it handles both partial and complete parametric inputs using a diffusion model that acts as a design autocomplete co-pilot, coupled with a parametric encoder to process the information. Secondly, the model utilizes assembly graphs to systematically assemble input component images, which are then processed through a component encoder to capture essential visual data. Thirdly, textual descriptions are integrated via CLIP encoding, ensuring a comprehensive interpretation of design intent. These diverse inputs are synthesized through a multimodal fusion technique, creating…
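As a rough illustration of the partial parametric input and multimodal fusion described in the abstract, the sketch below encodes an incomplete parameter list as values plus a presence mask, then joins it with image and text embeddings. It is not the paper's architecture: the function names `encode_partial_params` and `fuse` are hypothetical, concatenation is only a stand-in for the fusion technique (which the truncated abstract does not specify), the embedding vectors are placeholder numbers rather than component-encoder or CLIP outputs, and the diffusion-based autocomplete of missing parameters is not modeled.

```python
from typing import List, Optional, Sequence


def encode_partial_params(params: Sequence[Optional[float]]) -> List[float]:
    """Encode a possibly incomplete parameter list as values followed by a 0/1 mask.

    Missing entries (None) become 0.0 in the value half and 0.0 in the mask half,
    so a downstream model can tell "unknown" apart from a true zero.
    This representation is an assumption, not taken from the paper.
    """
    values = [0.0 if p is None else float(p) for p in params]
    mask = [0.0 if p is None else 1.0 for p in params]
    return values + mask


def fuse(param_vec: Sequence[float],
         image_vec: Sequence[float],
         text_vec: Sequence[float]) -> List[float]:
    """Hypothetical fusion step: plain concatenation of the three modality vectors."""
    return list(param_vec) + list(image_vec) + list(text_vec)


# Three design parameters, the second one left unspecified by the user.
partial = [0.42, None, 1.10]
param_vec = encode_partial_params(partial)

# Placeholder embeddings standing in for image and text encoder outputs.
image_vec = [0.1, 0.2]
text_vec = [0.3]

fused = fuse(param_vec, image_vec, text_vec)
print(param_vec)
print(len(fused))
```

The mask half lets a complete and a partial parameter set share one fixed-length input, which is one common way to support both cases with a single encoder.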

Taxonomy

Topics: BIM and Construction Integration

Methods: Diffusion · Contrastive Language-Image Pre-training