RobotDesignGPT: Automated Robot Design Synthesis using Vision Language Models
Nitish Sontakke, K. Niranjan Kumar, Sehoon Ha

TL;DR
RobotDesignGPT automates robot design synthesis by leveraging vision-language models, enabling the creation of visually appealing and kinematically valid robots from simple prompts and reference images, reducing manual effort.
Contribution
The paper introduces a novel framework that uses large pre-trained vision-language models for automated robot design, incorporating a visual feedback mechanism to enhance quality.
Findings
The framework produces designs inspired by nature, including legged and flying robots.
The visual feedback approach improves design quality.
The framework is validated through an ablation study and a user study.
Abstract
Robot design is a nontrivial process that involves careful consideration of multiple criteria, including user specifications, kinematic structures, and visual appearance. Therefore, the design process often relies heavily on domain expertise and significant human effort. The majority of current methods are rule-based, requiring the specification of a grammar or a set of primitive components and modules that can be composed to create a design. We propose a novel automated robot design framework, RobotDesignGPT, that leverages the general knowledge and reasoning capabilities of large pre-trained vision-language models to automate the robot design synthesis process. Our framework synthesizes an initial robot design from a simple user prompt and a reference image. Our novel visual feedback approach allows us to greatly improve the design quality and reduce unnecessary manual feedback. We…
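The visual feedback idea in the abstract can be pictured as a propose, render, critique loop. The sketch below is only an illustration under stated assumptions, not the authors' implementation: `Design`, `refine_design`, the three callables, and the stub behaviors are all hypothetical names introduced here, and the real system would call a vision-language model where the stubs return canned strings.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Design:
    """Hypothetical stand-in for a robot description (e.g. text listing links and joints)."""
    text: str
    revisions: int = 0


def refine_design(
    prompt: str,
    reference_image: str,
    propose: Callable[[str, str, Optional[Design], Optional[str]], Design],
    render: Callable[[Design], str],
    critique: Callable[[str, str, str], Optional[str]],
    max_rounds: int = 3,
) -> Design:
    """Propose a design, then revise it while the critique step returns feedback."""
    # Initial design from the user prompt and the reference image.
    design = propose(prompt, reference_image, None, None)
    for _ in range(max_rounds):
        rendering = render(design)
        # The critic compares the rendering against the reference and the prompt.
        feedback = critique(rendering, reference_image, prompt)
        if feedback is None:  # nothing left to fix
            break
        design = propose(prompt, reference_image, design, feedback)
    return design


# Stubs (assumptions) standing in for the model and the renderer.
def stub_propose(prompt, reference_image, previous, feedback):
    if previous is None:
        return Design(text=f"draft for: {prompt}")
    return Design(
        text=f"{previous.text} | fixed: {feedback}",
        revisions=previous.revisions + 1,
    )


def stub_render(design):
    return f"render({design.text})"


def stub_critique(rendering, reference_image, prompt):
    # Pretend the critic is satisfied once any fix has been applied.
    return None if "fixed" in rendering else "legs look too short"


result = refine_design(
    "a four-legged robot", "dog.png", stub_propose, stub_render, stub_critique
)
print(result.text, result.revisions)
```

Passing the model, renderer, and critic as callables keeps the loop testable without any model access. The `max_rounds` cap is likewise an assumption, added so the loop terminates even if the critic never approves.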
Taxonomy
Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Social Robot Interaction and HRI
