3DFroMLLM: 3D Prototype Generation only from Pretrained Multimodal LLMs
Noor Ahmed, Cameron Braunstein, Steffen Eger, Eddy Ilg

TL;DR
3DFroMLLM introduces a framework that generates 3D object prototypes from pretrained multimodal LLMs without extra training, enhancing 3D understanding and improving downstream vision-language tasks.
Contribution
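The paper presents an agentic pipeline (a designer, a coder, and a visual inspector operating in a refinement loop) that generates 3D object prototypes, including geometry and part labels, directly from MLLMs, with no additional training data or detailed user instructions.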
Findings
Rendered images of the generated 3D prototypes, used for image classification pretraining, outperform previous methods by 15%.
Prototypes enhance fine-grained vision-language models, boosting CLIP part segmentation accuracy by 55%.
The framework operates without additional training data or detailed user instructions.
Abstract
Recent Multi-Modal Large Language Models (MLLMs) have demonstrated strong capabilities in learning joint representations from text and images. However, their spatial reasoning remains limited. We introduce 3DFroMLLM, a novel framework that enables the generation of 3D object prototypes directly from MLLMs, including geometry and part labels. Our pipeline is agentic, comprising a designer, coder, and visual inspector operating in a refinement loop. Notably, our approach requires no additional training data or detailed user instructions. Building on prior work in 2D generation, we demonstrate that rendered images produced by our framework can be effectively used for image classification pretraining tasks and outperform previous methods by 15%. As a compelling real-world use case, we show that the generated prototypes can be leveraged to improve fine-grained vision-language models by…
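The designer, coder, and visual inspector loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Part` and `Prototype` classes, the box-based geometry, the `refine` function, the `max_rounds` limit, and all three stub agents are hypothetical names and assumptions introduced here. In particular, the stub inspector only compares part labels, whereas the paper's inspector is visual and would examine rendered images.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Part:
    label: str                          # part label, e.g. "seat"
    size: Tuple[float, float, float]    # box extents; box geometry is an assumption
    center: Tuple[float, float, float]  # box centre in object coordinates


@dataclass
class Prototype:
    name: str
    parts: List[Part] = field(default_factory=list)


# Hypothetical agent signatures; a real system would wrap an MLLM call in each.
Designer = Callable[[str], List[str]]                         # name -> planned part labels
Coder = Callable[[str, List[str], Optional[str]], Prototype]  # name, plan, feedback -> prototype
Inspector = Callable[[Prototype], Optional[str]]              # prototype -> feedback, None if accepted


def refine(name: str, designer: Designer, coder: Coder,
           inspector: Inspector, max_rounds: int = 3) -> Tuple[Prototype, int]:
    """Run the design -> code -> inspect loop until accepted or out of rounds."""
    plan = designer(name)
    feedback: Optional[str] = None
    proto = Prototype(name)
    for round_idx in range(1, max_rounds + 1):
        proto = coder(name, plan, feedback)
        feedback = inspector(proto)
        if feedback is None:            # inspector accepted the prototype
            return proto, round_idx
    return proto, max_rounds            # best effort after the final round


# --- Stub agents so the sketch runs without any model (all hypothetical) ---

def stub_designer(name: str) -> List[str]:
    return ["seat", "backrest", "leg"]


def stub_coder(name: str, plan: List[str], feedback: Optional[str]) -> Prototype:
    # The first draft deliberately omits the last part; any feedback fixes it.
    labels = plan if feedback else plan[:-1]
    parts = [Part(label, (1.0, 1.0, 0.1), (0.0, 0.0, float(i)))
             for i, label in enumerate(labels)]
    return Prototype(name, parts)


def make_stub_inspector(plan: List[str]) -> Inspector:
    def inspect(proto: Prototype) -> Optional[str]:
        present = {part.label for part in proto.parts}
        missing = [label for label in plan if label not in present]
        return f"missing parts: {missing}" if missing else None
    return inspect


proto, rounds = refine("chair", stub_designer, stub_coder,
                       make_stub_inspector(stub_designer("chair")))
print(rounds, [part.label for part in proto.parts])
```

With these stubs the first draft is rejected for a missing part and the second is accepted, which is the behaviour a refinement loop is meant to show. Passing the agents in as plain callables keeps the loop independent of any particular model or rendering backend.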
Taxonomy
Topics: Multimodal Machine Learning Applications · 3D Shape Modeling and Analysis · Advanced Neural Network Applications
