VLMgineer: Vision Language Models as Robotic Toolsmiths

George Jiayuan Gao; Tianyu Li; Junyao Shi; Yihan Li; Zizhe Zhang; Nadia Figueroa; Dinesh Jayaraman

arXiv:2507.12644·cs.RO·July 18, 2025

Open Access

TL;DR

VLMgineer leverages vision language models and evolutionary search to automatically co-design tools and action plans, enabling robots to solve manipulation tasks more effectively and creatively.

Contribution

This work introduces VLMgineer, a novel framework combining VLMs and evolutionary search for automated tool design and use in robotics.

Findings

01

Tools designed by VLMgineer outperform human-designed tools in manipulation tasks.

02

The framework discovers innovative tools and policies for diverse scenarios.

03

It transforms complex robotics problems into simple, effective executions.

Abstract

Tool design and use reflect the ability to understand and manipulate the physical world through creativity, planning, and foresight. As such, these capabilities are often regarded as measurable indicators of intelligence across biological species. While much of today's research on robotic intelligence focuses on generating better controllers, inventing smarter tools offers a complementary form of physical intelligence: shifting the onus of problem-solving onto the tool's design. Given the vast and impressive common-sense, reasoning, and creative capabilities of today's foundation models, we investigate whether these models can provide useful priors for automatically designing and effectively wielding such tools. We present VLMgineer, a framework that harnesses the code generation abilities of vision language models (VLMs) together with evolutionary search to iteratively co-design physical…

Taxonomy

Topics: Multimodal Machine Learning Applications · Robotics and Automated Systems