Subsecond 3D Mesh Generation for Robot Manipulation
Qian Wang, Omar Abdellall, Tony Gao, Xiatao Sun, Daniel Rakita

TL;DR
This paper presents a fast, end-to-end system that generates high-quality, contextually grounded 3D meshes from a single RGB-D image in under one second, advancing real-time robotic perception and manipulation.
Contribution
It introduces a novel pipeline combining open-vocabulary segmentation, diffusion-based mesh generation, and point cloud registration for rapid, accurate 3D mesh creation in robotics.
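To make the grounding step concrete, the following is a minimal sketch of how a segmented depth region could give a generated mesh a coarse scale and position. It is not the authors' implementation. The function names, the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), and the centroid/RMS-radius heuristic are all assumptions chosen for illustration. Rotation is ignored, and a real system would refine the result with a proper point cloud registration method.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def backproject_masked_depth(
    depth: Sequence[Sequence[float]],
    mask: Sequence[Sequence[int]],
    fx: float, fy: float, cx: float, cy: float,
) -> List[Point]:
    """Lift masked depth pixels to camera-frame 3D points (pinhole model)."""
    points: List[Point] = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, keep) in enumerate(zip(depth_row, mask_row)):
            # Skip pixels outside the object mask or with invalid depth.
            if keep and z > 0.0:
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def centroid(points: Sequence[Point]) -> Point:
    """Mean of a non-empty set of 3D points."""
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def rms_radius(points: Sequence[Point], center: Point) -> float:
    """Root-mean-square distance of the points from a center."""
    total = sum(
        sum((p[i] - center[i]) ** 2 for i in range(3)) for p in points
    )
    return (total / len(points)) ** 0.5


def coarse_scale_and_translation(
    mesh_vertices: Sequence[Point],
    observed_points: Sequence[Point],
) -> Tuple[float, Point]:
    """Estimate uniform scale s and translation t so that s * v + t
    roughly overlaps the observed points. Hypothetical heuristic:
    match centroids and RMS radii; rotation is not estimated."""
    mesh_center = centroid(mesh_vertices)
    obs_center = centroid(observed_points)
    s = rms_radius(observed_points, obs_center) / rms_radius(
        mesh_vertices, mesh_center
    )
    t = (
        obs_center[0] - s * mesh_center[0],
        obs_center[1] - s * mesh_center[1],
        obs_center[2] - s * mesh_center[2],
    )
    return s, t
```

Because a single RGB-D view sees only part of the object, the observed centroid and radius are biased toward the camera-facing side, so an estimate like this is useful only as an initialization for a finer registration step.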
Findings
Meshes generated in under one second.
Effective in real-world manipulation tasks.
Enables practical on-demand 3D perception for robots.
Abstract
3D meshes are a fundamental representation widely used in computer science and engineering. In robotics, they are particularly valuable because they capture objects in a form that aligns directly with how robots interact with the physical world, enabling core capabilities such as predicting stable grasps, detecting collisions, and simulating dynamics. Although automatic 3D mesh generation methods have shown promising progress in recent years, potentially offering a path toward real-time robot perception, two critical challenges remain. First, generating high-fidelity meshes is prohibitively slow for real-time use, often requiring tens of seconds per object. Second, mesh generation by itself is insufficient. In robotics, a mesh must be contextually grounded, i.e., correctly segmented from the scene and registered with the proper scale and pose. Additionally, unless these contextual…
Taxonomy
Topics: 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization · Robotic Path Planning Algorithms
