Differentiable Inverse Graphics for Zero-shot Scene Reconstruction and Robot Grasping

Octavio Arriaga; Proneet Sharma; Jichen Guo; Marc Otto; Siddhant Kadwe; Rebecca Adam

arXiv:2602.05029 · cs.RO · February 6, 2026

TL;DR

This paper presents a differentiable neuro-graphics model that performs zero-shot scene reconstruction and robot grasping from a single RGBD image and bounding boxes, without relying on additional 3D data or test-time samples.

Contribution

It introduces a model that combines neural foundation models with physics-based differentiable rendering for zero-shot scene reconstruction and robot grasping.

Findings

01 Outperforms existing algorithms in few-shot pose estimation

02 Accurately reconstructs scenes from a single RGBD image

03 Enables zero-shot grasping in novel environments

Abstract

Operating effectively in novel real-world environments requires robotic systems to estimate and interact with previously unseen objects. Current state-of-the-art models address this challenge by using large amounts of training data and test-time samples to build black-box scene representations. In this work, we introduce a differentiable neuro-graphics model that combines neural foundation models with physics-based differentiable rendering to perform zero-shot scene reconstruction and robot grasping without relying on any additional 3D data or test-time samples. Our model solves a series of constrained optimization problems to estimate physically consistent scene parameters, such as meshes, lighting conditions, material properties, and 6D poses of previously unseen objects from a single RGBD image and bounding boxes. We evaluated our approach on standard model-free few-shot benchmarks…
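
The abstract describes estimating scene parameters by solving constrained optimization problems through a differentiable renderer. The sketch below illustrates that general idea on a deliberately tiny problem and is not the authors' model: it assumes a hypothetical 1D "renderer" that draws a Gaussian blob, and recovers the blob's position and width from a target image by gradient descent, clamping the width so it stays physically valid. All names (`render`, `loss_and_grads`, `fit`) and all numeric settings are illustrative assumptions.

```python
import math


def render(mu, sigma, n_pixels=20):
    """Toy differentiable 'renderer' (an assumption for illustration):
    a 1D image of a Gaussian blob centered at mu with width sigma."""
    return [math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) for x in range(n_pixels)]


def loss_and_grads(mu, sigma, target):
    """Sum of squared pixel errors and its analytic gradients w.r.t. mu and sigma."""
    image = render(mu, sigma, len(target))
    loss = g_mu = g_sigma = 0.0
    for x, (i, t) in enumerate(zip(image, target)):
        r = i - t  # per-pixel residual
        loss += r * r
        # Chain rule: dI/dmu = I * (x - mu) / sigma^2
        g_mu += 2.0 * r * i * (x - mu) / sigma ** 2
        # Chain rule: dI/dsigma = I * (x - mu)^2 / sigma^3
        g_sigma += 2.0 * r * i * (x - mu) ** 2 / sigma ** 3
    return loss, g_mu, g_sigma


def fit(target, mu=9.0, sigma=3.0, lr=0.2, steps=500, sigma_min=0.5):
    """Projected gradient descent: take a gradient step, then clamp sigma
    so it never drops below sigma_min (the constraint)."""
    for _ in range(steps):
        _, g_mu, g_sigma = loss_and_grads(mu, sigma, target)
        mu -= lr * g_mu
        sigma = max(sigma_min, sigma - lr * g_sigma)
    return mu, sigma


# Synthetic "observation" rendered from known ground-truth parameters.
target = render(12.0, 2.0)
mu_hat, sigma_hat = fit(target)
final_loss, _, _ = loss_and_grads(mu_hat, sigma_hat, target)
```

The same pattern (render, compare against the observation, follow the gradient, project back onto the feasible set) scales in principle to richer parameters such as poses or materials, though a real physics-based renderer would rely on automatic differentiation rather than the hand-derived gradients used here.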

Taxonomy

Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Human Pose and Action Recognition