Generative Perception of Shape and Material from Differential Motion
Xinran Nicole Han, Ko Nishino, Todd Zickler

TL;DR
This paper introduces a generative model that predicts shape and material maps of an object from a short video, using differential motion to resolve ambiguities that a single static view leaves open.
Contribution
The paper presents a conditional denoising-diffusion model, trained on synthetic object-motion videos, that generates disentangled shape and material maps. It represents ambiguity as multiple plausible samples and refines its predictions when motion is observed.
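To make the idea of sampling maps conditioned on a video concrete, here is a minimal sketch of a standard DDPM ancestral sampling loop in NumPy. It is not the authors' architecture or training setup. The noise schedule, the array shapes, the channel count, and the names `make_schedule`, `toy_denoiser`, and `sample_maps` are all assumptions made for illustration. In particular, `toy_denoiser` is a hypothetical stand-in for a trained network: it simply treats the temporal mean of the frames as the clean target.

```python
import numpy as np


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumed, commonly used choice)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_denoiser(x_t, t, video, alpha_bars):
    """Hypothetical stand-in for a trained noise-prediction network.

    It assumes the clean maps equal the temporal mean of the frames,
    copied to every output channel, and returns the noise implied by
    that guess. A real model would be a learned network conditioned
    on the video.
    """
    x0_guess = np.broadcast_to(video.mean(axis=0), x_t.shape)
    return (x_t - np.sqrt(alpha_bars[t]) * x0_guess) / np.sqrt(1.0 - alpha_bars[t])


def sample_maps(video, denoiser, num_channels=4, num_steps=50, rng=None):
    """Draw one sample of output maps conditioned on `video`.

    video: array of shape (frames, H, W); grayscale frames are assumed.
    Returns an array of shape (num_channels, H, W).
    """
    rng = np.random.default_rng() if rng is None else rng
    betas, alphas, alpha_bars = make_schedule(num_steps)
    _, height, width = video.shape

    # Start from pure Gaussian noise in pixel space.
    x = rng.standard_normal((num_channels, height, width))

    # Reverse process: step from t = T-1 down to t = 0.
    for t in range(num_steps - 1, -1, -1):
        eps_hat = denoiser(x, t, video, alpha_bars)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        # No noise is added on the final step.
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x
```

Calling `sample_maps(video, toy_denoiser)` several times with different random generators corresponds to drawing several samples for the same input. With this toy denoiser every sample collapses to the same target, so any diversity on ambiguous inputs would have to come from a trained network, which this sketch does not include.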
Findings
The model produces diverse, multimodal predictions for static views.
Object motion leads to more accurate shape and material estimates.
High-quality predictions are achieved for real-world objects.
Abstract
Perceiving the shape and material of an object from a single image is inherently ambiguous, especially when lighting is unknown and unconstrained. Despite this, humans can often disentangle shape and material, and when they are uncertain, they often move their head slightly or rotate the object to help resolve the ambiguities. Inspired by this behavior, we introduce a novel conditional denoising-diffusion model that generates samples of shape-and-material maps from a short video of an object undergoing differential motions. Our parameter-efficient architecture allows training directly in pixel-space, and it generates many disentangled attributes of an object simultaneously. Trained on a modest number of synthetic object-motion videos with supervision on shape and material, the model exhibits compelling emergent behavior: For static observations, it produces diverse, multimodal…
Taxonomy
Topics: Image Processing and 3D Reconstruction · Manufacturing Process and Optimization · 3D Shape Modeling and Analysis
