Multistable Shape from Shading Emerges from Patch Diffusion
Xinran Nicole Han, Todd Zickler, Ko Nishino

TL;DR
This paper presents a diffusion-based model that reconstructs multimodal shape distributions from shading images, capturing human-like multistable perception and ambiguity in shape inference.
Contribution
It introduces a small denoising diffusion model that generates multimodal surface normal distributions from image patches, aligning with human perception of shape ambiguity.
Findings
The model produces multimodal shape distributions for ambiguous images.
The model generates accurate shape estimates for clear, object-like images.
Multistable shape explanations emerge despite the model's small parameter count.
Abstract
Models for inferring monocular shape of surfaces with diffuse reflection -- shape from shading -- ought to produce distributions of outputs, because there are fundamental mathematical ambiguities of both continuous (e.g., bas-relief) and discrete (e.g., convex/concave) types that are also experienced by humans. Yet, the outputs of current models are limited to point estimates or tight distributions around single modes, which prevent them from capturing these effects. We introduce a model that reconstructs a multimodal distribution of shapes from a single shading image, which aligns with the human experience of multistable perception. We train a small denoising diffusion process to generate surface normal fields from patches of synthetic images of everyday 3D objects. We deploy this model patch-wise at multiple scales, with guidance from inter-patch shape consistency…
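The discrete convex/concave ambiguity mentioned in the abstract can be checked numerically. The sketch below is not the paper's model. It is a minimal NumPy illustration that assumes Lambertian shading with unit albedo and a distant light. The helper names (normals_from_depth, lambertian_shading), the Gaussian bump, and the light direction are all hypothetical choices made for the illustration. It renders a bump and its depth-reversed dent and shows that the two produce the same image once the light's x and y components are mirrored.

```python
import numpy as np

def normals_from_depth(z):
    """Unit surface normals of a height map z(x, y) via finite differences."""
    zy, zx = np.gradient(z)  # derivatives along rows (y) and columns (x)
    n = np.stack([-zx, -zy, np.ones_like(z)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def lambertian_shading(z, light):
    """Diffuse shading max(0, n . l), assuming unit albedo and a distant light."""
    light = np.asarray(light, dtype=float)
    light = light / np.linalg.norm(light)
    return np.clip(normals_from_depth(z) @ light, 0.0, None)

# A smooth bump (convex) and its depth-reversed counterpart (a concave dent).
ys, xs = np.mgrid[-1:1:64j, -1:1:64j]
bump = 0.5 * np.exp(-(xs**2 + ys**2) / 0.2)
dent = -bump

# Hypothetical oblique light, and the same light with x and y mirrored.
light = np.array([0.3, 0.2, 1.0])
mirrored_light = light * np.array([-1.0, -1.0, 1.0])

image_bump = lambertian_shading(bump, light)
image_dent = lambertian_shading(dent, mirrored_light)

# True when the convex and concave surfaces yield the same shading image.
same_image = np.allclose(image_bump, image_dent)
```

Because the light direction is unknown in a single shading image, both surfaces are valid explanations of that image, which is why a point estimate cannot represent the ambiguity and a distribution over shapes can.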
Taxonomy
Topics: Image Processing and 3D Reconstruction · 3D Shape Modeling and Analysis · Image Retrieval and Classification Techniques
Methods: Diffusion
