IDT: A Physically Grounded Transformer for Feed-Forward Multi-View Intrinsic Decomposition
Kang Du, Yirui Guan, Zeyu Wang

TL;DR
This paper introduces IDT, a transformer-based framework for multi-view intrinsic image decomposition that ensures view consistency and physically interpretable results without iterative sampling.
Contribution
IDT is the first physically grounded, transformer-based method that jointly reasons over multiple views for consistent intrinsic decomposition in a single pass.
Findings
IDT produces more coherent diffuse shading and specular components.
IDT achieves better multi-view consistency than previous methods.
IDT effectively separates material and illumination effects.
Abstract
Intrinsic image decomposition is fundamental for visual understanding, as RGB images entangle material properties, illumination, and view-dependent effects. Recent diffusion-based methods have achieved strong results for single-view intrinsic decomposition; however, extending these approaches to multi-view settings remains challenging, often leading to severe view inconsistency. We propose Intrinsic Decomposition Transformer (IDT), a feed-forward framework for multi-view intrinsic image decomposition. By leveraging transformer-based attention to jointly reason over multiple input images, IDT produces view-consistent intrinsic factors in a single forward pass, without iterative generative sampling. IDT adopts a physically grounded image formation model that explicitly decomposes images into diffuse reflectance, diffuse shading, and specular shading. This structured factorization…
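To make the three-factor image formation model concrete, the sketch below recombines diffuse reflectance, diffuse shading, and specular shading into an RGB image. The additive form I = A * S_d + S_s is a common convention assumed here for illustration; the abstract does not state the exact equation, and the function names `compose_pixel` and `compose_image` are hypothetical rather than part of IDT.

```python
# Hypothetical sketch: the form I = A * S_d + S_s is an assumption,
# not a formula quoted from the paper.

def compose_pixel(albedo, diffuse_shading, specular_shading):
    """Recombine per-channel intrinsic factors into one RGB pixel."""
    return tuple(
        a * d + s
        for a, d, s in zip(albedo, diffuse_shading, specular_shading)
    )


def compose_image(albedo, diffuse_shading, specular_shading):
    """Apply compose_pixel to images stored as nested lists of RGB tuples."""
    return [
        [compose_pixel(a, d, s) for a, d, s in zip(row_a, row_d, row_s)]
        for row_a, row_d, row_s in zip(albedo, diffuse_shading, specular_shading)
    ]


# A 1x1 example: reddish material, uniform diffuse light, faint highlight.
albedo = [[(0.8, 0.2, 0.2)]]
diffuse = [[(0.5, 0.5, 0.5)]]
specular = [[(0.1, 0.1, 0.1)]]
image = compose_image(albedo, diffuse, specular)
```

Under this assumed model, the reflectance carries the material color, the diffuse shading scales it by the incoming light, and the specular term adds view-dependent highlights on top.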
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Image Enhancement Techniques · Generative Adversarial Networks and Image Synthesis
