ReasonX: MLLM-Guided Intrinsic Image Decomposition
Alara Dirik, Tuanfeng Wang, Duygu Ceylan, Stefanos Zafeiriou, Anna Frühstück

TL;DR
ReasonX is a framework that uses a multimodal large language model (MLLM) as a perceptual judge to improve intrinsic image decomposition, especially on real-world images. The judge's relational comparisons serve as rewards for fine-tuning the decomposition model.
Contribution
The paper proposes a model-agnostic, MLLM-guided supervision method that improves intrinsic decomposition models without requiring labeled data, narrowing the gap between synthetic training and real-world application.
Findings
9-25% WHDR reduction on IIW albedo
Up to 46% depth accuracy gains on ETH3D
Effective across multiple architectures and modalities
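For readers unfamiliar with the metric behind the first finding, WHDR (Weighted Human Disagreement Rate) on IIW is commonly computed by comparing predicted albedo at pairs of points against weighted human judgements of which point is darker. The following is a minimal sketch under assumptions: the function name, the tuple layout of `judgements`, and the dictionary-based `albedo` lookup are illustrative, and the threshold `delta=0.10` is the value commonly used with IIW rather than something stated in this summary.

```python
def whdr(judgements, albedo, delta=0.10):
    """Weighted fraction of human judgements the prediction disagrees with.

    judgements: iterable of (point1, point2, darker, weight), where darker is
                "1" (point1 darker), "2" (point2 darker), or "E" (about equal).
    albedo:     mapping from point id to predicted albedo intensity.
    Both layouts are hypothetical, chosen only for this illustration.
    """
    error = 0.0
    total = 0.0
    for p1, p2, darker, weight in judgements:
        # Guard against division by zero for very dark predictions.
        r1 = max(albedo[p1], 1e-10)
        r2 = max(albedo[p2], 1e-10)
        if r2 / r1 > 1.0 + delta:
            predicted = "1"   # point1 is predicted darker
        elif r1 / r2 > 1.0 + delta:
            predicted = "2"   # point2 is predicted darker
        else:
            predicted = "E"   # within the tolerance band
        total += weight
        if predicted != darker:
            error += weight
    return error / total if total > 0 else 0.0


albedo = {"p": 0.2, "q": 0.6, "r": 0.62}
judgements = [
    ("p", "q", "1", 1.0),  # agrees: p is clearly darker
    ("q", "r", "E", 0.5),  # agrees: q and r are within 10%
    ("p", "r", "2", 0.5),  # disagrees: prediction says p is darker
]
print(whdr(judgements, albedo))
```

Lower is better, so the reported 9-25% WHDR reduction corresponds to fewer weighted disagreements with human annotators.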
Abstract
Intrinsic image decomposition aims to separate images into physical components such as albedo, depth, normals, and illumination. While recent diffusion- and transformer-based models benefit from paired supervision from synthetic datasets, their generalization to diverse, real-world scenarios remains challenging. We propose ReasonX, a novel framework that leverages a multimodal large language model (MLLM) as a perceptual judge providing relative intrinsic comparisons, and uses these comparisons as GRPO rewards for fine-tuning intrinsic decomposition models on unlabeled, in-the-wild images. Unlike RL methods for generative models, our framework aligns conditional intrinsic predictors by rewarding agreement between the judge's relational assessments and analytically derived relations from the model's outputs. ReasonX is model-agnostic and can be applied to different intrinsic predictors.…
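The reward idea described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the relation labels, the relative tolerance `tol`, the use of a simple agreement fraction as the reward, and all function names are hypothetical. Only the group-relative normalization (reward minus group mean, divided by group standard deviation) follows the standard GRPO formulation.

```python
from statistics import mean, pstdev


def derived_relation(value_a, value_b, tol=0.1):
    """Relation computed analytically from two scalar model predictions."""
    if value_a > value_b * (1 + tol):
        return "a_greater"
    if value_b > value_a * (1 + tol):
        return "b_greater"
    return "equal"


def agreement_reward(judge_relations, predicted_pairs, tol=0.1):
    """Fraction of comparisons where the model-derived relation matches the judge."""
    if not judge_relations:
        return 0.0
    hits = sum(
        derived_relation(a, b, tol) == judged
        for judged, (a, b) in zip(judge_relations, predicted_pairs)
    )
    return hits / len(judge_relations)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each reward within its sample group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical judge output for two region pairs in one unlabeled image.
judge = ["a_greater", "equal"]

# Values read from two candidate predictions at the same region pairs.
candidate_1 = [(0.8, 0.4), (0.50, 0.52)]
candidate_2 = [(0.4, 0.8), (0.50, 0.52)]

rewards = [agreement_reward(judge, c) for c in (candidate_1, candidate_2)]
advantages = group_relative_advantages(rewards)
print(rewards, advantages)
```

In this toy setup the candidate whose derived relations agree with the judge more often receives a positive advantage, and the other a negative one, which is the signal a GRPO update would use to shift the predictor.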
Taxonomy
Topics: Image Enhancement Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques
