TL;DR
This paper introduces LINO UniPS, a unified feature representation for universal photometric stereo that effectively decouples lighting from normals and preserves high-frequency details, achieving state-of-the-art results.
Contribution
The paper proposes a novel framework with Light Register Tokens, Interleaved Attention, wavelet-based architecture, and a new dataset, advancing the accuracy and generalization of photometric stereo methods.
Findings
State-of-the-art performance on public benchmarks
Enhanced generalization to real-world materials
Improved preservation of geometric details
Abstract
Universal photometric stereo (PS) is defined by two factors: it must (i) operate under arbitrary, unknown lighting conditions and (ii) avoid reliance on specific illumination models. Despite progress (e.g., SDM UniPS), two challenges remain. First, current encoders cannot guarantee that illumination and normal information are decoupled. To enforce decoupling, we introduce LINO UniPS with two key components: (i) Light Register Tokens with light alignment supervision to aggregate point, direction, and environment lights; (ii) Interleaved Attention Block featuring global cross-image attention that takes all lighting conditions together so the encoder can factor out lighting while retaining normal-related evidence. Second, high-frequency geometric details are easily lost. We address this with (i) a Wavelet-based Dual-branch Architecture and (ii) a Normal-gradient Perception Loss. These…
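The abstract names a Wavelet-based Dual-branch Architecture for keeping high-frequency detail, but this summary does not say which wavelet is used or how the two branches consume the bands. As a rough illustration only, the sketch below assumes a single-level 2D Haar transform (an assumption, not something the source confirms) and shows how an image splits into one low-frequency band and three high-frequency bands that can be recombined without loss. The function names `haar_dwt2` and `haar_idwt2` are hypothetical and exist only for this example.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D Haar transform of an (H, W) array with even H and W.

    Returns the low-frequency band and a tuple of three high-frequency bands.
    Illustrative only: the paper's actual wavelet is not given in this summary.
    """
    x = np.asarray(x, dtype=np.float64)
    if x.ndim != 2 or x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("expected a 2D array with even height and width")
    # The four pixels of every non-overlapping 2x2 block.
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    low = (a + b + c + d) / 2.0         # scaled block average (coarse shape)
    horiz_diff = (a - b + c - d) / 2.0  # left column minus right column
    vert_diff = (a + b - c - d) / 2.0   # top row minus bottom row
    diag_diff = (a - b - c + d) / 2.0   # diagonal minus anti-diagonal
    return low, (horiz_diff, vert_diff, diag_diff)


def haar_idwt2(low, highs):
    """Invert haar_dwt2 exactly, rebuilding the full-resolution array."""
    horiz_diff, vert_diff, diag_diff = highs
    h, w = low.shape
    x = np.empty((2 * h, 2 * w), dtype=np.float64)
    x[0::2, 0::2] = (low + horiz_diff + vert_diff + diag_diff) / 2.0
    x[0::2, 1::2] = (low - horiz_diff + vert_diff - diag_diff) / 2.0
    x[1::2, 0::2] = (low + horiz_diff - vert_diff - diag_diff) / 2.0
    x[1::2, 1::2] = (low - horiz_diff - vert_diff + diag_diff) / 2.0
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.standard_normal((8, 6))
    low, highs = haar_dwt2(image)
    # The split is lossless: both bands together recover the input.
    print(np.allclose(haar_idwt2(low, highs), image))
```

In a dual-branch design, one branch could plausibly process `low` and the other the three difference bands, so that fine detail is not averaged away before decoding; how LINO UniPS actually routes the bands is not stated here.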
Peer Reviews
Decision: ICLR 2026 Poster
+ The introduction of three types of Light Register Tokens for aggregating illumination information from three different illumination types sounds logical and novel. It helps improve the decoupling of illumination from normal features. Its effectiveness has been demonstrated in the ablation study.
+ The proposed light alignment supervision sounds logical and novel. It helps Light Register Tokens learn to capture the respective illumination information. Its effectiveness has been demonstrated in a
- The figures and captions need further improvement. For instance, the pipeline in fig. 2 is rather complicated and difficult to understand. It does not match well with the detailed description of the modules. What do the different colors represent? In fig. 3, it is not clear how to interpret the attention maps for the different Light Register Tokens. More detailed discussions are needed to better understand how the figures demonstrate the effectiveness of the different Light Register Tokens.
The proposed method demonstrates solid performance on both real and synthetic datasets. The ablation studies are comprehensive and well-conducted.
- Line 64 seems contradictory; should it be normal features instead?
- Could the authors provide a more physically grounded explanation of the effect of feature similarity? The current discussion relies mainly on experimental illustration.
- Could the Light Token also be used for light source estimation? If so, how accurate would it be?
- It would be much better to discuss and acknowledge PS work under a general setup.
1. The proposed method achieves good results on two benchmark datasets, DiLiGenT and LUCES.
2. The paper introduces a new dataset, PS-Verse, which, according to Table 2, helps the same method achieve better performance.
1. The clarity of the paper could be further improved. In general I can understand the idea of the paper, but there are several key aspects that I am confused about. Please see the Questions section below. Also, the captions of Table 4 and Table 5 do not introduce the metrics they report.
2. The test scenes have only one object per scene. I wonder whether the method can handle more complicated scenes.
3. My biggest concern is the comparison with other methods. (1) In Uni MS-PS (https:/
