CHROMA: Consistent Harmonization of Multi-View Appearance via Bilateral Grid Prediction

Jisu Shin; Richard Shaw; Seunghyun Shin; Zhensong Zhang; Hae-Gon Jeon; Eduardo Perez-Pellitero

arXiv:2507.15748·cs.CV·October 1, 2025

TL;DR

This paper introduces CHROMA, a fast, generalizable method for multi-view appearance harmonization that corrects photometric inconsistencies across views using bilateral grid prediction, enhancing 3D reconstruction quality.

Contribution

CHROMA is a feed-forward approach that predicts bilateral grids to restore multi-view consistency, enabling efficient large-scale harmonization without scene-specific retraining.

Findings

01 · Outperforms scene-specific optimization methods in reconstruction quality

02 · Processes hundreds of frames in a single step for efficiency

03 · Improves generalization to real-world variations

Abstract

Modern camera pipelines apply extensive on-device processing, such as exposure adjustment, white balance, and color correction, which, while beneficial individually, often introduce photometric inconsistencies across views. These appearance variations violate multi-view consistency and degrade novel view synthesis. Joint optimization of scene-specific representations and per-image appearance embeddings has been proposed to address this issue, but with increased computational complexity and slower training. In this work, we propose a generalizable, feed-forward approach that predicts spatially adaptive bilateral grids to correct photometric variations in a multi-view consistent manner. Our model processes hundreds of frames in a single step, enabling efficient large-scale harmonization, and seamlessly integrates into downstream 3D reconstruction models, providing cross-scene…
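To make the bilateral-grid idea in the abstract concrete, the following is a minimal sketch of how a predicted grid could be applied to one view. It is not the authors' implementation. It assumes a grid indexed as (row, column, luminance bin), one flattened 3x4 affine colour transform per cell, Rec. 601 luminance as the guidance signal, and RGB values in [0, 1]; the names `slice_grid`, `apply_grid`, and `constant_grid` are hypothetical.

```python
IDENTITY = [1.0, 0.0, 0.0, 0.0,
            0.0, 1.0, 0.0, 0.0,
            0.0, 0.0, 1.0, 0.0]  # flattened 3x4 affine: rows are output R, G, B


def luminance(rgb):
    """Rec. 601 luma, used here as the guidance value (an assumption)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def _lerp_axis(pos, size):
    """Map a normalized position in [0, 1] to (lower index, upper index, weight)."""
    if size == 1:
        return 0, 0, 0.0
    p = min(max(pos, 0.0), 1.0) * (size - 1)
    i0 = min(int(p), size - 2)
    return i0, i0 + 1, p - i0


def slice_grid(grid, u, v, g):
    """Trilinearly interpolate the affine stored in `grid` at normalized (u, v, g)."""
    gy, gx, gz = len(grid), len(grid[0]), len(grid[0][0])
    y0, y1, wy = _lerp_axis(v, gy)
    x0, x1, wx = _lerp_axis(u, gx)
    z0, z1, wz = _lerp_axis(g, gz)
    out = [0.0] * 12
    for yi, a in ((y0, 1.0 - wy), (y1, wy)):
        for xi, b in ((x0, 1.0 - wx), (x1, wx)):
            for zi, c in ((z0, 1.0 - wz), (z1, wz)):
                w = a * b * c
                if w == 0.0:
                    continue
                cell = grid[yi][xi][zi]
                for k in range(12):
                    out[k] += w * cell[k]
    return out


def apply_grid(image, grid):
    """Apply the per-pixel sliced affine to every RGB pixel of `image`."""
    h, w = len(image), len(image[0])
    result = []
    for y in range(h):
        row = []
        for x in range(w):
            rgb = image[y][x]
            u = x / (w - 1) if w > 1 else 0.0
            v = y / (h - 1) if h > 1 else 0.0
            A = slice_grid(grid, u, v, luminance(rgb))
            row.append(tuple(
                A[4 * c] * rgb[0] + A[4 * c + 1] * rgb[1]
                + A[4 * c + 2] * rgb[2] + A[4 * c + 3]
                for c in range(3)
            ))
        result.append(row)
    return result


def constant_grid(gy, gx, gz, affine):
    """Build a grid whose cells all hold the same affine (handy for testing)."""
    return [[[list(affine) for _ in range(gz)] for _ in range(gx)]
            for _ in range(gy)]
```

Because the grid is much coarser than the image and is sliced with a luminance guide, the correction can vary spatially and across brightness levels while staying smooth within regions. An identity grid leaves the view unchanged, which is a convenient sanity check.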

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

The paper is well written and theoretically detailed. The overall evaluation is mostly thorough and shows favourable results for the proposed approach. It is particularly interesting to see considerable improvement in 3D reconstruction (Fig. 4). The theoretical contributions are more or less incremental when viewed in the context of existing literature; however, they seem to be put together in a coherent manner in the proposed approach. Bilateral grids for pe

Weaknesses

One of the main challenges in photometric consistency across multiple views is handling specularities, which is where most methods fail. While specular surfaces may be significantly challenging to address, varying specularities across views caused by bright light sources and the low dynamic range of sensors are certainly relevant to the proposed approach. I would expect the paper to at least discuss this and/or present the performance on some examples. The image in column 3 of

Reviewer 02 · Rating 4 · Confidence 5

Strengths

1. The method is technically sound. The bilateral grid is well suited for spatially adaptive and edge-preserving color correction.
2. The experiments are thorough and well validated.

Weaknesses

1. The authors train the model on DL3DV with handcrafted ISP variations and also test on DL3DV with the same synthetic setup. This limits the significance of the results, since a feed-forward model will naturally perform well under similar handcrafted conditions.
2. In Table 1, the improvement over Luminance-GS on the LOM and BilaRF datasets appears small.
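The "handcrafted ISP variations" mentioned in this review are not specified in this summary. As a purely illustrative sketch of what synthetic per-view photometric perturbation can look like, the snippet below applies a random exposure gain, white-balance gains, and a gamma curve to an RGB view with values in [0, 1]. The parameter ranges and the names `sample_params` and `perturb_view` are assumptions, not the authors' training setup.

```python
import random


def sample_params(rng):
    """Draw one hypothetical set of per-view ISP parameters."""
    return {
        "exposure": 2.0 ** rng.uniform(-1.0, 1.0),   # up to one stop either way
        "wb_gains": (rng.uniform(0.8, 1.2), 1.0, rng.uniform(0.8, 1.2)),
        "gamma": rng.uniform(0.8, 1.25),
    }


def perturb_pixel(rgb, exposure, wb_gains, gamma):
    """Apply exposure and white-balance gains, clip to [0, 1], then gamma."""
    out = []
    for value, gain in zip(rgb, wb_gains):
        scaled = min(max(value * exposure * gain, 0.0), 1.0)
        out.append(scaled ** gamma)
    return tuple(out)


def perturb_view(image, params):
    """Perturb every pixel of one view with the same parameters."""
    return [
        [perturb_pixel(px, params["exposure"], params["wb_gains"], params["gamma"])
         for px in row]
        for row in image
    ]


# Usage: give each view of a scene its own random perturbation.
rng = random.Random(0)
views = [[[(0.2, 0.4, 0.6)]], [[(0.2, 0.4, 0.6)]]]
perturbed = [perturb_view(v, sample_params(rng)) for v in views]
```

Training and testing on perturbations drawn from the same family, as the reviewer notes, measures in-distribution performance; it does not by itself show robustness to the processing of real camera pipelines.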

Reviewer 03 · Rating 6 · Confidence 4

Strengths

1. The proposed feed-forward framework is novel and well-motivated, effectively improving illumination and color consistency across multiple views and achieving state-of-the-art performance compared to existing methods.
2. This paper is well-written and easy to read.

Weaknesses

1. The proposed method relies on the assumption that at least one reference view is reliable. However, in real-world scenarios where all input views contain significant artifacts or inconsistencies, the effectiveness of the proposed framework could be severely degraded.
2. The method models inter-view inconsistencies using a patch-level bilateral grid, which may lead to visible block artifacts. Although a total variation (TV) loss is applied to alleviate this issue, the results are shown only on
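For readers unfamiliar with the TV loss mentioned in this review, here is a minimal sketch of a total variation penalty on a bilateral grid. It assumes the same (row, column, luminance bin) layout with a flat list of affine coefficients per cell, and uses the mean squared difference between neighbouring cells; the exact form used in the paper is not given in this summary, and `grid_tv` is a hypothetical name.

```python
def grid_tv(grid):
    """Mean squared difference between each cell and its forward neighbours."""
    gy, gx, gz = len(grid), len(grid[0]), len(grid[0][0])
    total, count = 0.0, 0
    for y in range(gy):
        for x in range(gx):
            for z in range(gz):
                cell = grid[y][x][z]
                # Forward neighbours only, so each adjacent pair is counted once.
                for dy, dx, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
                    ny, nx, nz = y + dy, x + dx, z + dz
                    if ny < gy and nx < gx and nz < gz:
                        neighbour = grid[ny][nx][nz]
                        total += sum((a - b) ** 2 for a, b in zip(cell, neighbour))
                        count += 1
    return total / count if count else 0.0
```

A penalty of this kind is zero for a spatially constant grid and grows as neighbouring cells disagree, which discourages abrupt changes between cells. Whether it fully suppresses visible block boundaries is the open question the reviewer raises.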
