TL;DR
The paper introduces VLM-CC, an iterative, feedback-guided framework for cross-camera color constancy that leverages vision-language models to improve robustness without direct RGB regression.
Contribution
It reframes color constancy as an iterative perceptual feedback problem using VLM evaluation, achieving state-of-the-art cross-camera robustness.
Findings
VLM-CC outperforms existing methods on multiple datasets.
The framework effectively generalizes across different camera types.
Iterative feedback improves color correction accuracy.
Abstract
Color constancy aims to keep object colors consistent under varying illumination. Cross-camera generalization in color constancy remains challenging because learning-based models often overfit to the color response characteristics of the training camera, resulting in degraded performance on images captured by other cameras. We propose VLM-CC, a feedback-guided framework that formulates color constancy as an iterative refinement process. Instead of directly estimating the illuminant from raw input, VLM-CC performs iterative correction driven by vision-language model (VLM)-based evaluation. At each iteration, the image is white-balanced using the current estimate and converted to pseudo-sRGB. A lightweight LoRA-tuned VLM then assesses the corrected image, identifying the dominant residual color cast and providing qualitative feedback. This feedback is mapped to a residual illumination…
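The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `stub_evaluator` is a hypothetical stand-in for the LoRA-tuned VLM (it only compares channel means), the pseudo-sRGB conversion is approximated by a plain 1/2.2 gamma, and the mapping from feedback to a residual illuminant (`feedback_to_residual`, with a decaying step size) is invented here because the abstract is truncated before describing it.

```python
import numpy as np

CHANNELS = ("red", "green", "blue")

def white_balance(raw, illum):
    # Diagonal (von Kries-style) correction: divide each channel by the
    # current illuminant estimate, normalised so the green gain is 1.
    gains = illum[1] / illum
    return np.clip(raw * gains, 0.0, 1.0)

def to_pseudo_srgb(linear):
    # Assumption: a plain 1/2.2 gamma stands in for the pseudo-sRGB conversion.
    return np.power(linear, 1.0 / 2.2)

def stub_evaluator(image, tol=0.01):
    # Hypothetical stand-in for the VLM: names the dominant residual cast
    # by comparing channel means, or "none" if the image looks neutral.
    means = image.reshape(-1, 3).mean(axis=0)
    deviation = means - means.mean()
    idx = int(np.argmax(deviation))
    return "none" if deviation[idx] < tol else CHANNELS[idx]

def feedback_to_residual(cast, step):
    # Assumed mapping: a residual cast in one channel means the estimate
    # is too weak there, so scale that channel of the estimate up.
    residual = np.ones(3)
    if cast != "none":
        residual[CHANNELS.index(cast)] += step
    return residual

def refine_illuminant(raw, max_iters=50, step=0.2, decay=0.9):
    illum = np.ones(3) / np.sqrt(3)  # start from a neutral estimate
    for _ in range(max_iters):
        corrected = to_pseudo_srgb(white_balance(raw, illum))
        cast = stub_evaluator(corrected)
        if cast == "none":
            break
        illum = illum * feedback_to_residual(cast, step)
        illum /= np.linalg.norm(illum)
        step *= decay  # shrink the update so the loop settles
    return illum

def angular_error_deg(a, b):
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# Synthetic check: a uniform gray scene lit by a warm illuminant.
true_illum = np.array([0.9, 1.0, 0.6])
raw = np.full((8, 8, 3), 0.4) * true_illum
estimate = refine_illuminant(raw)
print("angular error (deg):", angular_error_deg(estimate, true_illum))
```

The point of the sketch is the control flow: the estimate is never regressed from the raw pixels directly; it is only nudged by qualitative feedback on the corrected image. In the paper that feedback comes from a VLM rather than from channel statistics.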
