Linear Image Generation by Synthesizing Exposure Brackets
Yuekun Dai, Zhoutong Zhang, Shangchen Zhou, Nanxuan Zhao

TL;DR
This paper introduces a method for generating high-quality, scene-referred linear images from text prompts by synthesizing exposure brackets, enabling richer post-processing and editing options.
Contribution
It proposes a novel DiT-based flow-matching architecture for text-conditioned exposure bracket generation to produce linear images with full dynamic range.
Findings
Successfully synthesizes linear images with preserved highlights and shadows.
Enables text-guided linear image editing and structure-conditioned generation.
Addresses the difficulty existing VAEs have in handling high dynamic range images.
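
To make the exposure-bracket idea concrete, here is a minimal sketch of how a scene-referred linear image relates to a stack of range-limited exposures. It is not the paper's pipeline. The function names `make_brackets` and `merge_brackets`, the EV offsets, and the well-exposedness thresholds are all assumptions chosen for illustration. Each bracket is the linear signal scaled by 2^EV and clipped to [0, 1], so every bracket fits a conventional image range while the stack as a whole covers the full dynamic range.

```python
import numpy as np

def make_brackets(linear, evs):
    """Simulate an exposure bracket: scale linear radiance by 2**ev, then clip to [0, 1]."""
    return [np.clip(linear * (2.0 ** ev), 0.0, 1.0) for ev in evs]

def merge_brackets(brackets, evs, low=0.02, high=0.98):
    """Merge brackets into one linear image by averaging well-exposed pixels.

    `low` and `high` are illustrative thresholds (an assumption of this sketch).
    """
    num = np.zeros_like(brackets[0], dtype=np.float64)
    den = np.zeros_like(num)
    for img, ev in zip(brackets, evs):
        # A pixel counts only if it is neither near black nor clipped.
        w = ((img > low) & (img < high)).astype(np.float64)
        num += w * img / (2.0 ** ev)  # undo the exposure scaling
        den += w
    # Where no bracket is well exposed, fall back to the darkest exposure,
    # which is the one least affected by highlight clipping.
    darkest = int(np.argmin(evs))
    fallback = brackets[darkest] / (2.0 ** evs[darkest])
    return np.where(den > 0, num / np.maximum(den, 1.0), fallback)

# Usage: radiance values spanning more than 13 stops.
linear = np.array([0.001, 0.05, 0.5, 4.0, 12.0])
evs = [-4, -2, 0, 2, 4]
brackets = make_brackets(linear, evs)
recovered = merge_brackets(brackets, evs)
```

A single 0 EV exposure would clip the two brightest values to 1.0, whereas the merged stack retains them. That is the sense in which a set of brackets can stand in for one linear image.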
Abstract
The life of a photo begins with photons striking the sensor, whose signals are passed through a sophisticated image signal processing (ISP) pipeline to produce a display-referred image. However, such images are no longer faithful to the incident light, being compressed in dynamic range and stylized by subjective preferences. In contrast, RAW images record direct sensor signals before non-linear tone mapping. After camera response curve correction and demosaicing, they can be converted into linear images, which are scene-referred representations that directly reflect true irradiance and are invariant to sensor-specific factors. Since image sensors have better dynamic range and bit depth, linear images contain richer information than display-referred ones, leaving users more room for editing during post-processing. Despite this advantage, current generative models mainly synthesize…
