RawGen: Learning Camera Raw Image Generation
Dongyoung Kim, Junyong Lee, Abhijith Punnappurath, Mahmoud Afifi, Sangmin Han, Alex Levinshtein, Michael S. Brown

TL;DR
RawGen is a novel diffusion-based framework that synthesizes physically consistent camera raw images from text prompts and inverts sRGB images back to raw, addressing dataset scarcity and ISP diversity.
Contribution
It introduces what the authors describe as the first diffusion model for text-to-raw generation and sRGB-to-raw inversion, leveraging the generative priors of large-scale sRGB diffusion models and a new inverse-ISP dataset.
Findings
RawGen outperforms traditional inverse-ISP methods.
It enables camera-specific raw image synthesis from text.
Augmenting training data with RawGen improves low-level vision tasks.
Abstract
Cameras capture scene-referred linear raw images, which are processed by onboard image signal processors (ISPs) into display-referred 8-bit sRGB outputs. Although raw data is more faithful for low-level vision tasks, collecting large-scale raw datasets remains a major bottleneck, as existing datasets are limited and tied to specific camera hardware. Generative models offer a promising way to address this scarcity -- however, existing diffusion frameworks are designed to synthesize photo-finished sRGB images rather than physically consistent linear representations. This paper presents RawGen, to our knowledge the first diffusion-based framework enabling text-to-raw generation for arbitrary target cameras, alongside sRGB-to-raw inversion. RawGen leverages the generative priors of large-scale sRGB diffusion models to synthesize physically meaningful linear outputs, such as CIE XYZ or…
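To make the gap between display-referred sRGB and a linear representation such as CIE XYZ concrete, here is a minimal sketch of the standard colorimetric sRGB decoding (IEC 61966-2-1 transfer curve, then the usual D65 sRGB-to-XYZ matrix). It is not RawGen's method: a real camera ISP also applies white balance, tone mapping, and other camera-specific photo-finishing, none of which this fixed formula undoes, which is part of why a learned inversion is attractive. The function names are illustrative assumptions.

```python
# Standard sRGB decoding to linear light, then to CIE XYZ (D65 white point).
# Illustration only: this ignores camera-specific ISP steps (white balance,
# tone mapping, local edits) and is not the approach used in the paper.

# Commonly quoted linear-sRGB -> XYZ (D65) matrix, rounded to 4 decimals.
SRGB_TO_XYZ_D65 = (
    (0.4124, 0.3576, 0.1805),
    (0.2126, 0.7152, 0.0722),
    (0.0193, 0.1192, 0.9505),
)


def srgb_decode(v):
    """Invert the sRGB transfer curve for one channel value in [0, 1]."""
    if v <= 0.04045:
        return v / 12.92
    return ((v + 0.055) / 1.055) ** 2.4


def srgb8_to_xyz(r8, g8, b8):
    """Map an 8-bit sRGB pixel to CIE XYZ, with Y = 1.0 for white."""
    linear = [srgb_decode(c / 255.0) for c in (r8, g8, b8)]
    return tuple(
        sum(m * c for m, c in zip(row, linear)) for row in SRGB_TO_XYZ_D65
    )


if __name__ == "__main__":
    # 8-bit mid-gray is far darker than 50% in linear light.
    print(srgb8_to_xyz(128, 128, 128))
```

The 8-bit quantization and the nonlinear curve are both visible here: code value 128 corresponds to roughly a fifth of full-scale luminance, not half.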
