AceTone: Bridging Words and Colors for Conditional Image Grading

Tianren Ma; Mingxiang Liao; Xijin Zhang; Qixiang Ye

arXiv:2604.00530·cs.CV·April 2, 2026

TL;DR

AceTone is a unified, multimodal framework for color grading: a generative model produces 3D-LUTs conditioned on text prompts or reference images, achieving state-of-the-art results and closer alignment with human aesthetic preferences.

Contribution

AceTone is presented as the first approach to support multimodal conditioned color grading within a single framework. It uses a VQ-VAE tokenizer to compress LUTs into discrete tokens and reinforcement learning to align outputs with perceptual fidelity and aesthetics.
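To make the tokenizer idea concrete, here is a minimal sketch of the vector-quantization step at the core of any VQ-VAE: each latent vector is replaced by the index of its nearest codebook entry. This is not the authors' implementation. The function names `quantize` and `dequantize`, and the codebook size and latent dimension in the usage note, are hypothetical, and the encoder and decoder networks are omitted.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry.

    latents:  array of shape (N, D), e.g. N latent vectors from an encoder.
    codebook: array of shape (K, D), the learned code vectors.
    Returns an integer array of shape (N,) holding the discrete tokens.
    """
    # Squared Euclidean distance between every latent and every code: (N, K).
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def dequantize(tokens, codebook):
    """Look up the code vector for each token, giving an (N, D) array."""
    return codebook[tokens]
```

With a hypothetical codebook of 16 entries of dimension 8, calling `quantize` on 64 latent vectors yields 64 integer tokens, matching the token count quoted in the abstract. A decoder would then map the dequantized vectors back to a LUT.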

Findings

1. Achieves up to 50% improvement in LPIPS over existing methods.

2. State-of-the-art performance on text-guided and reference-guided grading tasks.

3. Human evaluations confirm visually pleasing and stylistically coherent results.

Abstract

Color affects how we interpret image style and emotion. Previous color grading methods rely on patch-wise recoloring or fixed filter banks, struggling to generalize across creative intents or align with human aesthetic preferences. In this study, we propose AceTone, the first approach that supports multimodal conditioned color grading within a unified framework. AceTone formulates grading as a generative color transformation task, where a model directly produces 3D-LUTs conditioned on text prompts or reference images. We develop a VQ-VAE based tokenizer which compresses a $3 \times 32^{3}$ LUT vector to 64 discrete tokens with $\Delta E < 2$ fidelity. We further build a large-scale dataset, AceTone-800K, and train a vision-language model to predict LUT tokens, followed by reinforcement learning to align outputs with perceptual fidelity and aesthetics. Experiments show that AceTone achieves…
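For readers unfamiliar with 3D-LUTs, the sketch below shows how a predicted LUT could be applied to an image. It assumes a 32-point LUT with three output channels, stored as an array of shape (32, 32, 32, 3) indexed by input red, green and blue, and an RGB image with values in [0, 1]. That memory layout, the trilinear interpolation and the function names `identity_lut` and `apply_lut` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def identity_lut(size=32):
    """Identity LUT of shape (size, size, size, 3), indexed as lut[r, g, b]."""
    grid = np.linspace(0.0, 1.0, size)
    r, g, b = np.meshgrid(grid, grid, grid, indexing="ij")
    return np.stack([r, g, b], axis=-1)

def apply_lut(image, lut):
    """Apply a 3D LUT to an RGB image in [0, 1] with trilinear interpolation.

    image: float array of shape (..., 3).
    lut:   float array of shape (S, S, S, 3) (assumed layout).
    """
    size = lut.shape[0]
    # Continuous grid position of every pixel along each colour axis.
    pos = np.clip(image, 0.0, 1.0) * (size - 1)
    # Lower corner of the enclosing cell, capped so that lo + 1 stays in range.
    lo = np.minimum(np.floor(pos).astype(int), size - 2)
    frac = pos - lo
    out = np.zeros(image.shape, dtype=np.float64)
    # Blend the 8 corners of the cell, weighted by distance along each axis.
    for dr in (0, 1):
        for dg in (0, 1):
            for db in (0, 1):
                wr = frac[..., 0] if dr else 1.0 - frac[..., 0]
                wg = frac[..., 1] if dg else 1.0 - frac[..., 1]
                wb = frac[..., 2] if db else 1.0 - frac[..., 2]
                corner = lut[lo[..., 0] + dr, lo[..., 1] + dg, lo[..., 2] + db]
                out += (wr * wg * wb)[..., None] * corner
    return out
```

Under these assumptions the LUT holds 3 × 32³ = 98,304 values, which is the vector the abstract describes compressing to 64 tokens. Applying `identity_lut()` leaves an image unchanged, which is a convenient sanity check before plugging in a predicted LUT.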

Code & Models

Repositories

martian422/AceTone (GitHub)
