Towards In-Context Tone Style Transfer with A Large-Scale Triplet Dataset
Yuhai Deng, Huimin She, Wei Shen, Meng Li, Ruoxi Wu, Lunxi Yuan, and Xiang Li

TL;DR
This paper introduces a large-scale triplet dataset for tone style transfer and proposes a diffusion-based in-context framework that improves stylistic fidelity and visual quality in photo retouching.
Contribution
The creation of the TST100K dataset and the development of ICTone, a diffusion-based in-context style transfer method that leverages semantic priors and reward feedback.
Findings
The TST100K dataset improves model training for tone style transfer.
ICTone outperforms existing methods in quantitative and human evaluations.
Joint conditioning on the content and reference images improves semantic and aesthetic quality.
Abstract
Tone style transfer for photo retouching aims to adapt the stylistic tone of the reference image to a given content image. However, the lack of high-quality large-scale triplet datasets with stylized ground truth forces existing methods to rely on self-supervised or proxy objectives, which limits model capability. To mitigate this gap, we design a data construction pipeline to build TST100K, a large-scale dataset of 100,000 content-reference-stylized triplets. At the core of this pipeline, we train a tone style scorer to ensure strict stylistic consistency for each triplet. In addition, existing methods typically extract content and reference features independently and then fuse them in a decoder, which may cause semantic loss and lead to inappropriate color transfer and degraded visual aesthetics. Instead, we propose ICTone, a diffusion-based framework that performs tone transfer in an…
