TextSculptor: Training and Benchmarking Scene Text Editing

Yiheng Lin; Siyu Jiao; Xiaohan Lan; Wei Zhou; Qi She; Fei Yu; Heyun Chen; Zhengwei Wang; Jinghuan Chen; Moran Li; Yingchen Yu; Zijian Feng; Yao Zhao; Yunchao Wei; Yujie Zhong

arXiv:2605.21090·cs.CV·May 21, 2026

TL;DR

TextSculptor introduces a large-scale dataset and a benchmark for scene text editing, supporting both the training and the evaluation of models on this task.

Contribution

The paper provides an automated data construction pipeline, a large-scale dataset (TextSculpt-Data), and a standardized benchmark tailored to scene text editing.

Findings

01

TextSculptor improves the scene text editing performance of open-source models.

02

The dataset contains 3.2 million samples, including OCR-verified and paired text editing data.

03

The benchmark covers four fundamental text editing tasks with a tailored evaluation protocol.
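This page does not say how the OCR verification in finding 02 is carried out. As a rough illustration only, one common way to verify a rendered sample is to run an OCR engine on it and keep the sample when the recognized string is close enough to the intended text. The sketch below assumes the OCR output is already available as a string; the function names and the 0.9 threshold are hypothetical and are not taken from the paper.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def ocr_similarity(ocr_text: str, target_text: str) -> float:
    """Similarity in [0, 1]: 1 minus the length-normalized edit distance.

    Both strings are lowercased and whitespace-collapsed first
    (an assumption made for this sketch, not a rule from the paper).
    """
    a = " ".join(ocr_text.lower().split())
    b = " ".join(target_text.lower().split())
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def keep_sample(ocr_text: str, target_text: str, threshold: float = 0.9) -> bool:
    """Hypothetical filter: keep a sample only if OCR agrees with the target text."""
    return ocr_similarity(ocr_text, target_text) >= threshold
```

With this filter, a sample whose OCR reading is "C0FFEE" for the target "COFFEE" has similarity 1 - 1/6 and would be dropped at the assumed 0.9 threshold, while an exact reading would be kept.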

Abstract

Recent advances in Multimodal Large Language Models (MLLMs) and diffusion-based generative models have substantially improved prompt-driven image editing. However, scene text editing remains challenging, as it requires models to precisely modify textual content while preserving visual realism and non-target regions. Current open-source models still lag behind proprietary systems, largely due to the scarcity of high-quality training data and the lack of standardized benchmarks tailored to text editing. To address these challenges, we present TextSculptor, a comprehensive framework for data construction and evaluation of scene text editing. We first develop an automated data construction pipeline that combines text-aware image synthesis with programmatic text rendering and compositing. Based on this pipeline, we build TextSculpt-Data, a large-scale dataset containing 3.2M training…
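The abstract's requirement that an edit preserve non-target regions can be made concrete with a simple measurement. The sketch below is not the benchmark's evaluation protocol, which this page does not describe; it is a minimal illustration that assumes grayscale images stored as nested lists of integers and a binary mask in which 1 marks the edited text region. The function name is hypothetical.

```python
def non_target_mae(before, after, mask):
    """Mean absolute pixel difference over non-target pixels (mask == 0).

    before, after: grayscale images as equally sized lists of rows of ints.
    mask: same shape, 1 inside the edited text region, 0 elsewhere.
    Returns 0.0 when every pixel is masked as target.
    """
    total = 0
    count = 0
    for row_b, row_a, row_m in zip(before, after, mask):
        for pb, pa, m in zip(row_b, row_a, row_m):
            if m == 0:
                total += abs(pb - pa)
                count += 1
    return total / count if count else 0.0


# Tiny example: one pixel is the edit target, one background pixel drifts by 5.
before = [[10, 20], [30, 40]]
after = [[10, 99], [35, 40]]
mask = [[0, 1], [0, 0]]
score = non_target_mae(before, after, mask)
```

A lower value means the model left more of the background untouched; changes inside the masked text region do not affect the score.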

Code & Models

Repositories

linyiheng123/TextSculptor
github

Datasets

dafbgd/TextSculpt-Data
dataset · 1.9k downloads
