Towards Training-Free Scene Text Editing

Yubo Li; Xugong Qin; Peng Zhang; Hailun Lin; Gangyan Zeng; Kexin Zhang

arXiv:2603.24571·cs.CV·March 26, 2026

TL;DR

TextFlow is a training-free scene text editing framework that combines attention-based guidance (Attention Boost) with flow modeling (Flow Manifold Steering) to modify text in images without additional training. Its visual quality and text accuracy are reported as comparable or superior to those of training-based methods.

Contribution

The paper presents a training-free approach that integrates Attention Boost (AttnBoost) and Flow Manifold Steering (FMS) for scene text editing, with no task-specific training.

Findings

01. Achieves visual quality and text accuracy comparable or superior to training-based methods.

02. Generalizes well across diverse scenes and languages.

03. Operates in an end-to-end, plug-and-play manner.

Abstract

Scene text editing seeks to modify textual content in natural images while maintaining visual realism and semantic consistency. Existing methods often require task-specific training or paired data, limiting their scalability and adaptability. In this paper, we propose TextFlow, a training-free scene text editing framework that integrates the strengths of Attention Boost (AttnBoost) and Flow Manifold Steering (FMS) to enable flexible, high-fidelity text manipulation without additional training. Specifically, FMS preserves the structural and style consistency by modeling the visual flow of characters and background regions, while AttnBoost enhances the rendering of textual content through attention-based guidance. By jointly leveraging these complementary modules, our approach performs end-to-end text editing through semantic alignment and spatial refinement in a plug-and-play manner.…
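The abstract does not spell out how AttnBoost applies its attention-based guidance. As a generic illustration only, and not the paper's actual formulation, the sketch below shows one common way to strengthen attention on selected tokens: add a bias to their pre-softmax scores and renormalize. The function names `softmax` and `boost_attention`, the `boosted_ids` argument, and the `boost` factor are all hypothetical and chosen for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def boost_attention(logits, boosted_ids, boost=2.0):
    """Hypothetical attention boosting (an assumption, not TextFlow's method).

    Adding log(boost) to a token's logit is equivalent to multiplying its
    unnormalized attention weight by `boost` before renormalizing.
    """
    if boost <= 0:
        raise ValueError("boost must be positive")
    bias = math.log(boost)
    adjusted = [x + bias if i in boosted_ids else x
                for i, x in enumerate(logits)]
    return softmax(adjusted)


# Four prompt tokens with equal scores; token 1 stands in for the target text.
logits = [0.0, 0.0, 0.0, 0.0]
weights = boost_attention(logits, boosted_ids={1}, boost=2.0)
print(weights)
```

With equal starting scores, the boosted token's share of attention rises from 0.25 to roughly 0.4, while the remaining tokens drop to roughly 0.2 each. In a real diffusion or flow model this adjustment would be applied per attention head and per image location, which this toy example leaves out.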

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Multimodal Machine Learning Applications