Improving Diffusion Models for Scene Text Editing with Dual Encoders

Jiabao Ji; Guanhua Zhang; Zhaowen Wang; Bairu Hou; Zhifei Zhang; Brian Price; Shiyu Chang

arXiv:2304.05568 · cs.CV · April 13, 2023 · 6 cites

TL;DR

This paper introduces DIFFSTE, a dual-encoder diffusion model that improves scene text editing by increasing text accuracy and strengthening style control, and that shows strong zero-shot generalization across five benchmark datasets.

Contribution

The paper proposes a novel dual encoder diffusion framework with instruction tuning, enabling better text rendering, style control, and zero-shot generalization in scene text editing.

Findings

1. Outperforms previous methods in text correctness and naturalness
2. Achieves effective style control and zero-shot font variation generation
3. Demonstrates superior results on five benchmark datasets

Abstract

Scene text editing is a challenging task that involves modifying or inserting specified texts in an image while maintaining its natural and realistic appearance. Most previous approaches to this task rely on style-transfer models that crop out text regions and feed them into image transfer models, such as GANs. However, these methods are limited in their ability to change text style and are unable to insert texts into images. Recent advances in diffusion models have shown promise in overcoming these limitations with text-conditional image editing. However, our empirical analysis reveals that state-of-the-art diffusion models struggle with rendering correct text and controlling text style. To address these problems, we propose DIFFSTE to improve pre-trained diffusion models with a dual encoder design, which includes a character encoder for better text legibility and an instruction…
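
The dual-encoder idea in the abstract can be illustrated with a toy sketch. It assumes the target text is encoded one character at a time, the editing instruction is encoded at word level, and the two sequences are concatenated into a single conditioning sequence for the diffusion model. The fusion by concatenation, the hash-based stand-in embeddings, and all names (`encode_characters`, `encode_instruction`, `build_conditioning`, `EMBED_DIM`) are hypothetical and are not taken from the paper or its code.

```python
import hashlib
from typing import List

EMBED_DIM = 8  # hypothetical embedding width, chosen only for illustration


def _toy_embedding(token: str, dim: int = EMBED_DIM) -> List[float]:
    """Deterministic stand-in for a learned embedding (hash-based, not trained)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def encode_characters(target_text: str) -> List[List[float]]:
    """Character-encoder stand-in: one vector per character, so spelling is explicit."""
    return [_toy_embedding("char:" + ch) for ch in target_text]


def encode_instruction(instruction: str) -> List[List[float]]:
    """Instruction-encoder stand-in: one vector per whitespace-separated word."""
    return [_toy_embedding("word:" + word) for word in instruction.split()]


def build_conditioning(instruction: str, target_text: str) -> List[List[float]]:
    """Assumed fusion: concatenate both token sequences along the sequence axis."""
    return encode_instruction(instruction) + encode_characters(target_text)


# Five instruction words plus five characters give ten conditioning vectors.
cond = build_conditioning('Write "Hello" in italic font', "Hello")
print(len(cond), len(cond[0]))
```

In a real model the conditioning sequence would be consumed by the denoiser (for example through cross-attention), and both encoders would be learned networks; the sketch only shows why a per-character sequence gives the model direct access to spelling that word-level tokens can hide.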

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Multimodal Machine Learning Applications

Methods: Diffusion