MULTITEXTEDIT: Benchmarking Cross-Lingual Degradation in Text-in-Image Editing

Liwei Cheng; Shibo Feng; Lunjie Zhou; Yixuan Guan; Dayan Guan

arXiv:2605.08163 · cs.CV · May 19, 2026

TL;DR

This paper introduces MULTITEXTEDIT, a benchmark of 3,600 instances across 12 languages for evaluating cross-lingual text-in-image editing. It exposes pronounced language-specific degradation across models and proposes a language fidelity (LSF) metric for script-level errors.

Contribution

It presents a new multilingual benchmark with a specialized language fidelity metric to assess cross-lingual performance in text-in-image editing systems.

Findings

1. Pronounced cross-lingual degradation observed across models.
2. Largest errors in Hebrew and Arabic, smallest in Dutch and Spanish.
3. Outputs often preserve layout but distort script-specific text.
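The third finding says outputs keep the layout but distort script-specific text. A plain string-similarity score can understate such errors, because dropping an accent or swapping in one look-alike letter changes only a character or two. The sketch below is a hypothetical illustration, not the paper's LSF metric (which, per the abstract, is scored by a two-stage LVM protocol). It uses only Python's standard `unicodedata` and `difflib` modules; the function names and the name-prefix script heuristic are assumptions chosen for readability.

```python
import unicodedata
from difflib import SequenceMatcher


def coarse_similarity(target: str, rendered: str) -> float:
    """Character-level similarity in [0, 1], standing in for coarse text matching."""
    return SequenceMatcher(None, target, rendered).ratio()


def combining_marks(text: str) -> list[str]:
    """Mark code points (Unicode category M*) after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return [ch for ch in decomposed if unicodedata.category(ch).startswith("M")]


def scripts_used(text: str) -> set[str]:
    """Crude script guess: first word of each letter's Unicode name, e.g. LATIN, ARABIC."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in text if ch.isalpha()}


def script_flags(target: str, rendered: str) -> dict[str, bool]:
    """Hypothetical heuristic flags for three script-level error types (not the LSF metric)."""
    return {
        # Fewer marks than the target suggests dropped diacritics or vowel signs.
        "missing_diacritics": len(combining_marks(rendered)) < len(combining_marks(target)),
        # Letters from a script the target does not use, e.g. a Latin "P" inside Cyrillic.
        "mixed_script": bool(scripts_used(rendered) - scripts_used(target)),
        # Exact reversal of the target's code points, a typical RTL ordering bug.
        "reversed_order": rendered != target and rendered == target[::-1],
    }
```

For instance, `script_flags("Crème brûlée", "Creme brulee")` flags `missing_diacritics` even though `coarse_similarity` still rates the pair above 0.5. Guessing the script from the first word of the Unicode character name is deliberately crude; a sturdier check would use the Unicode Script property, which Python's standard `unicodedata` module does not expose.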

Abstract

Text-in-image editing has become a key capability for visual content creation, yet existing benchmarks remain overwhelmingly English-centric and often conflate visual plausibility with semantic correctness. We introduce MULTITEXTEDIT, a controlled benchmark of 3,600 instances spanning 12 typologically diverse languages, 5 visual domains, and 7 editing operations. Language variants of each instance share a common visual base and are paired with a human-edited reference and region masks, isolating the language variable for cross-lingual comparison. To capture script-level errors that coarse text-matching metrics miss, such as missing diacritics, reversed RTL order, and mixed-script renderings, we introduce a language fidelity (LSF) metric scored by a two-stage LVM protocol that first traces the edited target text and then judges it in isolation, reaching a quadratic-weighted κ of…
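The abstract reports the LSF protocol's agreement as a quadratic-weighted κ, but the excerpt is cut off before the value and before saying what it is measured against; agreement with human ratings is the usual setup and is assumed here. As a minimal sketch, assuming both raters give integer scores on a shared ordinal scale (the paper's actual scale is not stated in this excerpt), quadratic-weighted Cohen's κ compares weighted observed disagreement with the disagreement expected from the raters' marginals: κ_w = 1 − Σ w_ij O_ij / Σ w_ij E_ij, with w_ij = (i − j)² / (k − 1)². The function name and arguments are hypothetical.

```python
import math


def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Cohen's kappa with quadratic disagreement weights.

    rater_a, rater_b: equal-length sequences of integer scores in
    [min_rating, max_rating], e.g. LVM-judge scores and human scores.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("score lists must be non-empty and of equal length")
    k = max_rating - min_rating + 1
    if k < 2:
        raise ValueError("need at least two rating levels")
    n = len(rater_a)

    # Observed matrix: observed[i][j] counts items scored i by A and j by B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        if not (min_rating <= a <= max_rating and min_rating <= b <= max_rating):
            raise ValueError(f"score out of range: {a}, {b}")
        observed[a - min_rating][b - min_rating] += 1

    # Marginal score histograms of each rater.
    hist_a = [sum(observed[i]) for i in range(k)]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_obs = 0.0
    weighted_exp = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic weight: 0 on the diagonal, 1 for maximal disagreement.
            w = (i - j) ** 2 / (k - 1) ** 2
            # Expected count if the two raters scored independently.
            expected = hist_a[i] * hist_b[j] / n
            weighted_obs += w * observed[i][j]
            weighted_exp += w * expected

    if weighted_exp == 0.0:
        # Both raters used one identical level for every item: kappa is undefined.
        return math.nan
    return 1.0 - weighted_obs / weighted_exp
```

For example, judge scores `[1, 2, 3, 4, 4]` against human scores `[1, 2, 3, 3, 4]` on a 1 to 4 scale give κ = 56/61 ≈ 0.918. If scikit-learn is available, `sklearn.metrics.cohen_kappa_score(a, b, weights="quadratic", labels=list(range(min_rating, max_rating + 1)))` should return the same value; the (k − 1)² normalization cancels in the ratio.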
