Training-Free Occluded Text Rendering via Glyph Priors and Attention-Guided Semantic Blending
Jingqi Hou, Hongtian Wang

TL;DR
This paper introduces a training-free method for rendering occluded text in images. It uses a dual-stream inference framework and glyph priors to improve text readability and the alignment between the occluder and the text, without any fine-tuning.
Contribution
The authors propose a novel training-free occluded text rendering framework that decouples text layout from occluder insertion using dual-stream inference and glyph priors.
Findings
Significantly improves text readability in occluded scenarios.
Achieves more stable object-on-text compositions.
Does not require model fine-tuning.
Abstract
We present a training-free framework for occluded text rendering with a pretrained FLUX.1-dev backbone. The task requires a model to render recognizable typography and place an occluding object over the intended text region. This setting remains difficult for existing text-to-image generators: the occluder often drifts away from the text, while the text may be distorted or appear to float on top of the occluding object. To address this problem, we propose a restarted dual-stream inference framework that decouples text-layout preservation from occluder insertion. A Base Stream provides a clean typographic reference and same-step key/value (K/V) features, while the Edit Stream is conditioned on the occlusion prompt. We further adopt the spectral glyph-prior idea from FreeText and adapt it to stabilize the target text structure during early-to-mid denoising. In the reasoning pass, our…
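The abstract's description of a Base Stream that supplies same-step K/V features to an Edit Stream can be illustrated with a small attention sketch. This is a hypothetical illustration, not the authors' implementation. It assumes that the Edit Stream uses the Base Stream's K/V by concatenating them with its own keys and values, which is one common way to share attention features across parallel diffusion streams. It uses single-head NumPy attention and ignores FLUX.1-dev's joint text-image attention, multi-head layout, and positional encodings. All function names and projection matrices are illustrative.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Single-head scaled dot-product attention.
    # q: (n_q, d), k and v: (n_kv, d) -> output: (n_q, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def edit_stream_attention(q_edit, k_edit, v_edit, k_base, v_base, inject=True):
    # Hypothetical Edit Stream attention: when inject=True, the edit queries
    # also attend to the Base Stream keys/values from the same denoising step.
    if inject:
        k = np.concatenate([k_edit, k_base], axis=0)
        v = np.concatenate([v_edit, v_base], axis=0)
    else:
        k, v = k_edit, v_edit
    return attention(q_edit, k, v)


def dual_stream_step(x_base, x_edit, w_q, w_k, w_v, inject=True):
    # Base Stream: ordinary self-attention over its own tokens.
    qb, kb, vb = x_base @ w_q, x_base @ w_k, x_base @ w_v
    out_base = attention(qb, kb, vb)
    # Edit Stream: its own K/V plus the Base Stream K/V computed at this step.
    qe, ke, ve = x_edit @ w_q, x_edit @ w_k, x_edit @ w_v
    out_edit = edit_stream_attention(qe, ke, ve, kb, vb, inject=inject)
    return out_base, out_edit
```

As a sanity check, feeding identical tokens to both streams reproduces plain self-attention. Duplicating every key/value pair halves each softmax weight but counts each value twice, so the weighted average is unchanged. Any difference in the Edit Stream output therefore comes from the Base Stream features it attends to.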
