TL;DR
PRISM is a single-step diffusion-based framework for text image super-resolution that rectifies the global text prior and refines local stroke structure, improving readability and accuracy on degraded images.
Contribution
PRISM introduces Flow-Matching Prior Rectification (FMPR), which corrects an unreliable global text prior, and a Structure-guided Uncertainty-aware Residual Encoder (SURE), which targets fine-grained stroke structure.
Findings
Achieves state-of-the-art performance on synthetic and real-world benchmarks.
Provides millisecond-level inference speed.
Effectively restores text structure and readability in severely degraded images.
Abstract
Text image super-resolution (Text-SR) requires more than visually plausible detail synthesis: slight errors in stroke topology may alter character identity and break readability. Existing methods improve text fidelity with stronger recognition-based or generative priors, yet they still face two unresolved challenges under severe degradation: the text condition extracted from low-quality inputs can itself be unreliable, and a plausible global prior does not fully determine fine-grained stroke boundaries. We present PRISM, a single-step diffusion-based Text-SR framework that addresses these two challenges through Flow-Matching Prior Rectification (FMPR) and a Structure-guided Uncertainty-aware Residual Encoder (SURE). FMPR constructs a privileged training-time prior from paired low-quality/high-quality latents and learns a flow matching that transports degraded embeddings toward this…
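The abstract says FMPR learns a flow matching that transports degraded embeddings toward a privileged prior built from paired low-quality/high-quality latents. The rest of the description is truncated here. What follows is therefore only a generic conditional flow-matching sketch, not PRISM's implementation. It assumes a straight-line interpolation path, a squared-error velocity objective, and forward-Euler transport. The helper names (`interpolate`, `flow_matching_loss`, `transport`, `toy_oracle_velocity`) and the toy pairing x1 = 2·x0 + 1 are hypothetical illustrations. A real model would predict the velocity with a trained network conditioned on the degraded embedding.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
VelocityFn = Callable[[Vector, float], Vector]


def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    """Straight-line path x_t = (1 - t) * x0 + t * x1 between a degraded
    embedding x0 and its paired prior x1 (assumed path, not from the paper)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0: Vector, x1: Vector) -> Vector:
    """Along the straight path, dx_t/dt = x1 - x0, constant in t."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(velocity: VelocityFn,
                       pairs: Sequence[Tuple[Vector, Vector]],
                       rng: random.Random) -> float:
    """Monte Carlo estimate of E_t ||v(x_t, t) - (x1 - x0)||^2 over paired data."""
    total = 0.0
    for x0, x1 in pairs:
        t = rng.random()  # t ~ Uniform[0, 1)
        pred = velocity(interpolate(x0, x1, t), t)
        tgt = target_velocity(x0, x1)
        total += sum((p - g) ** 2 for p, g in zip(pred, tgt))
    return total / len(pairs)


def transport(velocity: VelocityFn, x0: Vector, steps: int = 1) -> Vector:
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with forward Euler.
    steps=1 loosely mirrors a single-step setting (an assumption)."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical toy data: the "prior" for each degraded embedding is 2 * x0 + 1.
def toy_oracle_velocity(xt: Vector, t: float) -> Vector:
    # Exact velocity for that pairing: x_t + 1 = (1 + t) * (x0 + 1), so v = (x_t + 1) / (1 + t).
    return [(v + 1.0) / (1.0 + t) for v in xt]


rng = random.Random(0)
pairs = []
for _ in range(8):
    x0 = [rng.gauss(0.0, 1.0) for _ in range(4)]
    pairs.append((x0, [2.0 * a + 1.0 for a in x0]))

loss = flow_matching_loss(toy_oracle_velocity, pairs, rng)
restored = transport(toy_oracle_velocity, pairs[0][0], steps=1)
```

Because the toy oracle's velocity is constant along each straight path, its loss is zero up to rounding, and a single Euler step lands exactly on the paired prior. A learned velocity model would only approximate this. That is one reason straight transport paths suit few-step or single-step samplers.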
