Single-Character-Based Embedding Feature Aggregation Using Cross-Attention for Scene Text Super-Resolution

Meng Wang; Qianqian Li; Haipeng Liu

PMC · DOI: 10.3390/s25072228 · April 2, 2025

Open Access

TL;DR

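This paper introduces SCE-STISR, a scene text super-resolution method that uses cross-attention to aggregate single-character features with a textual prior, targeting interference between tightly connected characters and ambiguous character regions in complex backgrounds.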

Contribution

The main contribution is SCE-STISR: single-character-based embedding feature aggregation with a dual-level cross-attention mechanism for scene text super-resolution.

Findings

01

The proposed method improves text recognition accuracy by 0.9–1.4% over the baseline TATT on the TextZoom benchmark.

02

The model achieves its best SSIM of 0.7951 and a PSNR of 21.84 dB.

03

The approach improves accuracy by 0.2–2.2% over existing baselines on five text recognition datasets.
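To put the PSNR figure in the findings into perspective, the sketch below computes PSNR from mean squared error and inverts the formula to get the root-mean-square pixel error that a given PSNR implies. It is a minimal illustration, not the paper's evaluation code: the function names, the toy pixel values, and the 8-bit peak value of 255 are assumptions made for the example.

```python
import math

def mse(reference, restored):
    """Mean squared error between two equally sized flat lists of pixel values."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)

def psnr(reference, restored, peak=255.0):
    """PSNR in dB: 10 * log10(peak^2 / MSE). The peak of 255 assumes 8-bit pixels."""
    err = mse(reference, restored)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / err)

def rmse_for_psnr(psnr_db, peak=255.0):
    """Invert the PSNR formula: RMSE = peak / 10^(PSNR / 20)."""
    return peak / (10.0 ** (psnr_db / 20.0))

# Toy data (hypothetical pixel values, not taken from the paper).
reference = [52, 55, 61, 59]
restored = [50, 57, 60, 63]
score = psnr(reference, restored)

# RMSE implied by the reported 21.84 dB, under the assumed 8-bit scale.
implied_rmse = rmse_for_psnr(21.84)
```

Under the 8-bit assumption, a PSNR near 21.84 dB corresponds to an RMSE of roughly 20 to 21 grey levels, which shows why recognition accuracy is usually reported alongside pixel-level metrics for this task.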

Abstract

In textual vision scenarios, super-resolution aims to enhance text quality and readability to facilitate downstream tasks. However, the ambiguity of character regions in complex backgrounds remains difficult to mitigate, particularly the interference between tightly connected characters. In this paper, we propose single-character-based embedding feature aggregation using cross-attention for scene text super-resolution (SCE-STISR) to address this problem. First, a dynamic feature extraction mechanism adaptively captures shallow features by adjusting multi-scale feature weights based on spatial representations. During text–image interaction, a dual-level cross-attention mechanism aggregates the cropped single-character features with the textual prior while aligning semantic sequences with visual features. Finally, an adaptive…
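The cross-attention step mentioned in the abstract can be illustrated with a generic single-head scaled dot-product cross-attention, in which visual feature vectors act as queries and character-level embeddings act as keys and values. This is only a sketch of the general technique under stated assumptions: it omits learned projections and is not the paper's dual-level mechanism, and the names `cross_attention`, `image_feats`, and `char_feats` and the toy values are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention without learned projections.

    queries: list of visual feature vectors (one per image position)
    keys, values: lists of character embedding vectors (one per character)
    Returns (outputs, weights); each weight row sums to 1 over the characters.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim_v = len(values[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this image position to every character embedding.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of character values for this image position.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(dim_v)])
    return outputs, weights

# Toy inputs (hypothetical): three image positions, two character embeddings.
image_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
char_feats = [[4.0, 0.0], [0.0, 4.0]]
fused, attn = cross_attention(image_feats, char_feats, char_feats)
```

In this toy setup each image position pulls in the character embedding it most resembles, and a position equally similar to both characters receives an even blend of the two.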

Figures (11)

Taxonomy

Topics: Advanced Image Processing Techniques · Digital Media Forensic Detection · Image Processing Techniques and Applications