Improving Handwritten Text Recognition via 3D Attention and Multi-Scale Training

Zi-Rui Wang

arXiv:2410.18374 · cs.AI · August 5, 2025

Open Access

TL;DR

This paper introduces a novel 3D attention-based neural network with multi-scale training for improved handwritten text recognition, achieving competitive results on Chinese and English datasets.

Contribution

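The paper proposes a new recognition network that combines 3D attention modules with multi-scale training. Its design draws on three segmentation-free approaches (connectionist temporal classification, hidden Markov models, and encoder-decoder methods) to improve handwritten text recognition accuracy.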

Findings

01 Achieves results comparable to state-of-the-art methods on Chinese and English datasets.

02 Introduces a 3D attention module for sequential visual feature extraction.

03 Demonstrates the effectiveness of multi-scale training in handwritten text recognition.

Abstract

The segmentation-free research efforts for addressing handwritten text recognition can be divided into three categories: connectionist temporal classification (CTC), hidden Markov model and encoder-decoder methods. In this paper, inspired by the above three modeling methods, we propose a new recognition network by using a novel three-dimensional (3D) attention module and global-local context information. Based on the feature maps of the last convolutional layer, a series of 3D blocks with different resolutions are split. Then, these 3D blocks are fed into the 3D attention module to generate sequential visual features. Finally, by fusing the visual features and the corresponding global-local context features, a well-designed representation can be obtained. Main canonical neural units including attention mechanisms, fully-connected layers, recurrent units and convolutional layers are…
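The block-splitting and attention step described in the abstract can be illustrated with a small sketch. This is not the paper's implementation. It assumes that "different resolutions" means different block widths along the horizontal axis, and it stands in for the learned 3D attention module with a plain softmax over every element of a block. The names `split_blocks`, `attend_block`, and `sequential_features` are hypothetical, and the fusion with global-local context features is omitted.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a flat list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def split_blocks(fmap, block_width):
    """Split a C x H x W feature map into consecutive C x H x block_width blocks."""
    width = len(fmap[0][0])
    blocks = []
    for start in range(0, width - block_width + 1, block_width):
        blocks.append(
            [[row[start:start + block_width] for row in channel] for channel in fmap]
        )
    return blocks


def attend_block(block):
    """Pool one 3D block into a C-dimensional vector.

    Assumption: attention weights are a softmax over all C*H*w activations,
    using the raw activations as scores. A real module would learn the scores.
    """
    flat = [v for channel in block for row in channel for v in row]
    weights = softmax(flat)
    per_channel = len(block[0]) * len(block[0][0])
    feats = []
    for c in range(len(block)):
        lo = c * per_channel
        hi = lo + per_channel
        feats.append(sum(w * v for w, v in zip(weights[lo:hi], flat[lo:hi])))
    return feats


def sequential_features(fmap, block_widths=(2, 4)):
    """Return one feature sequence per block width (one 'resolution' each)."""
    return {
        bw: [attend_block(b) for b in split_blocks(fmap, bw)]
        for bw in block_widths
    }


# Toy feature map standing in for the last convolutional layer's output.
random.seed(0)
C, H, W = 3, 2, 8
fmap = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]

seqs = sequential_features(fmap)
for bw, seq in seqs.items():
    print(f"block width {bw}: {len(seq)} steps, {len(seq[0])} features per step")
```

Each block width yields its own sequence: narrower blocks give more time steps with finer horizontal detail, while wider blocks give fewer steps that each cover more of the line image.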

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction

Methods: Softmax · Attention Is All You Need · Connectionist Temporal Classification Loss