Col-OLHTR: A Novel Framework for Multimodal Online Handwritten Text Recognition

Chenyu Liu; Jinshui Hu; Baocai Yin; Jia Pan; Bing Yin; Jun Du; Qingfeng Liu

arXiv:2502.06100·cs.CV·February 11, 2025

Open Access

TL;DR

Col-OLHTR is a collaborative learning framework for online handwritten text recognition that learns multimodal features during training, keeps inference single-stream, and reports state-of-the-art results.

Contribution

The paper proposes a novel Col-OLHTR framework that combines multimodal feature learning with a single-stream inference process for OLHTR.

Findings

01. Achieves state-of-the-art (SOTA) performance on OLHTR benchmarks.

02. Effectively captures global features with the P2SA module.

03. Reduces inference complexity compared to multi-stream models.

Abstract

Online Handwritten Text Recognition (OLHTR) has gained considerable attention for its diverse range of applications. Current approaches usually treat OLHTR as a sequence recognition task, employing either a single trajectory or image encoder, or multi-stream encoders, combined with a CTC or attention-based recognition decoder. However, these approaches face several drawbacks: 1) single encoders typically focus on either local trajectories or visual regions, lacking the ability to dynamically capture relevant global features in challenging cases; 2) multi-stream encoders, while more comprehensive, suffer from complex structures and increased inference costs. To tackle this, we propose a Collaborative learning-based OLHTR framework, called Col-OLHTR, that learns multimodal features during training while maintaining a single-stream inference process. Col-OLHTR consists of a trajectory…
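The abstract mentions CTC-based recognition decoders as one of the standard choices for OLHTR. As a rough illustration of what such a decoder does at inference time, here is a minimal sketch of greedy (best-path) CTC decoding in plain Python. It is not taken from the paper: the function name, the blank index of 0, and the list-of-lists input format are all assumptions made for this example.

```python
from itertools import groupby

BLANK = 0  # assumption: class index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Greedy CTC decoding over per-frame class scores.

    frame_scores: list of per-frame score lists, one score per class
    (hypothetical input format chosen for this sketch).
    """
    # Pick the highest-scoring class at each time step.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Merge consecutive repeats first, then drop blanks.
    collapsed = [label for label, _ in groupby(best_path)]
    return [label for label in collapsed if label != blank]
```

The order of the two steps matters: repeats are merged before blanks are removed, so a blank between two identical labels keeps them as two separate output symbols.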

Taxonomy

Topics: Handwritten Text Recognition Techniques · Vehicle License Plate Recognition · Image Processing and 3D Reconstruction

Methods: Softmax · Attention Is All You Need · Focus