EventSTR: A Benchmark Dataset and Baselines for Event Stream based Scene Text Recognition

Xiao Wang; Jingtao Jiang; Dong Li; Futian Wang; Lin Zhu; Yaowei Wang; Yongyong Tian; Jin Tang

arXiv:2502.09020·cs.CV·February 14, 2025

TL;DR

This paper introduces EventSTR, a large-scale benchmark dataset for event-based scene text recognition, along with a novel framework that leverages event camera data and large language models to improve recognition accuracy under challenging conditions.

Contribution

It provides the first large-scale event-based scene text recognition dataset and proposes a new framework integrating event features with language models for improved accuracy.

Findings

01. The EventSTR dataset contains 9,928 high-definition (1280 × 720) event samples with Chinese and English characters.

02. The proposed SimC-ESTR framework outperforms existing methods on EventSTR and simulation datasets.

03. The approach effectively handles challenging conditions such as low illumination and motion blur in scene text recognition.

Abstract

Mainstream Scene Text Recognition (STR) algorithms are developed for RGB cameras, which are sensitive to challenging factors such as low illumination, motion blur, and cluttered backgrounds. In this paper, we propose to recognize scene text using bio-inspired event cameras by collecting and annotating a large-scale benchmark dataset, termed EventSTR. It contains 9,928 high-definition (1280 × 720) event samples and involves both Chinese and English characters. We also benchmark multiple STR algorithms as baselines for future work to compare against. In addition, we propose a new event-based scene text recognition framework, termed SimC-ESTR. It first extracts the event features using a visual encoder and projects them into tokens using a Q-former module. More importantly, we propose to augment the vision tokens based on a memory mechanism before feeding into the large language…
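The abstract is cut off before it explains how the memory mechanism augments the vision tokens, so the following is only an illustrative sketch of one common way such a step could work, not the SimC-ESTR implementation. It assumes the memory is a bank of stored vectors and that augmentation means retrieving the entries most similar to the mean vision token and appending them. The names `cosine`, `augment_tokens`, `memory_bank`, and `top_k` are hypothetical and do not come from the paper or its repository.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def augment_tokens(vision_tokens, memory_bank, top_k=1):
    """Hypothetical augmentation: append the top_k memory entries that are
    most similar to the mean of the vision tokens. This is an assumed
    retrieval scheme, not the mechanism described in the paper."""
    dim = len(vision_tokens[0])
    # Use the mean token as a single query into the memory bank.
    query = [sum(tok[i] for tok in vision_tokens) / len(vision_tokens)
             for i in range(dim)]
    ranked = sorted(memory_bank, key=lambda m: cosine(query, m), reverse=True)
    # Return a new list: original tokens followed by the retrieved entries.
    return vision_tokens + ranked[:top_k]


# Toy 2-dimensional tokens standing in for Q-former outputs.
tokens = [[1.0, 0.0], [0.8, 0.2]]
memory = [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]
augmented = augment_tokens(tokens, memory, top_k=1)
```

In a real system the tokens would be high-dimensional tensors and the augmented sequence would be passed on to the language model; the toy vectors here only show the retrieve-and-append pattern.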

Code & Models

Repositories

event-ahu/eventstr
PyTorch, official

Taxonomy

Topics: Handwritten Text Recognition Techniques