EventSTR: A Benchmark Dataset and Baselines for Event Stream based Scene Text Recognition
Xiao Wang, Jingtao Jiang, Dong Li, Futian Wang, Lin Zhu, Yaowei Wang, Yongyong Tian, Jin Tang

TL;DR
This paper introduces EventSTR, a large-scale benchmark dataset for event-based scene text recognition, along with a novel framework that leverages event camera data and large language models to improve recognition accuracy under challenging conditions.
Contribution
It provides the first large-scale event-based scene text recognition dataset and proposes a new framework integrating event features with language models for improved accuracy.
Findings
The EventSTR dataset contains 9,928 high-definition (1280 × 720) event samples covering both Chinese and English characters.
The proposed SimC-ESTR framework outperforms existing methods on EventSTR and simulation datasets.
The approach effectively handles challenging conditions like low illumination and motion blur in scene text recognition.
Abstract
Mainstream Scene Text Recognition (STR) algorithms are developed based on RGB cameras, which are sensitive to challenging factors such as low illumination, motion blur, and cluttered backgrounds. In this paper, we propose to recognize scene text using bio-inspired event cameras by collecting and annotating a large-scale benchmark dataset, termed EventSTR. It contains 9,928 high-definition (1280 × 720) event samples and involves both Chinese and English characters. We also benchmark multiple STR algorithms as baselines for future work to compare against. In addition, we propose a new event-based scene text recognition framework, termed SimC-ESTR. It first extracts the event features using a visual encoder and projects them into tokens using a Q-former module. More importantly, we propose to augment the vision tokens based on a memory mechanism before feeding them into the large language…
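The data flow outlined in the abstract (visual encoder, Q-former projection into tokens, memory-based augmentation, then a large language model) can be illustrated with a rough NumPy sketch. This is not the SimC-ESTR implementation: the feature sizes, the single-head attention, the residual memory read, and all function and variable names are assumptions chosen only to show how a fixed set of learned queries turns variable-length event features into a fixed number of tokens.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attend(queries, keys_values):
    """Single-head scaled dot-product attention; keys and values share one matrix."""
    d = queries.shape[-1]
    weights = softmax(queries @ keys_values.T / np.sqrt(d))
    return weights @ keys_values


def project_to_tokens(event_features, learned_queries):
    """Q-former-style step: each learned query reads from the encoder features,
    so the number of output tokens equals the number of queries."""
    return cross_attend(learned_queries, event_features)


def augment_with_memory(tokens, memory_bank):
    """Hypothetical memory step (an assumption, not the paper's design):
    each token reads from a memory bank and adds the result residually."""
    return tokens + cross_attend(tokens, memory_bank)


rng = np.random.default_rng(0)
event_features = rng.normal(size=(196, 64))   # stand-in for visual encoder output
learned_queries = rng.normal(size=(32, 64))   # assumed: 32 queries of width 64
memory_bank = rng.normal(size=(128, 64))      # assumed: 128 memory slots

tokens = project_to_tokens(event_features, learned_queries)
llm_input = augment_with_memory(tokens, memory_bank)
print(tokens.shape, llm_input.shape)
```

In this sketch the token count depends only on the number of learned queries, not on how many features the encoder emits, which is the property that makes such a projection convenient in front of a language model with a fixed token budget.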
Taxonomy
Topics: Handwritten Text Recognition Techniques
