STEP -- Towards Structured Scene-Text Spotting
Sergi Garcia-Bordils, Dimosthenis Karatzas, Marçal Rusiñol

TL;DR
This paper introduces the structured scene-text spotting task and proposes STEP, a model that dynamically detects and recognizes text based on user-defined regular expressions, enabling accurate zero-shot OCR in complex real-world scenarios.
Contribution
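The paper defines a new task, structured scene-text spotting, and presents STEP, a model that conditions both text detection and recognition on user-provided regular expressions. STEP supports expressions that contain spaces, is not bound to word-level detection granularity, and is trained solely on publicly available data.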
Findings
STEP achieves accurate zero-shot text spotting in diverse scenarios.
The new dataset includes out-of-vocabulary structured text like prices and serial numbers.
STEP outperforms baseline methods in structured text recognition tasks.
Abstract
We introduce the structured scene-text spotting task, which requires a scene-text OCR system to spot text in the wild according to a query regular expression. Contrary to generic scene text OCR, structured scene-text spotting seeks to dynamically condition both scene text detection and recognition on user-provided regular expressions. To tackle this task, we propose the Structured TExt sPotter (STEP), a model that exploits the provided text structure to guide the OCR process. STEP is able to deal with regular expressions that contain spaces and it is not bound to detection at the word-level granularity. Our approach enables accurate zero-shot structured text spotting in a wide variety of real-world reading scenarios and is solely trained on publicly available data. To demonstrate the effectiveness of our approach, we introduce a new challenging test dataset that contains several types…
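To make the task's input and output concrete, the sketch below filters the output of a generic word-level OCR system with a query regular expression. This is a naive post-hoc baseline for illustration only, not STEP itself, which according to the abstract conditions detection and recognition on the expression. The names `Detection`, `spot_structured`, and `max_span` are hypothetical, and the sketch assumes the words arrive in reading order.

```python
import re
from dataclasses import dataclass


@dataclass
class Detection:
    text: str
    box: tuple  # (x_min, y_min, x_max, y_max), hypothetical layout


def union(a, b):
    """Smallest axis-aligned box that encloses both input boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def spot_structured(words, pattern, max_span=3):
    """Return detections whose text fully matches `pattern`.

    Runs of up to `max_span` consecutive words are joined with single
    spaces, so expressions containing spaces can match across several
    word-level detections. Assumes `words` is in reading order.
    """
    regex = re.compile(pattern)
    hits = []
    for start in range(len(words)):
        text, box = "", None
        for word in words[start:start + max_span]:
            text = word.text if box is None else f"{text} {word.text}"
            box = word.box if box is None else union(box, word.box)
            if regex.fullmatch(text):
                hits.append(Detection(text, box))
    return hits


# Made-up word-level OCR output for a single image.
words = [
    Detection("SALE", (10, 10, 60, 30)),
    Detection("AB", (10, 40, 30, 60)),
    Detection("1234", (35, 40, 80, 60)),
    Detection("$9.99", (10, 70, 60, 90)),
]

prices = spot_structured(words, r"\$\d+\.\d{2}")      # a price query
serials = spot_structured(words, r"[A-Z]{2} \d{4}")   # a query with a space
```

Here the price query returns the single `$9.99` detection, and the serial-number query returns `AB 1234` with a box spanning both words. A filter like this depends entirely on how the upstream OCR happened to segment the text, which is the limitation that conditioning the spotter on the expression is meant to address.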
Videos
STEP – Towards Structured Scene-Text Spotting · YouTube
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Topic Modeling
