Improving Structured Text Recognition with Regular Expression Biasing
Baoguang Shi, Wenfeng Cheng, Yijuan Lu, Cha Zhang, Dinei Florencio

TL;DR
This paper introduces a method that improves structured text recognition accuracy by biasing the recognizer with regular expressions modeled as a Weighted Finite-State Transducer (WFST). It targets text with known formats or domain-specific vocabulary.
Contribution
The paper presents a novel approach to incorporate regex-based biasing into text recognition systems using WFSTs, enhancing accuracy for structured text recognition tasks.
Findings
Significant accuracy improvement on structured text datasets
Generally small degradation on text that does not match the specified regexes
Effective for both printed and handwritten text
Abstract
We study the problem of recognizing structured text, i.e., text that follows certain formats, and propose to improve the recognition accuracy of structured text by specifying regular expressions (regexes) for biasing. A biased recognizer recognizes text that matches the specified regexes with significantly improved accuracy, at the cost of a generally small degradation on other text. The biasing is realized by modeling regexes as a Weighted Finite-State Transducer (WFST) and injecting it into the decoder via dynamic replacement. A single hyperparameter controls the biasing strength. The method is useful for recognizing text lines with known formats or containing words from a domain vocabulary. Examples include driver license numbers, drug names in prescriptions, etc. We demonstrate the efficacy of regex biasing on datasets of printed and handwritten structured text and measure its side…
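Illustrative sketch
The abstract describes biasing inside the decoder through a WFST. The snippet below is not that method. It is a much simpler stand-in, n-best rescoring, meant only to show how a single strength parameter can trade accuracy on matching text against changes elsewhere. The function name rescore_with_regex_bias, the bias_strength parameter, the license-number pattern, and the hypothesis scores are all hypothetical and do not come from the paper.

```python
import re
from typing import List, Tuple


def rescore_with_regex_bias(
    hypotheses: List[Tuple[str, float]],
    patterns: List[str],
    bias_strength: float,
) -> List[Tuple[str, float]]:
    """Add a fixed log-score bonus to hypotheses that fully match any pattern.

    This is post-hoc n-best rescoring, an assumption made for illustration.
    It does not build a WFST and does not modify the decoder.
    """
    compiled = [re.compile(p) for p in patterns]
    rescored = []
    for text, log_score in hypotheses:
        matches = any(rx.fullmatch(text) for rx in compiled)
        bonus = bias_strength if matches else 0.0
        rescored.append((text, log_score + bonus))
    # Highest score first; sorted() is stable, so ties keep their input order.
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Hypothetical n-best list: the recognizer slightly prefers a misread "B".
nbest = [("DL-12B456", -1.0), ("DL-128456", -1.5), ("OL-128456", -2.5)]
license_pattern = r"DL-\d{6}"  # made-up license-number format

unbiased_best = rescore_with_regex_bias(nbest, [license_pattern], 0.0)[0][0]
biased_best = rescore_with_regex_bias(nbest, [license_pattern], 1.0)[0][0]
print(unbiased_best, biased_best)
```

With bias_strength set to 0.0 the ranking is unchanged. Raising it promotes hypotheses that match the pattern, and hypotheses that match nothing keep their original scores.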
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Algorithms and Data Compression
