BusiNet -- a Light and Fast Text Detection Network for Business Documents
Oshri Naparstek, Ophir Azulai, Daniel Rotman, Yevgeny Burshtein, Peter Staar, Udi Barzelay

TL;DR
BusiNet is a lightweight, fast, and robust text detection network designed for local OCR of business documents, effectively handling noise and damage while preserving privacy.
Contribution
The paper introduces BusiNet, a novel, efficient text detection network optimized for local processing of business documents with noise robustness.
Findings
BusiNet achieves high accuracy on public datasets.
It runs efficiently on local devices, ensuring privacy.
The model effectively handles noisy and damaged documents.
Abstract
For digitizing or indexing physical documents, Optical Character Recognition (OCR), the process of extracting textual information from scanned documents, is a vital technology. When a document is visually damaged or contains non-textual elements, existing technologies can yield poor results, as erroneous detection results can greatly affect the quality of OCR. In this paper we present a detection network dubbed BusiNet aimed at OCR of business documents. Business documents often include sensitive information, and as such they cannot be uploaded to a cloud service for OCR. BusiNet was designed to be fast and light so that it can run locally, avoiding privacy issues. Furthermore, BusiNet is built to handle scanned document corruption and noise using a specialized synthetic dataset. The model is made robust to unseen noise by employing adversarial training strategies. We perform an evaluation…
Taxonomy
Topics: Digital Media Forensic Detection · Anomaly Detection Techniques and Applications · Handwritten Text Recognition Techniques
