Broken News: Making Newspapers Accessible to Print-Impaired
Vishal Agarwal, Tanuja Ganu, Saikat Guha

TL;DR
This paper introduces a method to digitize print newspapers into accessible HTML for print-impaired users. It combines layout analysis (an ensemble of instance segmentation and detection), OCR, and a new loss function that sharpens segmentation masks and thereby reduces OCR errors.
Contribution
It proposes a new EdgeMask loss function for Mask-RCNN that improves segmentation mask boundaries, leading to a significant reduction in OCR errors for accessible newspaper content.
Findings
Word Error Rate (WER) of news article text reduced by 32.5% with the new loss function
Improved segmentation accuracy enhances OCR performance
Accessible newspaper content generation for print-impaired users
Abstract
Accessing daily news content remains a major challenge for people with print impairments, including blind and low-vision readers, due to the opacity of printed content and hindrances in online sources. In this paper, we present our approach for digitizing print newspapers into an accessible file format such as HTML. We use an ensemble of instance segmentation and detection frameworks for newspaper layout analysis, followed by OCR to recognize text elements such as headlines and article text. Additionally, we propose the EdgeMask loss function for the Mask-RCNN framework to improve segmentation mask boundaries and hence the accuracy of the downstream OCR task. Empirically, we show that our proposed loss function reduces the Word Error Rate (WER) of news article text by 32.5%.
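The headline result is stated in terms of Word Error Rate, so a small sketch of how WER is conventionally computed may help. This is the standard word-level edit-distance definition (substitutions, deletions, and insertions divided by the number of reference words), not code from the paper. The function names `word_error_rate` and `relative_reduction` are illustrative, and the sketch assumes plain whitespace tokenization and that the 32.5% figure is a relative reduction.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between hypothesis and reference,
    divided by the number of reference words."""
    ref = reference.split()  # assumption: whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction by which improved_wer is lower than baseline_wer."""
    return (baseline_wer - improved_wer) / baseline_wer
```

For example, comparing the reference "the cat sat on the mat" with the OCR output "the cat sit on mat" gives one substitution and one deletion over six reference words. Because WER is normalized by the reference length, it can exceed 1 when the OCR output contains many spurious words.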
Taxonomy
Topics: Digital Accessibility for Disabilities · Web Data Mining and Analysis · Tactile and Sensory Interactions
