Optical character recognition quality affects perceived usefulness of historical newspaper clippings
Kimmo Kettunen, Heikki Keskustalo, Sanna Kumpulainen, Tuula Pääkkönen, Juha Rautiainen

TL;DR
This study demonstrates that higher optical character recognition quality enhances users' perceived usefulness and relevance assessments of historical newspaper articles in an interactive information retrieval setting.
Contribution
It provides the first empirical evidence that OCR quality directly influences subjective relevance evaluations in historical document retrieval.
Findings
Higher OCR quality leads to increased relevance scores.
Improved OCR results in more accurate document retrieval.
User evaluations are significantly affected by OCR quality differences.
Abstract
Introduction. We study the effect of optical character recognition quality on interactive information retrieval, using a collection from one digitized historical Finnish newspaper. Method. This study is based on the simulated interactive information retrieval work task model. Thirty-two users searched an article collection of the Finnish newspaper Uusi Suometar (1869-1918) containing ca. 1.45 million automatically segmented articles. Our article search database had two versions of each article, differing in optical character recognition quality. Each user performed six pre-formulated and six self-formulated short queries and subjectively evaluated the top-10 results on a graded relevance scale of 0-3, without knowing about the optical character recognition quality differences between the otherwise identical articles. Analysis. Analysis of the user evaluations was performed by comparing mean averages…
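As a rough illustration of the kind of comparison the abstract describes, the sketch below averages graded relevance ratings (0-3) for the top-10 results of one query under two OCR versions and reports the difference. The ratings are made-up toy values, not data from the study, and the function name `mean_relevance` is hypothetical; the authors' actual analysis procedure is not shown in the truncated abstract and may differ.

```python
from statistics import mean

VALID_GRADES = (0, 1, 2, 3)  # graded relevance scale used in the study


def mean_relevance(ratings):
    """Return the mean graded relevance of a list of 0-3 ratings."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    if any(r not in VALID_GRADES for r in ratings):
        raise ValueError("ratings must be on the 0-3 graded scale")
    return mean(ratings)


# Toy ratings for the top-10 results of one query (illustrative only,
# NOT taken from the paper): one list per OCR version of the articles.
old_ocr_ratings = [1, 0, 2, 1, 0, 1, 2, 0, 1, 1]
new_ocr_ratings = [2, 1, 2, 1, 1, 2, 3, 0, 1, 2]

# Positive difference means the better-OCR version was rated more relevant.
difference = mean_relevance(new_ocr_ratings) - mean_relevance(old_ocr_ratings)
print(f"old OCR mean: {mean_relevance(old_ocr_ratings):.2f}")
print(f"new OCR mean: {mean_relevance(new_ocr_ratings):.2f}")
print(f"difference:   {difference:.2f}")
```

A real analysis would aggregate over all users and queries and test whether the difference is statistically significant; this sketch only shows the per-query averaging step.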
Taxonomy
Topics: Augmented Reality Applications · Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques
