Character-based Surprisal as a Model of Reading Difficulty in the Presence of Error
Michael Hahn, Frank Keller, Yonatan Bisk, Yonatan Belinkov

TL;DR
This eye-tracking study investigates how error type and error rate affect reading difficulty. It shows that human comprehension remains robust despite errors and introduces a character-based surprisal model to explain these effects.
Contribution
The paper introduces a character-based surprisal model that accounts for reading difficulty caused by errors, and it documents how error type and error rate each affect that difficulty.
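To make the idea of character-based surprisal concrete, here is a minimal sketch. It is not the authors' model: it swaps in a simple add-one-smoothed character bigram model, ignores sentence context, and uses hypothetical names (`BOW`, `train_char_bigram`, `word_surprisal`). It only illustrates how scoring a word as the sum of its per-character surprisals still yields a graded value for a misspelled word, which a word-level model with a fixed vocabulary would typically treat as a single unknown token.

```python
import math
from collections import Counter

BOW = "^"  # hypothetical word-boundary marker (an assumption of this sketch)


def train_char_bigram(corpus_words):
    """Count character bigrams over words padded with the boundary marker."""
    bigrams = Counter()
    contexts = Counter()
    vocab = {BOW}
    for word in corpus_words:
        padded = BOW + word + BOW
        vocab.update(word)
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, vocab


def word_surprisal(word, model):
    """Sum of -log2 P(char | previous char) over the padded word, in bits."""
    bigrams, contexts, vocab = model
    v = len(vocab) + 1  # +1 reserves probability mass for unseen characters
    padded = BOW + word + BOW
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        # Add-one smoothing keeps unseen bigrams at a small nonzero probability.
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + v)
        total += -math.log2(p)
    return total


model = train_char_bigram(["the", "then", "there", "other"])
correct = word_surprisal("the", model)
transposed = word_surprisal("teh", model)
```

In this toy setup the transposed form "teh" receives a higher surprisal than "the", because the bigrams "te", "eh", and "h" followed by a word boundary never occur in the training words.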
Findings
Transpositions cause more reading difficulty than misspellings.
A high error rate (50% rather than 10% of words containing errors) increases reading difficulty for all words, including correct ones.
Human comprehension remains largely unaffected despite errors.
Abstract
Intuitively, human readers cope easily with errors in text; typos, misspelling, word substitutions, etc. do not unduly disrupt natural reading. Previous work indicates that letter transpositions result in increased reading times, but it is unclear if this effect generalizes to more natural errors. In this paper, we report an eye-tracking study that compares two error types (letter transpositions and naturally occurring misspelling) and two error rates (10% or 50% of all words contain errors). We find that human readers show unimpaired comprehension in spite of these errors, but error words cause more reading difficulty than correct words. Also, transpositions are more difficult than misspellings, and a high error rate increases difficulty for all words, including correct ones. We then present a computational model that uses character-based (rather than traditional word-based) surprisal…
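The transposition condition at a given error rate can be sketched as follows. This is an illustration only, not the procedure used to build the study's stimuli: the function names are hypothetical, keeping the first and last letters fixed is an assumption, and sampling each word independently only approximates the target proportion of error words.

```python
import random


def transpose_word(word, rng):
    """Swap two adjacent interior letters; first and last letters stay in place.

    Words shorter than four characters are returned unchanged (an assumption).
    """
    if len(word) < 4:
        return word
    i = rng.randrange(1, len(word) - 2)  # index of the left letter of the swapped pair
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def inject_transpositions(words, rate, rng):
    """Apply a transposition to each word independently with probability `rate`."""
    return [transpose_word(w, rng) if rng.random() < rate else w for w in words]


rng = random.Random(0)
sentence = "human readers cope easily with errors".split()
low_rate = inject_transpositions(sentence, 0.1, rng)
high_rate = inject_transpositions(sentence, 0.5, rng)
```

Note that a swap of two identical letters (as in "book") leaves the word unchanged, so a real stimulus generator would need to check that the output differs from the input.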
Taxonomy
Topics: Text Readability and Simplification · Reading and Literacy Development · Topic Modeling
