# Why Stop at Words? Unveiling the Bigger Picture through Line-Level OCR

**Authors:** Shashank Vempati, Nishit Anand, Gaurav Talebailkar, Arpan Garai, Chetan Arora

arXiv: 2508.21693 · 2025-09-01

## TL;DR

This paper proposes moving from word-level to line-level OCR, which bypasses word-detection errors and gives language models more sentence context. It supports the proposal with a new line-annotated dataset and experimental validation.

## Contribution

It proposes a line-level OCR approach that outperforms word-level pipelines in both accuracy and efficiency, and contributes a curated dataset of 251 English page images with line-level annotations for training and benchmarking.

## Key findings

- 5.4% end-to-end accuracy improvement over word-level OCR
- 4 times improvement in efficiency compared to word-based pipelines
- Better utilization of language models through larger sentence context

## Abstract

Conventional optical character recognition (OCR) techniques segmented each character and then recognized it. This made them prone to errors in character segmentation and left them without the context needed to exploit language models. Advances in sequence-to-sequence translation over the last decade led to modern techniques that first detect words and then feed one word at a time to a model, which directly outputs the full word as a sequence of characters. This allowed better utilization of language models and bypassed the error-prone character segmentation step. We observe that this transition has moved the accuracy bottleneck to word segmentation. Hence, in this paper, we propose a natural and logical progression from word-level OCR to line-level OCR. The proposal bypasses errors in word detection and provides larger sentence context for better utilization of language models. We show that the proposed technique improves not only the accuracy but also the efficiency of OCR. Despite a thorough literature survey, we did not find any public dataset to train and benchmark such a shift from word-level to line-level OCR. Hence, we also contribute a meticulously curated dataset of 251 English page images with line-level annotations. Our experiments revealed a notable end-to-end accuracy improvement of 5.4%, underscoring the potential benefits of transitioning to line-level OCR, especially for document images. We also report a 4 times improvement in efficiency compared to word-based pipelines. With continuous improvements in large language models, our methodology also holds the potential to exploit such advances. Project Website: https://nishitanand.github.io/line-level-ocr-website
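The structural difference between the two pipelines described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: `CountingRecognizer`, `word_level_ocr`, and `line_level_ocr` are hypothetical names, and each "crop" is stood in for by the text it contains, so the stub recognizer is trivially perfect. The sketch only shows how many recognizer calls each pipeline makes and what context each call sees. It says nothing about the reported accuracy or efficiency figures.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical toy types: a "crop" is represented by the text it contains.
Line = List[str]  # one text line, already split into word crops
Page = List[Line]


@dataclass
class CountingRecognizer:
    """Stub recognizer (an assumption): echoes its input and counts calls."""
    calls: int = 0

    def __call__(self, crop: str) -> str:
        self.calls += 1
        return crop


def word_level_ocr(page: Page, recognize: Callable[[str], str]) -> List[str]:
    # One recognizer call per detected word; each call sees a single word,
    # and the words are re-joined into a line afterwards.
    return [" ".join(recognize(word) for word in line) for line in page]


def line_level_ocr(page: Page, recognize: Callable[[str], str]) -> List[str]:
    # One recognizer call per detected line; each call sees the whole line,
    # so no word-detection step is needed before recognition.
    return [recognize(" ".join(line)) for line in page]


page: Page = [["Why", "stop", "at", "words?"], ["Line-level", "OCR"]]
word_rec, line_rec = CountingRecognizer(), CountingRecognizer()
word_text = word_level_ocr(page, word_rec)
line_text = line_level_ocr(page, line_rec)
print(word_rec.calls, line_rec.calls)
```

With a real recognizer, the word-level path would additionally inherit any errors from the word detector, which is the bottleneck the abstract points to.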

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.21693/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/2508.21693/full.md

## References

44 references — full list in the complete paper: https://tomesphere.com/paper/2508.21693/full.md

---
Source: https://tomesphere.com/paper/2508.21693