Unified Line and Paragraph Detection by Graph Convolutional Networks
Shuang Liu, Renshen Wang, Michalis Raptis, Yasuhisa Fujii

TL;DR
This paper introduces a unified approach that uses a graph convolutional network to detect both lines and paragraphs in documents. It models layout as a two-level clustering problem and reaches state-of-the-art paragraph detection quality while remaining efficient.
Contribution
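The paper presents a unified method in which a graph convolutional network predicts relations between text detection boxes. Both lines (clusters of boxes) and paragraphs (clusters of lines) are then built from these predictions, forming a two-level hierarchical clustering.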
Findings
Achieves state-of-the-art paragraph detection accuracy
Demonstrates high efficiency in processing document layouts
Effective on both public benchmarks and real-world images
Abstract
We formulate the task of detecting lines and paragraphs in a document into a unified two-level clustering problem. Given a set of text detection boxes that roughly correspond to words, a text line is a cluster of boxes and a paragraph is a cluster of lines. These clusters form a two-level tree that represents a major part of the layout of a document. We use a graph convolutional network to predict the relations between text detection boxes and then build both levels of clusters from these predictions. Experimentally, we demonstrate that the unified approach can be highly efficient while still achieving state-of-the-art quality for detecting paragraphs in public benchmarks and real-world images.
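The second step described in the abstract, building both levels of clusters from predicted relations, can be sketched as follows. This is a minimal sketch and not the authors' implementation. It assumes the network's output is available as hypothetical pairwise scores (`line_edges`, `para_edges`) between box indices, applies an assumed threshold of 0.5, and forms clusters as union-find connected components. It also assumes that a single above-threshold paragraph edge between boxes on two different lines merges those lines into one paragraph.

```python
from typing import Dict, List, Tuple

# Hypothetical model output: (box_i, box_j, probability that they belong together).
Edge = Tuple[int, int, float]


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def cluster(n: int, edges: List[Edge], threshold: float) -> List[List[int]]:
    """Connected components over the edges whose score is >= threshold."""
    uf = UnionFind(n)
    for i, j, p in edges:
        if p >= threshold:
            uf.union(i, j)
    groups: Dict[int, List[int]] = {}
    for node in range(n):
        groups.setdefault(uf.find(node), []).append(node)
    # Members are ascending; order clusters by their smallest member.
    return sorted(groups.values(), key=lambda g: g[0])


def two_level_clusters(
    n_boxes: int, line_edges: List[Edge], para_edges: List[Edge], threshold: float = 0.5
) -> Tuple[List[List[int]], List[List[int]], List[List[List[int]]]]:
    """Level 1: boxes -> lines. Level 2: lines -> paragraphs (assumed lifting rule)."""
    lines = cluster(n_boxes, line_edges, threshold)
    line_of = {box: li for li, members in enumerate(lines) for box in members}
    # Lift box-level paragraph edges to edges between the lines containing the boxes.
    lifted = [(line_of[i], line_of[j], p) for i, j, p in para_edges if line_of[i] != line_of[j]]
    paragraphs = cluster(len(lines), lifted, threshold)
    # Two-level tree: each paragraph is a list of lines, each line a list of boxes.
    tree = [[lines[li] for li in para] for para in paragraphs]
    return lines, paragraphs, tree


if __name__ == "__main__":
    # Six word boxes with hand-written (hypothetical) scores standing in for GCN predictions.
    line_edges = [(0, 1, 0.9), (2, 3, 0.8), (4, 5, 0.95), (1, 2, 0.1), (3, 4, 0.2)]
    para_edges = [(1, 2, 0.7), (3, 4, 0.3)]
    lines, paragraphs, tree = two_level_clusters(6, line_edges, para_edges)
    print(lines)       # [[0, 1], [2, 3], [4, 5]]
    print(paragraphs)  # [[0, 1], [2]]
    print(tree)        # [[[0, 1], [2, 3]], [[4, 5]]]
```

Union-find keeps the grouping near-linear in the number of edges. Whether the paper thresholds edges, lifts box-level paragraph relations to lines, or uses another grouping rule is not stated in the abstract, so both choices here are illustrative.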
