Semantic Parsing of Interpage Relations
Mehmet Arif Demirtaş, Berke Oral, Mehmet Yasin Akpınar, Onur Deniz

TL;DR
This paper introduces a novel end-to-end multimodal approach for semantic parsing of interpage relations in multi-page documents, significantly improving accuracy in segmentation, classification, and dependency extraction.
Contribution
It formalizes the task as semantic parsing of interpage relations and proposes a multi-task, multimodal method inspired by dependency parsing literature, pioneering this approach in document analysis.
Findings
41 percentage point increase in labeled attachment score (LAS) for semantic parsing
33 percentage point increase in accuracy for page segmentation
45 percentage point increase in page classification
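
For readers unfamiliar with the parsing metric in these findings: the labeled attachment score (LAS) comes from the dependency parsing literature and is the fraction of items whose predicted head and relation label are both correct. The unlabeled variant (UAS) checks only the head. The sketch below is a minimal illustration applied to pages, not the authors' evaluation code. The function name `attachment_scores`, the convention that head index 0 means "no parent", and the relation labels are all assumptions made for this example.

```python
from typing import List, Tuple

# One arc per page: (index of the head page, relation label).
# Assumption for this sketch: pages are numbered from 1 and head 0 means "root".
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for one document, given one arc per page."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain one arc per page")
    if not gold:
        raise ValueError("cannot score an empty document")

    n = len(gold)
    # UAS: the predicted head page matches the gold head page.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS: both the head page and the relation label match.
    labeled_hits = sum(g == p for g, p in zip(gold, pred))
    return head_hits / n, labeled_hits / n


# Hypothetical four-page document; the label names are illustrative only.
gold = [(0, "root"), (1, "continuation"), (1, "attachment"), (0, "root")]
pred = [(0, "root"), (1, "continuation"), (1, "continuation"), (3, "continuation")]

uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")  # UAS=0.75 LAS=0.50
```

In this toy case three of four heads are correct (UAS 0.75), but only two pages also carry the correct label (LAS 0.50). A gain reported in percentage points is a difference between two such scores expressed on a 0 to 100 scale.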
Abstract
Page-level analysis of documents has been a topic of interest in digitization efforts, and multimodal approaches have been applied to both classification and page stream segmentation. In this work, we focus on capturing finer semantic relations between pages of a multi-page document. To this end, we formalize the task as semantic parsing of interpage relations and we propose an end-to-end approach for interpage dependency extraction, inspired by the dependency parsing literature. We further design a multi-task training approach to jointly optimize for page embeddings to be used in segmentation, classification, and parsing of the page dependencies using textual and visual features extracted from the pages. Moreover, we also combine the features from two modalities to obtain multimodal page embeddings. To the best of our knowledge, this is the first study to extract rich semantic…
Taxonomy
Topics: Web Data Mining and Analysis · Topic Modeling · Text and Document Classification Technologies
