Detecting Layout Templates in Complex Multiregion Files

Gerardo Vitagliano; Lan Jiang; Felix Naumann

arXiv:2109.06630·cs.IR·June 22, 2022


TL;DR

This paper introduces Mondrian, an automated method for detecting layout templates in complex spreadsheets with multiple regions, improving the identification of recurring structures across files.

Contribution

The paper presents a three-phase approach that combines image rendering, clustering, and graph comparison to identify layout templates in multiregion spreadsheets, and reports that it outperforms state-of-the-art table recognition algorithms.

Findings

1. Effective detection of region boundaries within files.
2. Successful identification of recurring layouts across multiple spreadsheets.
3. Outperforms state-of-the-art table recognition algorithms.

Abstract

Spreadsheets are among the most commonly used file formats for data management, distribution, and analysis. Their widespread employment makes it easy to gather large collections of data, but their flexible canvas-based structure makes automated analysis difficult without heavy preparation. One of the common problems that practitioners face is the presence of multiple, independent regions in a single spreadsheet, possibly separated by repeated empty cells. We define such files as "multiregion" files. In collections of various spreadsheets, we can observe that some share the same layout. We present the Mondrian approach to automatically identify layout templates across multiple files and systematically extract the corresponding regions. Our approach is composed of three phases: first, each file is rendered as an image and inspected for elements that could form regions; then, using a…
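To make the notion of a "multiregion" file concrete, the following is a minimal sketch of how independent regions separated by empty cells could be located in a grid of cell values. It is not the Mondrian algorithm: the function name `find_regions`, the 8-connected flood fill, and the sample `sheet` are all assumptions chosen for illustration, and the approach in the paper (which works on a rendered image and then clusters elements) may differ substantially.

```python
from collections import deque


def find_regions(grid):
    """Return bounding boxes (top, left, bottom, right) of groups of
    non-empty cells that touch each other, including diagonally.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    rows = len(grid)
    cols = max((len(row) for row in grid), default=0)

    def filled(r, c):
        # Cells missing from ragged rows, None, and "" count as empty.
        return c < len(grid[r]) and grid[r][c] not in (None, "")

    seen = set()
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or not filled(r, c):
                continue
            # Breadth-first flood fill starting from an unvisited filled cell.
            queue = deque([(r, c)])
            seen.add((r, c))
            top, left, bottom, right = r, c, r, c
            while queue:
                cr, cc = queue.popleft()
                top, bottom = min(top, cr), max(bottom, cr)
                left, right = min(left, cc), max(right, cc)
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (
                            0 <= nr < rows
                            and 0 <= nc < cols
                            and (nr, nc) not in seen
                            and filled(nr, nc)
                        ):
                            seen.add((nr, nc))
                            queue.append((nr, nc))
            boxes.append((top, left, bottom, right))
    return boxes


# Made-up sheet: a small table, a separate totals column, and a stray note.
sheet = [
    ["id", "name", "", "", "total"],
    [1, "a", "", "", 10],
    ["", "", "", "", ""],
    ["", "", "", "", ""],
    ["note", "", "", "", ""],
]

print(find_regions(sheet))
```

A plain flood fill like this splits a region whenever a table contains an empty row or column of its own, which is one reason a real detector needs a later grouping step rather than connectivity alone.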


Code & Models

Repositories

hpi-information-systems/mondrian (official)
