S2Doc -- Spatial-Semantic Document Format

Sebastian Kempf; Frank Puppe

arXiv:2511.01113·cs.DL·November 4, 2025

TL;DR

S2Doc is a flexible data structure that unifies spatial and semantic information for modeling documents and tables. It supports multi-page documents and most existing modeling approaches.

Contribution

The paper introduces a format that combines the spatial and semantic aspects of documents and tables, which the authors present as the first of its kind, with the aim of improving standardization and extensibility.

Findings

01  Supports most modeling approaches for documents and tables

02  Enables multi-page document representation

03  Facilitates interoperability between different data structures

Abstract

Documents are a common way to store and share information, with tables being an important part of many documents. However, there is no real common understanding of how to model documents, and tables in particular. Because of this lack of standardization, most scientific approaches have their own way of modeling documents and tables, leading to a variety of different data structures and formats that are not directly compatible. Furthermore, most data models focus on either the spatial or the semantic structure of a document, neglecting the other aspect. To address this, we developed S2Doc, a flexible data structure for modeling documents and tables that combines both spatial and semantic information in a single format. It is designed to be easily extendable to new tasks and supports most modeling approaches for documents and tables, including multi-page documents. To the best of our…
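To make the idea of a combined spatial and semantic model concrete, the following is a minimal illustrative sketch in Python. It is not the S2Doc schema: the class names (`Region`, `Element`, `Document`), their fields, and the two query helpers are all hypothetical, chosen only to show how one structure might carry both page coordinates and a semantic hierarchy, including an element that spans more than one page.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Region:
    """Spatial placement (hypothetical): page index plus a box (x0, y0, x1, y1)."""
    page: int
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass
class Element:
    """One document element carrying both spatial and semantic information."""
    elem_id: str
    category: str                                         # e.g. "table", "cell"
    regions: list[Region] = field(default_factory=list)   # >1 region: spans pages
    children: list[str] = field(default_factory=list)     # semantic hierarchy by id
    text: str = ""


@dataclass
class Document:
    elements: dict[str, Element] = field(default_factory=dict)

    def add(self, element, parent_id=None):
        """Register an element and optionally attach it below a semantic parent."""
        self.elements[element.elem_id] = element
        if parent_id is not None:
            self.elements[parent_id].children.append(element.elem_id)

    def on_page(self, page):
        """Spatial query: ids of elements with at least one region on `page`."""
        return [e.elem_id for e in self.elements.values()
                if any(r.page == page for r in e.regions)]

    def descendants(self, elem_id):
        """Semantic query: ids below `elem_id`, depth-first."""
        out = []
        for child in self.elements[elem_id].children:
            out.append(child)
            out.extend(self.descendants(child))
        return out


# A table that continues from page 0 onto page 1, with one cell on each page.
doc = Document()
doc.add(Element("t1", "table",
                [Region(0, 50, 600, 550, 780), Region(1, 50, 40, 550, 200)]))
doc.add(Element("c1", "cell", [Region(0, 50, 600, 300, 640)], text="Year"),
        parent_id="t1")
doc.add(Element("c2", "cell", [Region(1, 50, 40, 300, 80)], text="2025"),
        parent_id="t1")

page1_ids = doc.on_page(1)           # spatial view of page 1
table_ids = doc.descendants("t1")    # semantic view of the table
```

In this sketch the same element can be reached either spatially (everything on a given page) or semantically (everything below the table), which is the kind of dual access the abstract describes; how S2Doc itself organizes these two views is not specified in the text above.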

Taxonomy

Topics: Geographic Information Systems Studies · Data Visualization and Analytics · Advanced Database Systems and Queries