CISOL: An Open and Extensible Dataset for Table Structure Recognition in the Construction Industry
David Tschirschwitz, Volker Rodehorst

TL;DR
The paper introduces CISOL, an open, medium-sized, domain-specific dataset of civil engineering documents for table structure recognition (TSR), and benchmarks it with YOLOv8.
Contribution
CISOL is a novel, publicly available dataset focusing on civil engineering documents, enhancing reproducibility and benchmarking in table recognition tasks.
Findings
CISOL contains more than 120,000 annotated instances across more than 800 document images.
YOLOv8 achieves 67.22 mAP on CISOL, outperforming models built specifically for table structure recognition (TSR).
The dataset improves research reproducibility and domain-specific table recognition.
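The mAP figure in the findings is a detection-style metric, and such metrics are typically built on intersection over union (IoU) between predicted and annotated boxes. The following is a minimal sketch of that building block, assuming axis-aligned boxes in (x_min, y_min, x_max, y_max) form; the function name box_iou and the box format are illustrative assumptions and are not taken from the paper or from the YOLOv8 codebase.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max).

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp to zero so disjoint boxes give zero intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Guard against degenerate zero-area inputs.
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 2x2 cells overlapping in a 1x1 region: IoU = 1 / 7.
    print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

A prediction usually counts as a match when its IoU with an annotated box of the same class exceeds a chosen threshold; average precision is then computed per class and averaged. The exact thresholds behind the reported score are not stated in this summary.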
Abstract
Reproducibility and replicability are critical pillars of empirical research, particularly in machine learning, where they depend not only on the availability of models, but also on the datasets used to train and evaluate those models. In this paper, we introduce the Construction Industry Steel Ordering List (CISOL) dataset, which was developed with a focus on transparency to ensure reproducibility, replicability, and extensibility. CISOL provides a valuable new research resource and highlights the importance of having diverse datasets, even in niche application domains such as table extraction in civil engineering. CISOL is unique in that it contains real-world civil engineering documents from industry, making it a distinctive contribution to the field. The dataset contains more than 120,000 annotated instances in over 800 document images, positioning it as a medium-sized dataset…
Taxonomy
Topics: Data Quality and Management
Methods: Focus · You Only Look Once
