From Historical Tabular Image to Knowledge Graphs: A Provenance-Aware Modular Pipeline
Sarah Binta Alam Shoilee, Victor de Boer, Jacco van Ossenbruggen, Susan Legêne

TL;DR
This paper introduces a modular, provenance-aware pipeline that converts handwritten archival tables into Knowledge Graphs, enhancing transparency, traceability, and human oversight in complex historical data processing.
Contribution
The work presents a novel, modular, provenance-integrated approach for transforming handwritten tables into Knowledge Graphs, supporting human-AI collaboration and transparency.
Findings
Modular pipeline improves flexibility in table reconstruction.
Provenance integration ensures traceability of extracted data.
Experiments on archival military data validate the approach.
Abstract
Handwritten archival tables contain rich historical information, yet transforming them into structured representations, such as Knowledge Graphs (KGs), requires integrating table structure recognition, handwriting recognition, and semantic interpretation, which together form a complex multimodal process. End-to-end AI implementations can obscure these steps, resulting in opaque algorithmic operations that hinder human oversight, critical assessment, and trust. To address this, we present a modular, provenance-aware pipeline that converts handwritten tabular images into KGs while supporting human-AI collaboration. The pipeline decomposes the workflow into three stages (table reconstruction, information extraction, and KG construction) while exposing intermediate representations for inspection, evaluation, and correction. A key contribution of our approach is the systematic integration of data provenance at every stage,…
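To make the three-stage decomposition concrete, here is a minimal Python sketch of how each intermediate representation might carry a link back to the artifact it was derived from. It is not the authors' implementation. The `Artifact` class, the `run_stage` helper, the stage function bodies, the file name, and the sample cell values are all hypothetical placeholders introduced for illustration. The actual pipeline would call table structure recognition and handwriting recognition models at these points.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class Artifact:
    """An intermediate representation plus a pointer to what produced it."""
    name: str
    value: Any
    produced_by: Optional[str] = None           # name of the stage (hypothetical)
    derived_from: Optional["Artifact"] = None   # upstream artifact, if any

    def lineage(self) -> List[str]:
        """Walk back to the source image, listing artifact names."""
        node, chain = self, []
        while node is not None:
            chain.append(node.name)
            node = node.derived_from
        return chain


def run_stage(stage: str, fn: Callable[[Any], Any],
              source: Artifact, out_name: str) -> Artifact:
    """Apply one stage and record which stage and input produced the output."""
    return Artifact(out_name, fn(source.value),
                    produced_by=stage, derived_from=source)


# Placeholder stage functions: stand-ins for real recognition models.
def reconstruct_table(image: bytes) -> List[List[str]]:
    # Made-up cell contents, not data from the paper.
    return [["Name", "Rank"], ["J. Jansen", "Sergeant"]]


def extract_information(cells: List[List[str]]) -> List[dict]:
    header, *rows = cells
    return [dict(zip(header, row)) for row in rows]


def build_kg(records: List[dict]) -> List[tuple]:
    triples = []
    for i, record in enumerate(records):
        subject = f"person/{i}"
        for predicate, obj in record.items():
            triples.append((subject, predicate, obj))
    return triples


image = Artifact("scan_001.jpg", b"")  # hypothetical source scan
table = run_stage("table_reconstruction", reconstruct_table, image, "table")
records = run_stage("information_extraction", extract_information, table, "records")
kg = run_stage("kg_construction", build_kg, records, "kg")

print(kg.lineage())
```

Because every stage returns an `Artifact` rather than a bare value, a reviewer can inspect or correct `table` or `records` before the next stage runs, and any triple in `kg` can be traced back to the scan it came from.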
