Format-Constraint Coupling in Knowledge Graph Construction from Statistical Tables
Jingxuan Qi, Zhiqiang Ye, and Yuxiang Feng

TL;DR
This paper investigates how format-constraint interactions in statistical tables affect knowledge graph extraction fidelity, revealing significant coupling effects and proposing a benchmark for fidelity-aware evaluation.
Contribution
It introduces the concept of format-constraint coupling in knowledge graph construction and provides CSVFidelity-Bench for fidelity-aware evaluation.
Findings
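The joint effect of serialization format and schema constraints exceeds the sum of their independent effects by up to +1.180 (2x2 factorial, 6 datasets).
A schema applied to a mismatched format can trigger catastrophic mismatch: fact coverage falls below the unconstrained baseline on 4/6 datasets.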
Direct graph access reveals gaps up to +47.6pp, unlike standard retrieval modes.
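The super-additivity claim can be read as a standard 2x2 interaction contrast: the joint effect of changing both factors, minus the sum of the effects of changing each factor alone. The sketch below is a minimal illustration under stated assumptions, not the authors' code. The cell labels ("00", "10", "01", "11"), the per-item fidelity scores, and the percentile bootstrap are all hypothetical choices; the paper's actual metric and resampling procedure are not given in this summary.

```python
import random
from statistics import mean


def interaction_contrast(y00, y10, y01, y11):
    """Joint effect minus the sum of the two independent effects.

    y00: baseline format, no schema      y10: alternative format, no schema
    y01: baseline format, with schema    y11: alternative format, with schema
    (Cell meanings are assumptions made for this illustration.)
    """
    joint = y11 - y00
    format_only = y10 - y00
    schema_only = y01 - y00
    return joint - (format_only + schema_only)


def bootstrap_ci(cells, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the interaction contrast.

    cells maps "00", "10", "01", "11" to lists of per-item scores.
    Each cell is resampled independently with replacement.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        m = {k: mean(rng.choices(v, k=len(v))) for k, v in cells.items()}
        stats.append(interaction_contrast(m["00"], m["10"], m["01"], m["11"]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

A positive contrast whose interval excludes zero would correspond to the "strictly positive" bootstrap CIs reported for 4/6 datasets; a contrast near zero would mean the two factors combine roughly additively.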
Abstract
An extraction schema should not reduce knowledge graph fidelity. On statistical CSV, however, it can. We study country-by-year time-series matrices, a common layout on open-data portals. In this setting, serialization format and schema constraints interact super-additively. Their joint effect exceeds the sum of independent effects by up to +1.180 (2x2 factorial, 6 datasets). Bootstrap 95% CIs are strictly positive on 4/6 datasets, with strongest evidence on wide Type-II matrices. More critically, a schema applied to a mismatched format can trigger catastrophic mismatch. Fact coverage falls below the unconstrained baseline on 4/6 datasets through entity inflation or extraction refusal. We call this observed pattern format-constraint coupling. Probing and token ablation support a surface-form anchoring explanation centred on column-name references. Controlled variants across format-schema…
