Format-Constraint Coupling in Knowledge Graph Construction from Statistical Tables

Jingxuan Qi, Zhiqiang Ye, and Yuxiang Feng

arXiv:2605.21974·cs.AI·May 22, 2026

TL;DR

This paper investigates how format-constraint interactions in statistical tables affect knowledge graph extraction fidelity, revealing significant coupling effects and proposing a benchmark for fidelity-aware evaluation.

Contribution

The paper introduces the concept of format-constraint coupling in knowledge graph construction and provides CSVFidelity-Bench, a benchmark for fidelity-aware evaluation.

Findings

01

The joint effect of serialization format and schema constraints exceeds the sum of their independent effects by up to +1.180 (2x2 factorial, 6 datasets).

02

A schema applied to a mismatched format can trigger catastrophic mismatch: fact coverage falls below the unconstrained baseline on 4/6 datasets, through entity inflation or extraction refusal.

03

Direct graph access reveals gaps of up to +47.6pp that standard retrieval modes do not show.

Abstract

An extraction schema should not reduce knowledge graph fidelity. On statistical CSV, however, it can. We study country-by-year time-series matrices, a common layout on open-data portals. In this setting, serialization format and schema constraints interact super-additively. Their joint effect exceeds the sum of independent effects by up to +1.180 (2x2 factorial, 6 datasets). Bootstrap 95% CIs are strictly positive on 4/6 datasets, with strongest evidence on wide Type-II matrices. More critically, a schema applied to a mismatched format can trigger catastrophic mismatch. Fact coverage falls below the unconstrained baseline on 4/6 datasets through entity inflation or extraction refusal. We call this observed pattern format-constraint coupling. Probing and token ablation support a surface-form anchoring explanation centred on column-name references. Controlled variants across format-schema…
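The super-additivity claim in the abstract can be made concrete with a small sketch. In a 2x2 factorial design, the interaction contrast is the joint effect minus the sum of the two independent effects, and a percentile bootstrap gives a confidence interval for it. The code below is an illustration only, not the authors' evaluation code: the function names, the per-item score lists, and the resampling scheme (independent resampling within each cell) are all assumptions, and the demo numbers are made up.

```python
import random
from statistics import mean


def interaction(cells):
    """Interaction contrast for a 2x2 design.

    `cells` maps (format_on, schema_on) -> list of per-item scores.
    Returns joint effect minus the sum of the two independent effects,
    i.e. Y11 - Y10 - Y01 + Y00 on the cell means.
    """
    m = {cell: mean(scores) for cell, scores in cells.items()}
    joint = m[(1, 1)] - m[(0, 0)]
    additive = (m[(1, 0)] - m[(0, 0)]) + (m[(0, 1)] - m[(0, 0)])
    return joint - additive


def bootstrap_ci(cells, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the interaction contrast.

    Assumption: items are resampled with replacement inside each cell.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        resampled = {
            cell: rng.choices(scores, k=len(scores))
            for cell, scores in cells.items()
        }
        stats.append(interaction(resampled))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical scores for one dataset (not taken from the paper).
demo_cells = {
    (0, 0): [0.20, 0.25, 0.18, 0.22],  # baseline format, no schema
    (1, 0): [0.30, 0.28, 0.33, 0.31],  # format change only
    (0, 1): [0.24, 0.27, 0.22, 0.26],  # schema only
    (1, 1): [0.85, 0.90, 0.88, 0.92],  # format change and schema
}
print("interaction:", interaction(demo_cells))
print("95% CI:", bootstrap_ci(demo_cells))
```

A positive contrast whose interval excludes zero corresponds to the "strictly positive" bootstrap CIs the abstract reports for 4/6 datasets; a contrast near zero would mean the two factors combine additively.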
