A Tight Expressivity Hierarchy for GNN-Based Entity Resolution in Master Data Management

Ashwin Ganesan

arXiv:2603.27154·cs.LG·March 31, 2026

TL;DR

This paper establishes a tight expressivity hierarchy for message-passing GNN architectures in entity resolution, identifying the cheapest architecture that provably works for each matching criterion.

Contribution

The paper introduces a formal separation theory on typed entity-attribute graphs, with tight bounds on which MPNN capabilities each entity resolution task requires. These bounds tell practitioners the minimal architecture a given task needs.

Findings

01. Detecting whether entities share at least one attribute is a local problem requiring 2-layer reverse message passing.

02. Detecting multiple shared attributes or cycles requires 4-layer ego ID mechanisms.

03. The results provide a minimal-architecture principle for efficient GNN design.
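
To make the locality claim in the first finding concrete, here is a minimal sketch of two rounds of count aggregation on an entity-attribute graph. It is an illustration only, not the paper's construction: the function name `shares_any_attribute`, the edge-list input, and the single entity type are all assumptions. Edges are assumed to point from entity to attribute, so the second round uses the reverse direction.

```python
from collections import defaultdict


def shares_any_attribute(edges):
    """Flag each entity that shares at least one attribute with another entity.

    `edges` is an iterable of (entity, attribute) pairs (hypothetical format).
    """
    unique_edges = set(edges)

    # Round 1 (entity -> attribute): each attribute counts its incident entities.
    degree = defaultdict(int)
    for _entity, attribute in unique_edges:
        degree[attribute] += 1

    # Round 2 (attribute -> entity, the reverse direction): an entity is flagged
    # if any of its attributes was reached by at least two entities.
    flagged = defaultdict(bool)
    for entity, attribute in unique_edges:
        flagged[entity] = flagged[entity] or degree[attribute] >= 2
    return dict(flagged)


flags = shares_any_attribute([("e1", "a"), ("e2", "a"), ("e3", "b")])
```

Only anonymous counts are exchanged, so this check never needs to know which entity it shares an attribute with. Requiring two shared attributes with the same entity is different in kind, because the counts alone cannot tell whether two shared attributes lead back to one entity or to two different ones.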

Abstract

Entity resolution -- identifying database records that refer to the same real-world entity -- is naturally modelled on bipartite graphs connecting entity nodes to their attribute values. Applying a message-passing neural network (MPNN) with all available extensions (reverse message passing, port numbering, ego IDs) incurs unnecessary overhead, since different entity resolution tasks have fundamentally different complexity. For a given matching criterion, what is the cheapest MPNN architecture that provably works? We answer this with a four-theorem separation theory on typed entity-attribute graphs. We introduce co-reference predicates $\mathrm{Dup}_{r}$ (two same-type entities share at least $r$ attribute values) and the $\ell$-cycle predicate $\mathrm{Cyc}_{\ell}$ for settings with entity-entity edges. For each predicate we prove tight bounds -- constructing graph pairs provably…
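
As a reference point for the predicate $\mathrm{Dup}_{r}$ described in the abstract, the sketch below evaluates it directly by set intersection. It is a ground-truth check, not a GNN. The name `dup_pairs`, the `records` layout, and the sample data are hypothetical, and treating the predicate as a property of entity pairs is one assumed reading of the definition.

```python
from itertools import combinations


def dup_pairs(records, r):
    """Return same-type entity pairs that share at least r attribute values.

    `records` maps entity id -> (entity type, set of attribute values);
    this layout is an assumption made for the example.
    """
    pairs = set()
    for (u, (type_u, attrs_u)), (v, (type_v, attrs_v)) in combinations(
        sorted(records.items()), 2
    ):
        if type_u == type_v and len(attrs_u & attrs_v) >= r:
            pairs.add((u, v))
    return pairs


# Made-up sample records for illustration.
records = {
    "c1": ("customer", {"email:a@x.org", "phone:555-0100"}),
    "c2": ("customer", {"email:a@x.org", "phone:555-0100"}),
    "c3": ("customer", {"email:a@x.org", "phone:555-0199"}),
    "s1": ("supplier", {"phone:555-0100"}),
}

dup_1 = dup_pairs(records, 1)  # all three customer pairs share the email
dup_2 = dup_pairs(records, 2)  # only c1 and c2 share two values
```

The supplier `s1` never appears in a result even though it shares a phone number with two customers, because the predicate only compares entities of the same type.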
