What Structural Inductive Bias Helps Transformers Reason Over Knowledge Graphs? A Study with Tabula RASA

Jonas Petersen; Camilla Mazzoleni; Gian-Alessandro Lombardi; Federico Martelli; Riccardo Maggioni

arXiv:2602.02834·cs.LG·May 12, 2026

TL;DR

This study finds that sparse adjacency masking, rather than relation-specific parameters, is the key structural inductive bias that enables transformers to perform multi-hop reasoning over knowledge graphs.

Contribution

The paper demonstrates through ablations that topological signals from adjacency masking are the primary driver of reasoning performance, with relation biases offering limited additional benefit.

Findings

01. Sparse adjacency masking accounts for most performance gains.

02. Relation-specific biases provide modest improvements and can be detrimental without structural guidance.

03. Masking-based attention is more robust than relation-specific weights in zero-shot scenarios.

Abstract

What structural inductive bias helps transformers reason over knowledge graphs? Through controlled ablations of a minimal transformer modification with four independently removable components (sparse adjacency masking, edge-type biases, query scaling, value gating), we isolate which structural signals drive multi-hop reasoning. Our finding is sharp: sparse adjacency masking alone accounts for the dominant share of improvement over unmasked transformers (+72.5pp on 3-hop MetaQA, +45.5pp on WebQSP, +53.9pp on CWQ), while learned relation parameters add only modest refinement and can actively hurt without structural guidance. A zero-shot experiment provides architecturally independent corroboration: masking-based attention degrades 4.0x less than relation-specific weights when edge types are held out. The useful inductive bias for multi-hop KGQA is predominantly topological, not relational.
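
The abstract names sparse adjacency masking as the dominant component, but this page does not give its exact formulation. The following is a minimal sketch of the general technique, under two assumptions that are not taken from the paper: that masking sets the attention logits of non-adjacent node pairs to negative infinity before the softmax, and that an edge-type bias, if used, is added to the logits of existing edges. The names `masked_attention` and `edge_bias` are hypothetical, and query scaling and value gating are left out.

```python
import math

def masked_attention(scores, adjacency, edge_bias=None):
    """Row-wise softmax over `scores`, restricted to pairs allowed by `adjacency`.

    scores[i][j]    : raw attention logit from node i to node j
    adjacency[i][j] : True if node i may attend to node j
    edge_bias[i][j] : optional additive logit bias (hypothetical stand-in
                      for a learned edge-type bias; not the paper's definition)
    """
    n = len(scores)
    weights = []
    for i in range(n):
        logits = []
        for j in range(n):
            if adjacency[i][j]:
                bias = edge_bias[i][j] if edge_bias is not None else 0.0
                logits.append(scores[i][j] + bias)
            else:
                # Masked pair: contributes exactly zero weight after softmax.
                logits.append(-math.inf)
        row_max = max(logits)
        if row_max == -math.inf:
            # Node with no allowed neighbours: return an all-zero row.
            weights.append([0.0] * n)
            continue
        exps = [math.exp(x - row_max) for x in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Toy path graph 0 - 1 - 2 with self-loops; all raw logits equal.
adjacency = [
    [True, True, False],
    [True, True, True],
    [False, True, True],
]
scores = [[0.0] * 3 for _ in range(3)]
weights = masked_attention(scores, adjacency)
print(weights[0])  # [0.5, 0.5, 0.0]
```

In this toy graph node 0 places no weight on node 2, which is two hops away, so information from node 2 can reach node 0 only through node 1 across successive layers. That is the sense in which such a mask is a topological signal rather than a relational one.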
