Graph-KV: Breaking Sequence via Injecting Structural Biases into Large Language Models

Haoyu Wang; Peihao Wang; Mufei Li; Shikun Liu; Siqi Miao; Zhangyang Wang; Pan Li

arXiv:2506.07334 · cs.LG · January 13, 2026

Open Access · 1 dataset

TL;DR

Graph-KV introduces a method to incorporate structural biases into large language models by using a graph-structured attention mechanism, improving performance on tasks involving complex data structures like graphs and long documents.

Contribution

The paper proposes Graph-KV, a novel approach that injects structural inductive biases into LLMs using graph-structured attention, enabling better handling of non-sequential data dependencies.

Findings

1. Outperforms baseline models on seven RAG benchmarks.
2. Effectively reduces positional bias in large language models.
3. Enhances reasoning and understanding in graph-structured tasks.

Abstract

Modern large language models (LLMs) are inherently auto-regressive, requiring input to be serialized into flat sequences regardless of its structural dependencies. This serialization hinders the model's ability to leverage structural inductive biases, especially in tasks such as retrieval-augmented generation (RAG) and reasoning on data with native graph structures, where inter-segment dependencies are crucial. We introduce Graph-KV, which has the potential to overcome this limitation. Graph-KV leverages the KV-cache of text segments as condensed representations and governs their interaction through structural inductive biases. In this framework, 'target' segments selectively attend only to the KV-caches of their designated 'source' segments, rather than all preceding segments in a serialized sequence. This approach induces a graph-structured block mask, sparsifying attention and enabling a…
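
To make the "graph-structured block mask" idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The function name `graph_block_mask`, the `(source, target)` edge format, and the choice of causal attention inside each segment are assumptions made for illustration, and the sketch ignores positional encodings and the actual KV-cache computation.

```python
def graph_block_mask(segment_lengths, edges):
    """Build a boolean attention mask from a segment graph.

    mask[q][k] is True when query token q may attend to key token k.
    `edges` holds hypothetical (source, target) segment-index pairs:
    tokens of a target segment may attend to every token of its sources.
    """
    total = sum(segment_lengths)
    # Segment index of every token position in the flattened sequence.
    seg_of = [s for s, n in enumerate(segment_lengths) for _ in range(n)]
    allowed = set(edges)

    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(total):
            sq, sk = seg_of[q], seg_of[k]
            if sq == sk:
                # Assumption: ordinary causal attention inside a segment.
                mask[q][k] = k <= q
            else:
                # Cross-segment attention only along a declared edge.
                mask[q][k] = (sk, sq) in allowed
    return mask


# Three segments of lengths 2, 2, 3; segment 2 is a target of segment 0 only.
mask = graph_block_mask([2, 2, 3], [(0, 2)])
for row in mask:
    print("".join("x" if allowed_key else "." for allowed_key in row))
```

In this toy example, the tokens of segment 2 see segment 0 and their own earlier tokens, while segment 1 stays invisible to them even though it precedes segment 2 in the serialized order. A fully sequential model would instead let every token attend to all preceding tokens.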

Code & Models

Datasets

Graph-COM/GraphKV
dataset · 210 downloads

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Computational and Text Analysis Methods