SR-KI: Scalable and Real-Time Knowledge Integration into LLMs via Supervised Attention

Bohan Yu; Wei Huang; Kang Liu

arXiv:2511.06446 · cs.CL · November 11, 2025

TL;DR

SR-KI introduces a scalable, end-to-end method for integrating large structured knowledge bases into large language models, enabling real-time retrieval and dynamic updates with high efficiency and accuracy.

Contribution

It presents a novel approach that encodes KBs into key-value pairs and directly supervises the LLM's attention over those entries, so retrieval happens end to end inside the model without an external retriever.
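The encode-and-inject idea can be illustrated with a toy sketch. Everything below is an assumption for illustration only: `toy_encode` is a hypothetical hash-based stand-in for the pretrained encoder, each (subject, relation, object) triple becomes a key built from "subject relation" and a value built from the object, and KB entries are simply prepended to a single-head attention cache instead of being projected into every layer of a real LLM. The point it shows is that once KB entries sit in the cache, the query's attention weights over them act as retrieval scores.

```python
import hashlib
import math
import random

DIM = 16  # hypothetical embedding width for this toy example


def toy_encode(text: str, dim: int = DIM) -> list[float]:
    """Hypothetical stand-in for a pretrained encoder: the same text always
    maps to the same unit-length vector (seeded from a SHA-256 hash)."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def kb_to_kv(triples):
    """Turn (subject, relation, object) triples into key/value vectors.
    Assumption: the key encodes what a question asks, the value the answer."""
    keys = [toy_encode(f"{s} {r}") for s, r, _ in triples]
    values = [toy_encode(o) for _, _, o in triples]
    return keys, values


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(query))]
    return weights, out


triples = [
    ("Paris", "capital of", "France"),
    ("Berlin", "capital of", "Germany"),
    ("Danube", "flows into", "Black Sea"),
]
kb_keys, kb_values = kb_to_kv(triples)

# Hypothetical context entries already present in the cache.
ctx_keys = [toy_encode("ctx-0"), toy_encode("ctx-1")]
ctx_values = [toy_encode("ctx-val-0"), toy_encode("ctx-val-1")]

# "Inject" the KB by prepending its key/value pairs to the cache.
all_keys = kb_keys + ctx_keys
all_values = kb_values + ctx_values

# Toy query built with the same encoder so it matches the first KB key exactly.
query = toy_encode("Paris capital of")
weights, _ = attend(query, all_keys, all_values)

# Attention over the KB slice doubles as a retrieval ranking.
kb_weights = weights[: len(triples)]
best = max(range(len(triples)), key=lambda i: kb_weights[i])
print(triples[best])
```

In a real model the query would come from the LLM's hidden states rather than the KB encoder, and learned projections would map encoder outputs into each layer's key/value space; the sketch collapses all of that into one shared vector space.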

Findings

01. Supports integration of up to 40K KBs into a 7B LLM.

02. Achieves over 98% Recall@10 in retrieval tasks.

03. Maintains strong task performance with 99.75% KB compression.
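Recall@10, the retrieval metric cited above, can be computed from attention scores roughly as follows. This is a sketch under our own assumptions, not the paper's evaluation code: each query has a set of gold KB entry indices, the scores are the attention weights the retrieval layer assigns to KB entries, and a query counts as a hit when any gold entry ranks in the top k.

```python
import heapq


def recall_at_k(scores, gold, k=10):
    """Fraction of queries with at least one gold KB entry among the k top-scoring entries.

    scores: one list of per-entry scores (e.g. attention weights) per query
    gold:   one set of gold entry indices per query
    """
    hits = 0
    for row, gold_ids in zip(scores, gold):
        # heapq.nlargest avoids fully sorting a large KB (tens of thousands of entries)
        top_k = heapq.nlargest(k, range(len(row)), key=row.__getitem__)
        if gold_ids.intersection(top_k):
            hits += 1
    return hits / len(scores)


# Toy check with k=2 so the example stays small.
scores = [
    [0.1, 0.7, 0.2],  # gold entry 1 ranks first  -> hit
    [0.5, 0.3, 0.2],  # gold entry 2 ranks last   -> miss
    [0.3, 0.1, 0.6],  # gold entry 0 ranks second -> hit
]
gold = [{1}, {2}, {0}]
print(recall_at_k(scores, gold, k=2))  # 2 of 3 queries hit
```

With k=10 and 40K candidate entries, a high Recall@10 means the relevant entry almost always lands in a very small shortlist of the injected KB.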

Abstract

This paper proposes SR-KI, a novel approach for integrating real-time and large-scale structured knowledge bases (KBs) into large language models (LLMs). SR-KI begins by encoding KBs into key-value pairs using a pretrained encoder and injects them into the LLM's KV cache. Building on this representation, we employ a two-stage training paradigm: first locating a dedicated retrieval layer within the LLM, and then applying an attention-based loss at this layer to explicitly supervise attention toward relevant KB entries. Unlike traditional retrieval-augmented generation methods that rely heavily on the performance of external retrievers and multi-stage pipelines, SR-KI supports end-to-end inference by performing retrieval entirely within the model's latent space. This design enables efficient compression of injected knowledge and facilitates dynamic knowledge updates. Comprehensive experiments…
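The two-stage training paradigm can be sketched on precomputed attention weights. Both criteria here are hypothetical stand-ins, not the paper's definitions: stage 1 picks the layer whose attention over KB entries places the most average probability mass on the gold entries, and stage 2 uses the mean negative log of that gold mass as the attention-based loss. Attention is assumed to be normalized over KB entries only, and plain floats are used for readability; in actual training the same quantity would be computed on differentiable attention tensors.

```python
import math


def gold_mass(weights, gold_ids):
    """Attention probability mass one query places on its gold KB entries."""
    return sum(weights[i] for i in gold_ids)


def locate_retrieval_layer(attn_by_layer, gold):
    """Stage 1 (hypothetical criterion): choose the layer with the highest
    average gold attention mass. attn_by_layer[layer][q] is query q's
    attention distribution over KB entries at that layer."""
    def avg_gold_mass(layer_attn):
        return sum(gold_mass(w, g) for w, g in zip(layer_attn, gold)) / len(gold)
    return max(range(len(attn_by_layer)), key=lambda layer: avg_gold_mass(attn_by_layer[layer]))


def attention_supervision_loss(layer_attn, gold, eps=1e-9):
    """Stage 2 (one plausible form): mean negative log of the gold attention mass.
    Lower values mean attention is more concentrated on the relevant entries."""
    losses = [-math.log(gold_mass(w, g) + eps) for w, g in zip(layer_attn, gold)]
    return sum(losses) / len(losses)


# Toy data: 3 layers, 2 queries, 4 KB entries; query 0 needs entry 0, query 1 needs entry 2.
attn_by_layer = [
    [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]],  # layer 0: uniform
    [[0.70, 0.10, 0.10, 0.10], [0.10, 0.10, 0.60, 0.20]],  # layer 1: focused on gold
    [[0.40, 0.20, 0.20, 0.20], [0.30, 0.30, 0.20, 0.20]],  # layer 2: weakly focused
]
gold = [{0}, {2}]

layer = locate_retrieval_layer(attn_by_layer, gold)  # selects layer 1 for this toy data
loss = attention_supervision_loss(attn_by_layer[layer], gold)
print(layer, round(loss, 4))
```

Minimizing such a loss pushes the chosen layer to concentrate attention on relevant KB entries, which is what lets retrieval be read directly from attention instead of from an external retriever.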

Code & Models

Datasets

SharkSpicy/wikidata (dataset, 12 downloads)

Videos

SR-KI: Scalable and Real-Time Knowledge Integration into LLMs via Supervised Attention (Underline)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications