SR-KI: Scalable and Real-Time Knowledge Integration into LLMs via Supervised Attention
Bohan Yu, Wei Huang, Kang Liu

TL;DR
SR-KI introduces a scalable, end-to-end method for integrating large structured knowledge bases into large language models, enabling real-time retrieval and dynamic updates with high efficiency and accuracy.
Contribution
It presents a novel approach that encodes KBs into key-value pairs and supervises attention within LLMs, supporting end-to-end retrieval without external retrievers.
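Conceptually, once KB entries are stored as key/value vectors alongside the model's KV cache, retrieval becomes an attention step. A query vector scores every KB key, and the entries with the highest attention weights are the ones "retrieved". The sketch below is a toy illustration under stated assumptions, not the SR-KI implementation. The vectors are hand-written stand-ins for pretrained-encoder outputs, there is a single attention head, and `attend_over_kb` is a hypothetical helper.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]

def softmax(scores: Sequence[float]) -> List[float]:
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend_over_kb(
    query: Vector,
    kb_keys: Sequence[Vector],
    kb_values: Sequence[Vector],
    top_k: int = 2,
) -> Tuple[List[float], List[int], List[float]]:
    """Single-head scaled dot-product attention of one query over KB key/value pairs.

    Returns (output vector, indices of the top_k KB entries by attention weight,
    full attention distribution over KB entries).
    """
    d = len(query)
    # Similarity of the query to every KB key, scaled by sqrt(d) as in standard attention.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in kb_keys]
    weights = softmax(scores)
    # Attention output: weighted sum of the KB value vectors.
    dim = len(kb_values[0])
    output = [sum(w * v[j] for w, v in zip(weights, kb_values)) for j in range(dim)]
    # "Retrieved" entries: the KB entries receiving the most attention.
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return output, ranked[:top_k], weights

# Toy KB of three entries; the vectors stand in for pretrained-encoder outputs.
kb_keys = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
kb_values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
output, retrieved, weights = attend_over_kb([1.0, 0.1], kb_keys, kb_values, top_k=2)
print(retrieved)  # [0, 2]
```

Because the ranking is read directly from the attention weights, no separate retriever model is involved. The same weights that produce the output vector also determine which entries count as retrieved.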
Findings
Supports integration of up to 40K KBs into a 7B LLM.
Achieves over 98% Recall@10 in retrieval tasks.
Maintains strong task performance with 99.75% KB compression.
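For reference, Recall@10 can be computed from a ranked list of KB entries per query. The version below uses one common definition: the fraction of a query's gold entries that appear in its top k, averaged over queries. It is an assumption that this matches the paper's exact definition. Some works instead report a per-query hit rate. The function names are illustrative.

```python
from typing import List, Sequence, Set, Tuple

def recall_at_k(ranked_ids: Sequence[int], gold_ids: Set[int], k: int = 10) -> float:
    """Fraction of a query's gold KB entries found among its top-k ranked entries."""
    if not gold_ids:
        raise ValueError("gold_ids must be non-empty")
    hits = len(set(ranked_ids[:k]) & gold_ids)
    return hits / len(gold_ids)

def mean_recall_at_k(examples: List[Tuple[Sequence[int], Set[int]]], k: int = 10) -> float:
    """Average Recall@k over (ranked_ids, gold_ids) pairs, one pair per query."""
    scores = [recall_at_k(ranked, gold, k) for ranked, gold in examples]
    return sum(scores) / len(scores)

# Query 1 finds its single gold entry in the top 10; query 2 finds 1 of its 2.
examples = [
    (list(range(20)), {3}),
    ([42, 7, 8, 9], {7, 99}),
]
print(mean_recall_at_k(examples, k=10))  # 0.75
```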
Abstract
This paper proposes SR-KI, a novel approach for integrating real-time and large-scale structured knowledge bases (KBs) into large language models (LLMs). SR-KI first encodes KBs into key-value pairs using a pretrained encoder and injects them into the LLM's KV cache. Building on this representation, we employ a two-stage training paradigm: first locating a dedicated retrieval layer within the LLM, and then applying an attention-based loss at this layer to explicitly supervise attention toward relevant KB entries. Unlike traditional retrieval-augmented generation methods, which rely heavily on the performance of external retrievers and multi-stage pipelines, SR-KI supports end-to-end inference by performing retrieval entirely within the model's latent space. This design enables efficient compression of injected knowledge and facilitates dynamic knowledge updates. Comprehensive experiments…
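The abstract describes the two training stages only at a high level and does not give the loss formula. As an illustration, one natural choice is to minimize the negative log of the total attention probability that the retrieval layer places on the gold KB entries. This is an assumption, not confirmed by the source. The stage-1 criterion is also assumed: pick the layer with the lowest mean value of that loss. `attention_supervision_loss` and `pick_retrieval_layer` are hypothetical names, and plain Python floats stand in for the differentiable tensors a real training loop would use.

```python
import math
from typing import Sequence, Set

def logsumexp(xs: Sequence[float]) -> float:
    # Stable log(sum(exp(x))) by factoring out the maximum.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def attention_supervision_loss(kb_logits: Sequence[float], gold_ids: Set[int]) -> float:
    """Hypothetical attention loss for one query at the retrieval layer.

    Equals -log(total softmax attention on the gold KB entries), computed in
    log space as logsumexp(all logits) - logsumexp(gold logits).
    """
    gold_logits = [kb_logits[i] for i in gold_ids]
    return logsumexp(kb_logits) - logsumexp(gold_logits)

def pick_retrieval_layer(
    per_layer_logits: Sequence[Sequence[Sequence[float]]],
    gold_ids: Sequence[Set[int]],
) -> int:
    """Assumed stage-1 criterion: the layer with the lowest mean attention loss.

    per_layer_logits[layer][q] holds the KB attention logits of query q at that
    layer; gold_ids[q] is the set of relevant KB entries for query q.
    """
    def mean_loss(layer_logits: Sequence[Sequence[float]]) -> float:
        losses = [attention_supervision_loss(s, g) for s, g in zip(layer_logits, gold_ids)]
        return sum(losses) / len(losses)
    return min(range(len(per_layer_logits)), key=lambda idx: mean_loss(per_layer_logits[idx]))

# Two layers, one query over three KB entries whose gold entry is index 0.
per_layer = [
    [[0.0, 0.0, 0.0]],  # layer 0: attention spread uniformly
    [[4.0, 0.0, 0.0]],  # layer 1: attention concentrated on the gold entry
]
layer = pick_retrieval_layer(per_layer, [{0}])
print(layer)  # 1
print(round(attention_supervision_loss(per_layer[0][0], {0}), 4))  # 1.0986
```

Computing the loss in log space matters when the KB is large. With tens of thousands of injected entries, the softmax weight on any single gold entry can be tiny, and taking the log of an underflowed probability would fail.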
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
