Efficient Knowledge Probing of Large Language Models by Adapting Pre-trained Embeddings

Kartik Sharma; Yiqiao Jin; Rakshit Trivedi; Srijan Kumar

arXiv:2508.06030·cs.CL·January 27, 2026

TL;DR

This paper introduces PEEK, a method that adapts pre-trained embedding models to estimate a large language model's factual knowledge, reducing computational cost relative to probing methods that require forward passes through the LLM for each fact.

Contribution

PEEK leverages pre-trained embedding models to estimate LLM knowledge, enabling scalable and efficient probing without extensive model forward passes.
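The idea of predicting LLM knowledge from embeddings can be illustrated with a small probe. The following is a minimal sketch under stated assumptions, not the authors' implementation: the embeddings are synthetic random vectors standing in for the output of a pre-trained embedding model, the binary "known" labels are synthetic stand-ins for labels that would be collected once by querying the LLM, and the helper names (`make_synthetic_facts`, `train_probe`, `predict_known`) are hypothetical. The probe here is a plain logistic regression trained on frozen vectors; the paper describes adapting the embeddings themselves, which this sketch does not do.

```python
import math
import random


def make_synthetic_facts(n, dim, rng):
    """Hypothetical stand-in for (fact embedding, known-by-LLM label) pairs.

    In a real setup the vectors would come from a pre-trained embedding
    model and the labels from querying the LLM once per training fact.
    """
    w_true = [rng.gauss(0, 1) for _ in range(dim)]
    facts = []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(dim)]
        score = sum(w * xi for w, xi in zip(w_true, x))
        facts.append((x, 1 if score > 0 else 0))
    return facts


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(data, dim, lr=0.1, epochs=50):
    """Fit a logistic-regression probe with plain SGD."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            grad = p - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def predict_known(w, b, x):
    """Predict whether the LLM knows a fact, without calling the LLM."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5


def accuracy(w, b, data):
    hits = sum(1 for x, y in data if predict_known(w, b, x) == bool(y))
    return hits / len(data)


rng = random.Random(0)
DIM = 8
facts = make_synthetic_facts(400, DIM, rng)
train, test = facts[:300], facts[300:]

w, b = train_probe(train, DIM)
test_accuracy = accuracy(w, b, test)
print(f"held-out probe accuracy: {test_accuracy:.2f}")
```

Once such a probe is trained, estimating knowledge of a new fact costs one embedding lookup and one dot product rather than a forward pass through the LLM, which is the efficiency argument the contribution makes.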

Findings

1. Embeddings can predict LLM knowledge with up to 90% accuracy.

2. Sentence embeddings outperform graph embeddings in predicting LLM knowledge.

3. Knowledge-adapted embeddings reveal insights into LLMs' internal representations.

Abstract

Large language models (LLMs) acquire knowledge across diverse domains such as science, history, and geography encountered during generative pre-training. However, due to their stochasticity, it is difficult to predict what LLMs have acquired. Prior work has developed different ways to probe this knowledge by investigating the hidden representations, crafting specific task prompts, curating representative samples, and estimating their uncertainty. However, these methods require making forward passes through the underlying model to probe the LLM's knowledge about a specific fact, making them computationally expensive and time-consuming. To bridge this gap, we propose PEEK, or Proxy Embeddings to Estimate Knowledge of LLMs, by leveraging the pre-trained embedding models that effectively encode factual knowledge as text or graphs as…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques