Efficient Knowledge Probing of Large Language Models by Adapting Pre-trained Embeddings
Kartik Sharma, Yiqiao Jin, Rakshit Trivedi, Srijan Kumar

TL;DR
This paper introduces PEEK, a method that uses adapted pre-trained embeddings to efficiently probe large language models' factual knowledge, reducing computational costs compared to traditional methods.
Contribution
PEEK leverages pre-trained embedding models to estimate LLM knowledge, enabling scalable and efficient probing without extensive model forward passes.
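The idea of using embeddings as a proxy can be illustrated with a minimal sketch. It is not the authors' implementation: the embeddings here are synthetic random vectors standing in for the output of a pre-trained embedding model, the "knows / does not know" labels are generated artificially rather than obtained by probing an LLM, and the helper names `train_linear_probe` and `predict_known` are hypothetical. The sketch fits a plain logistic-regression probe on labelled facts and then predicts labels for unseen facts without any further LLM forward passes.

```python
import numpy as np


def train_linear_probe(X, y, lr=0.5, epochs=500):
    """Fit logistic-regression weights mapping fact embeddings to P(LLM knows the fact)."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid of the linear score
        w -= lr * (X.T @ (p - y) / n)           # gradient of mean log-loss w.r.t. w
        b -= lr * np.mean(p - y)                # gradient of mean log-loss w.r.t. b
    return w, b


def predict_known(X, w, b):
    """Return 1 where the probe predicts the LLM knows the fact, else 0."""
    return (X @ w + b > 0).astype(int)


# Assumption: synthetic data. In practice X would come from a pre-trained
# embedding model and y from a one-time probe of the LLM on a labelled subset.
rng = np.random.default_rng(0)
dim = 16
true_dir = rng.normal(size=dim)
X = rng.normal(size=(400, dim))
y = (X @ true_dir > 0).astype(float)

X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

w, b = train_linear_probe(X_train, y_train)
accuracy = float(np.mean(predict_known(X_test, w, b) == y_test))
print(f"held-out accuracy: {accuracy:.2f}")
```

Once the probe is trained, scoring a new fact costs one embedding lookup and one dot product, which is where the efficiency gain over per-fact LLM forward passes would come from.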
Findings
Embeddings can predict LLM knowledge with up to 90% accuracy.
Sentence embeddings outperform graph embeddings in predicting LLM knowledge.
Knowledge-adapted embeddings reveal insights into LLMs' internal representations.
Abstract
Large language models (LLMs) acquire knowledge across diverse domains such as science, history, and geography encountered during generative pre-training. However, due to their stochasticity, it is difficult to predict what LLMs have acquired. Prior work has developed different ways to probe this knowledge by investigating the hidden representations, crafting specific task prompts, curating representative samples, and estimating their uncertainty. However, these methods require making forward passes through the underlying model to probe the LLM's knowledge about a specific fact, making them computationally expensive and time-consuming. To bridge this gap, we propose PEEK, or Proxy Embeddings to Estimate Knowledge of LLMs, by leveraging the pre-trained embedding models that effectively encode factual knowledge as text or graphs as…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
