GRADE: Probing Knowledge Gaps in LLMs through Gradient Subspace Dynamics
Yujing Wang, Yuanbang Liang, Yukun Lai, Hainan Zhang, Hanqi Yan

TL;DR
GRADE detects knowledge gaps in large language models by comparing, across layers, the rank of the gradient subspace with that of the corresponding hidden-state subspace. The authors report gains in interpretability and in robustness to input perturbations.
Contribution
The paper introduces a gradient-based metric for knowledge gap detection, the cross-layer rank ratio between the gradient and hidden-state subspaces, and demonstrates its effectiveness across multiple benchmarks.
Findings
GRADE outperforms existing methods in detecting knowledge gaps.
The method is robust to input perturbations.
Gradient chains can generate interpretable explanations for knowledge gaps.
Abstract
Detecting whether a model's internal knowledge is sufficient to correctly answer a given question is a fundamental challenge in deploying responsible LLMs. Beyond having the LLM verbalise its confidence through self-report, more recent methods explore the model internals, such as the hidden states of the response tokens, to capture how much knowledge is activated. We argue that such activated knowledge may not align with what the query requires; for example, it may capture stylistic and length-related features that are uninformative for answering the query. To fill the gap, we propose GRADE (Gradient Dynamics for knowledge gap detection), which quantifies the knowledge gap via the cross-layer ratio of the rank of the gradient to that of the corresponding hidden state subspace. This is motivated by the property of gradients as estimators of the required knowledge updates for a given target. We validate GRADE…
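The rank-ratio idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact definition, so everything here is an assumption made for illustration: the names `effective_rank` and `rank_ratio_per_layer` are hypothetical, rank is estimated as the number of singular values needed to reach a chosen share of spectral energy, and each layer is represented by a plain 2-D array for its gradient and its hidden states. How GRADE actually measures rank or aggregates across layers may differ.

```python
import numpy as np


def effective_rank(matrix, energy=0.99):
    """Number of leading singular values needed to reach `energy` of the
    total squared spectral mass. This is an assumed rank estimate, not
    necessarily the one used in the paper."""
    s = np.linalg.svd(np.asarray(matrix, dtype=float), compute_uv=False)
    total = float(np.sum(s ** 2))
    if total == 0.0:
        return 0  # an all-zero matrix spans no subspace
    cumulative = np.cumsum(s ** 2) / total
    # First index whose cumulative share reaches the threshold, as a count.
    count = int(np.searchsorted(cumulative, energy)) + 1
    return min(count, len(s))


def rank_ratio_per_layer(gradients, hidden_states, energy=0.99):
    """For each layer, divide the gradient's effective rank by the hidden
    state's effective rank. Both inputs are lists of 2-D arrays, one per
    layer (a hypothetical layout chosen for this sketch)."""
    ratios = []
    for grad, hidden in zip(gradients, hidden_states):
        hidden_rank = effective_rank(hidden, energy)
        if hidden_rank == 0:
            ratios.append(float("nan"))  # ratio undefined for a zero subspace
        else:
            ratios.append(effective_rank(grad, energy) / hidden_rank)
    return ratios


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: three layers of random matrices standing in for real tensors.
    grads = [rng.standard_normal((16, 8)) for _ in range(3)]
    hiddens = [rng.standard_normal((16, 8)) for _ in range(3)]
    print(rank_ratio_per_layer(grads, hiddens))
```

In this sketch the per-layer ratios are returned as a list and left unaggregated, because the abstract does not say how the cross-layer values are combined into a single score.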
