DepthCharge: A Domain-Agnostic Framework for Measuring Depth-Dependent Knowledge in Large Language Models
Alexander Sheppert

TL;DR
DepthCharge is a versatile framework that measures how deeply large language models can sustain accurate, domain-specific knowledge through adaptive questioning and verification, revealing performance variations hidden by standard benchmarks.
Contribution
It introduces a domain-agnostic, adaptive probing framework for measuring depth of knowledge in LLMs without domain-specific data or expertise, enabling comparative evaluation across diverse fields.
Findings
DepthCharge reveals significant variation in knowledge depth across models and domains.
Expected Valid Depth varies from 3.45 to 7.55 across model-domain pairs.
Cost-performance analysis shows expensive models do not always have deeper knowledge.
Abstract
Large Language Models appear competent when answering general questions but often fail when pushed into domain-specific details. No existing methodology provides an out-of-the-box solution for measuring how deeply LLMs can sustain accurate responses under adaptive follow-up questioning across arbitrary domains. We present DepthCharge, a domain-agnostic framework that measures knowledge depth through three innovations: adaptive probing that generates follow-up questions based on concepts the model actually mentions, on-demand fact verification from authoritative sources, and survival statistics with constant sample sizes at every depth level. The framework can be deployed on any knowledge domain with publicly verifiable facts, without requiring pre-constructed test sets or domain-specific expertise. DepthCharge results are relative to the evaluator model used for answer checking,…
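The survival statistics mentioned in the abstract can be illustrated with a minimal sketch. The truncated abstract does not give the paper's exact definitions, so the following are assumptions: a probe chain "survives" depth d only if every answer up to and including d was verified correct; every chain is counted in the denominator at every depth (one reading of "constant sample sizes"); and Expected Valid Depth is taken as the sum of the per-depth survival fractions. The function names `survival_curve` and `expected_valid_depth` are hypothetical and are not taken from the DepthCharge code.

```python
from typing import List


def survival_curve(outcomes: List[List[bool]]) -> List[float]:
    """Fraction of probe chains still fully correct at each depth.

    outcomes[i][d] is True when chain i gave a verified-correct answer
    at depth d + 1. All chains have the same length, so the denominator
    is the same at every depth (assumed meaning of "constant sample size").
    """
    n_chains = len(outcomes)
    max_depth = len(outcomes[0])
    curve = []
    for d in range(max_depth):
        # A chain survives depth d only if all answers so far were correct.
        alive = sum(all(chain[: d + 1]) for chain in outcomes)
        curve.append(alive / n_chains)
    return curve


def expected_valid_depth(curve: List[float]) -> float:
    """Sum of survival fractions: the mean number of consecutive
    correct depth levels per chain (an assumed definition)."""
    return sum(curve)


# Illustrative data only: 4 chains probed to depth 3.
example = [
    [True, True, False],
    [True, False, False],
    [True, True, True],
    [False, False, False],
]
curve = survival_curve(example)
evd = expected_valid_depth(curve)
```

Under these assumptions, the four example chains stay valid for 2, 1, 3, and 0 levels, so the summed curve equals their mean depth.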
Taxonomy
Topics: Topic Modeling · Text Readability and Simplification · Artificial Intelligence in Healthcare and Education
