Explain-Query-Test: Self-Evaluating LLMs Via Explanation and Comprehension Discrepancy
Saeid Asgari Taghanaki, Joao Monteiro

TL;DR
This paper introduces Explain-Query-Test (EQT), a self-evaluation pipeline in which an LLM explains a topic, generates question-answer pairs from its own explanation, and then answers those questions, revealing gaps between explanation quality and reasoning ability.
Contribution
EQT provides a novel self-evaluation method that predicts model performance and uncovers comprehension limitations without external data.
Findings
EQT accuracy correlates strongly with performance on standard benchmarks such as MMLU-Pro.
Models can produce detailed explanations yet still miss questions derived from them, showing a gap between explanation quality and reasoning ability.
EQT can rank models based on their comprehension without external datasets.
Abstract
Large language models (LLMs) have demonstrated remarkable proficiency in generating detailed and coherent explanations of complex concepts. However, the extent to which these models truly comprehend the concepts they articulate remains unclear. To assess the level of comprehension of a model relative to the content it generates, we implemented a self-evaluation pipeline where models: (i) given a topic generate an excerpt with information about the topic, (ii) given an excerpt generate question-answer pairs, and finally (iii) given a question generate an answer. We refer to this self-evaluation approach as Explain-Query-Test (EQT). Interestingly, the accuracy on generated questions resulting from running the EQT pipeline correlates strongly with the model performance as verified by typical benchmarks such as MMLU-Pro. In other words, EQT's performance is predictive of MMLU-Pro's, and EQT…
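The three steps in the abstract can be sketched as a small Python loop. This is only an illustration under stated assumptions, not the authors' implementation: the `Model` callable, the prompt wording, the `question || answer` line format, and exact-match scoring are all hypothetical choices made so the example runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumption: a "model" is any function that maps a prompt string to a reply string.
Model = Callable[[str], str]


@dataclass
class QAPair:
    question: str
    answer: str


def explain(model: Model, topic: str) -> str:
    """Step (i): given a topic, generate an excerpt about it."""
    return model(f"Write a detailed explanation of the topic: {topic}")


def query(model: Model, excerpt: str, n: int) -> List[QAPair]:
    """Step (ii): given an excerpt, generate question-answer pairs."""
    reply = model(
        f"Based on the excerpt below, write {n} question-answer pairs, "
        f"one per line, formatted as 'question || answer'.\n\n{excerpt}"
    )
    pairs = []
    for line in reply.splitlines():
        if "||" in line:  # skip lines that do not follow the assumed format
            question, answer = line.split("||", 1)
            pairs.append(QAPair(question.strip(), answer.strip()))
    return pairs


def answer_accuracy(model: Model, pairs: List[QAPair]) -> float:
    """Step (iii): given each question alone, generate an answer and score it."""
    if not pairs:
        return 0.0
    correct = 0
    for pair in pairs:
        prediction = model(f"Answer concisely: {pair.question}")
        # Assumption: case-insensitive exact match stands in for real grading.
        if prediction.strip().lower() == pair.answer.lower():
            correct += 1
    return correct / len(pairs)


def eqt_accuracy(model: Model, topic: str, n: int = 3) -> float:
    """Run Explain -> Query -> Test with the same model at every step."""
    excerpt = explain(model, topic)
    pairs = query(model, excerpt, n)
    return answer_accuracy(model, pairs)


def toy_model(prompt: str) -> str:
    """Canned stand-in for an LLM so the sketch runs offline."""
    if prompt.startswith("Write a detailed explanation"):
        return "Water boils at 100 C at sea level. Ice melts at 0 C."
    if prompt.startswith("Based on the excerpt"):
        return (
            "At what temperature does water boil at sea level? || 100 C\n"
            "At what temperature does ice melt? || 0 C"
        )
    if "boil" in prompt:
        return "100 C"
    return "unknown"
```

With the canned `toy_model`, `eqt_accuracy(toy_model, "phase changes")` answers one of its two self-generated questions correctly, which mirrors in miniature the discrepancy the paper studies: the same model writes the explanation and the questions, then fails part of its own test.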
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)
