Explain-Query-Test: Self-Evaluating LLMs Via Explanation and Comprehension Discrepancy

Saeid Asgari Taghanaki; Joao Monteiro

arXiv:2501.11721·cs.CL·March 11, 2025

TL;DR

This paper introduces Explain-Query-Test (EQT), a self-evaluation pipeline in which an LLM generates an explanation of a topic, derives question-answer pairs from that explanation, and then answers the questions. The resulting accuracy reveals gaps between explanation quality and reasoning ability.

Contribution

EQT provides a novel self-evaluation method whose accuracy is predictive of benchmark performance (such as MMLU-Pro) and that uncovers comprehension limitations without requiring external data.

Findings

01

EQT accuracy correlates with performance on established benchmarks such as MMLU-Pro.

02

Models show a gap between explanation detail and reasoning ability.

03

EQT can rank models based on their comprehension without external datasets.
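One way to check whether two sets of scores rank models consistently is a rank correlation. The sketch below computes Spearman's coefficient in plain Python. It is an illustration only: the source does not say which statistic the authors use, the function names are hypothetical, and the scores are made-up placeholders, not results from the paper.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Placeholder scores for four hypothetical models (NOT from the paper).
eqt_accuracy_scores = [0.62, 0.71, 0.55, 0.80]
benchmark_scores = [0.48, 0.60, 0.41, 0.66]
rho = spearman(eqt_accuracy_scores, benchmark_scores)
```

A coefficient near 1 would mean the two scores order the models almost identically, which is the sense in which a self-generated test could stand in for an external dataset when ranking models.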

Abstract

Large language models (LLMs) have demonstrated remarkable proficiency in generating detailed and coherent explanations of complex concepts. However, the extent to which these models truly comprehend the concepts they articulate remains unclear. To assess the level of comprehension of a model relative to the content it generates, we implemented a self-evaluation pipeline where models: (i) given a topic, generate an excerpt with information about the topic; (ii) given an excerpt, generate question-answer pairs; and finally (iii) given a question, generate an answer. We refer to this self-evaluation approach as Explain-Query-Test (EQT). Interestingly, the accuracy on generated questions resulting from running the EQT pipeline correlates strongly with the model performance as verified by typical benchmarks such as MMLU-Pro. In other words, EQT's performance is predictive of MMLU-Pro's, and EQT…
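The three steps in the abstract can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Model` callable, the prompt prefixes, the `question || answer` line format, and exact-match scoring are all hypothetical choices made for this sketch, and `toy_model` is a hard-coded stand-in for a real LLM.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumption: a model is any function from a prompt string to a completion.
Model = Callable[[str], str]


@dataclass
class QAPair:
    question: str
    answer: str


def explain(model: Model, topic: str) -> str:
    """Step (i): given a topic, generate an excerpt about it."""
    return model(f"EXPLAIN: {topic}")


def query(model: Model, excerpt: str) -> List[QAPair]:
    """Step (ii): given an excerpt, generate question-answer pairs.

    Assumes one 'question || answer' pair per output line (sketch-only format).
    """
    pairs = []
    for line in model(f"QUERY: {excerpt}").splitlines():
        if "||" in line:
            question, answer = line.split("||", 1)
            pairs.append(QAPair(question.strip(), answer.strip()))
    return pairs


def answer_and_score(model: Model, pairs: List[QAPair]) -> float:
    """Step (iii): answer each question without the excerpt; return accuracy.

    Exact match after lowercasing is an assumed scoring rule.
    """
    if not pairs:
        return 0.0
    correct = sum(
        model(f"TEST: {p.question}").strip().lower() == p.answer.lower()
        for p in pairs
    )
    return correct / len(pairs)


def eqt_accuracy(model: Model, topic: str) -> float:
    """Run explain -> query -> test for one topic."""
    return answer_and_score(model, query(model, explain(model, topic)))


def toy_model(prompt: str) -> str:
    """Hard-coded stand-in for an LLM, used only to exercise the pipeline."""
    if prompt.startswith("EXPLAIN:"):
        return "Water boils at 100 C at sea level. Ice melts at 0 C."
    if prompt.startswith("QUERY:"):
        return ("At what temperature does water boil? || 100 C\n"
                "At what temperature does ice melt? || 0 C")
    return "100 C"  # answers every test question the same way


accuracy = eqt_accuracy(toy_model, "phase changes of water")
```

The toy model writes a correct excerpt and correct question-answer pairs but answers only one of its own two questions correctly, so `accuracy` is 0.5. That mismatch between what a model can explain and what it can answer is the kind of discrepancy the pipeline is meant to surface.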

Code & Models

Repositories

asgsaeid/eqt (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)