Digital Socrates: Evaluating LLMs through Explanation Critiques
Yuling Gu, Oyvind Tafjord, Peter Clark

TL;DR
This paper introduces Digital Socrates, an automatic critique tool for evaluating the quality of explanations generated by large language models, enabling nuanced analysis without human annotation.
Contribution
It defines the new task of explanation critiquing, creates a dataset, and trains an open-source model for automatic, detailed explanation evaluation.
Findings
Digital Socrates reveals insights into model reasoning chains.
It provides high-quality automatic evaluation of explanations.
The tool helps identify and address explanation flaws.
Abstract
While LLMs can provide reasoned explanations along with their answers, the nature and quality of those explanations are still poorly understood. In response, our goal is to define a detailed way of characterizing the explanation capabilities of modern models and to create a nuanced, interpretable explanation evaluation tool that can generate such characterizations automatically, without relying on expensive API calls or human annotations. Our approach is to (a) define the new task of explanation critiquing - identifying and categorizing any main flaw in an explanation and providing suggestions to address the flaw, (b) create a sizeable, human-verified dataset for this task, and (c) train an open-source, automatic critique model (called Digital Socrates) using this data. Through quantitative and qualitative analysis, we demonstrate how Digital Socrates is useful for revealing insights…
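The abstract describes a critique as three parts: the main flaw in an explanation (if any), a category for that flaw, and suggestions to address it. A minimal sketch of how such a record might be represented is shown below. The class name, field names, and the example category label are hypothetical choices made for illustration; they are not the schema or taxonomy used by the authors.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Critique:
    """One critique of a model-generated explanation (hypothetical schema)."""

    # Description of the main flaw; None when no flaw was identified.
    main_flaw: Optional[str]
    # Label for the kind of flaw; None when no flaw was identified.
    flaw_category: Optional[str]
    # Suggestions for addressing the flaw (may be empty).
    suggestions: List[str] = field(default_factory=list)

    def has_flaw(self) -> bool:
        """Return True if a main flaw was recorded."""
        return self.main_flaw is not None


def format_critique(c: Critique) -> str:
    """Render a critique as short, human-readable text."""
    if not c.has_flaw():
        return "No main flaw identified."
    lines = [f"Main flaw: {c.main_flaw}", f"Category: {c.flaw_category}"]
    lines += [f"Suggestion: {s}" for s in c.suggestions]
    return "\n".join(lines)


# Illustrative usage; the category label below is an assumption.
example = Critique(
    main_flaw="States that the Moon emits its own light.",
    flaw_category="incorrect_information",
    suggestions=["Note that the Moon reflects sunlight."],
)
print(format_critique(example))
```

Keeping the flaw, its category, and the suggestions as separate fields makes it straightforward to aggregate critiques by category when comparing models.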
Code & Models
- 🤗 allenai/digital-socrates-7b (model) · 815 dl · ♡ 6
- 🤗 allenai/digital-socrates-13b (model) · 812 dl · ♡ 10
- 🤗 TheBloke/digital-socrates-13B-GGUF (model) · 147 dl · ♡ 3
- 🤗 TheBloke/digital-socrates-13B-AWQ (model) · 3 dl
- 🤗 TheBloke/digital-socrates-13B-GPTQ (model) · 1 dl
- 🤗 TheBloke/digital-socrates-7B-GPTQ (model) · 3 dl
- 🤗 TheBloke/digital-socrates-7B-GGUF (model) · 130 dl · ♡ 3
- 🤗 TheBloke/digital-socrates-7B-AWQ (model) · 2 dl
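The two allenai repositories listed above could plausibly be loaded with the Hugging Face `transformers` library, as sketched below. This assumes the checkpoints are standard causal language models that work with `AutoTokenizer` and `AutoModelForCausalLM`; the helper names `repo_id_for` and `load_critique_model` are hypothetical, and the prompt format the model expects is not given here, so it is left out.

```python
# Repository IDs taken from the model list; the size keys are an assumption.
MODEL_REPOS = {
    "7b": "allenai/digital-socrates-7b",
    "13b": "allenai/digital-socrates-13b",
}


def repo_id_for(size: str) -> str:
    """Map a size label such as "7b" or "13B" to its repository ID."""
    key = size.strip().lower()
    if key not in MODEL_REPOS:
        raise ValueError(f"Unknown model size: {size!r}")
    return MODEL_REPOS[key]


def load_critique_model(size: str = "7b"):
    """Download and load tokenizer and model (hypothetical helper).

    Assumes the checkpoint loads as a causal LM; requires `transformers`
    and a backend such as PyTorch to be installed.
    """
    # Imported lazily so the lookup helper works without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = repo_id_for(size)
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    return tokenizer, model
```

Calling `load_critique_model("13b")` would download the larger checkpoint; the quantized TheBloke variants (GGUF, AWQ, GPTQ) generally need different loaders and are not covered by this sketch.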
Taxonomy
Topics: Scientific Computing and Data Management · Neural Networks and Reservoir Computing
