SocREval: Large Language Models with the Socratic Method for Reference-Free Reasoning Evaluation
Hangfeng He, Hongming Zhang, Dan Roth

TL;DR
SocREval is a GPT-4-based, reference-free method for evaluating reasoning chains, with prompts designed around the Socratic method. It assesses complex reasoning without human-annotated reference chains and outperforms existing metrics.
Contribution
The paper presents SocREval, a prompt design that applies the Socratic method so that GPT-4 can evaluate reasoning chains without human-written references, with the aim of improving evaluation accuracy and robustness.
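As a rough illustration of what a reference-free evaluation prompt might look like, the sketch below assembles a prompt from a question and a model-generated reasoning chain, then parses a score out of the evaluator's reply. The staged wording, the 1 to 5 scale, and the names `build_socratic_prompt` and `parse_score` are assumptions made for this example; they are not the paper's actual prompt. No model API is called here.

```python
import re
from typing import Optional


def build_socratic_prompt(question: str, reasoning_chain: str) -> str:
    """Assemble an evaluation prompt that needs no reference chain.

    The three-stage wording below is hypothetical, not the paper's prompt.
    """
    return (
        "You are evaluating a step-by-step reasoning chain.\n\n"
        f"Question:\n{question}\n\n"
        f"Reasoning chain to evaluate:\n{reasoning_chain}\n\n"
        "Proceed in three stages:\n"
        "1. Write your own answer to the question before judging.\n"
        "2. Question each step of the chain: is it supported, and does it follow?\n"
        "3. Give an overall quality score from 1 (worst) to 5 (best).\n\n"
        "End with a line of the form 'Overall quality: <score>'."
    )


def parse_score(response: str) -> Optional[int]:
    """Extract the 1-5 score from the evaluator's reply, or None if absent."""
    match = re.search(r"Overall quality:\s*([1-5])\b", response)
    return int(match.group(1)) if match else None


prompt = build_socratic_prompt(
    "If a train travels 60 km in 1.5 hours, what is its average speed?",
    "Speed is distance divided by time. 60 / 1.5 = 40. The answer is 40 km/h.",
)
# A hand-written stand-in for an evaluator reply, used only to show parsing.
example_reply = "My own answer is 40 km/h. Each step follows.\nOverall quality: 5"
score = parse_score(example_reply)
```

Only the question and the chain under evaluation go into the prompt, so no human-written reference chain is required at any point.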
Findings
Outperforms existing reference-free and reference-based metrics.
Cost-efficient and robust to prompt variations.
Effective across multiple human-annotated datasets.
Abstract
To comprehensively gauge the capacity of current models for complex reasoning, it is crucial to assess their step-by-step reasoning in a scalable manner. Established reference-based evaluation metrics rely on human-annotated reasoning chains as references to assess the model-derived chains. However, such "gold-standard" human-written reasoning chains may not be unique and their acquisition is often labor-intensive. Existing reference-free reasoning evaluation metrics, while eliminating the need for human-crafted reasoning chains as references, often require fine-tuning with human-derived chains before evaluation, complicating the process and questioning their adaptability to other datasets. To address these challenges, we harness GPT-4 to automatically evaluate reasoning chain quality, thereby removing the dependency on human-written reasoning chains for both model fine-tuning and…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research
Methods: Multi-Head Attention · Attention Is All You Need · Dropout · Dense Connections · Linear Layer · Label Smoothing · Adam · Absolute Position Encodings · Residual Connection · Layer Normalization
