Themis: A Reference-free NLG Evaluation Language Model with Flexibility and Interpretability
Xinyu Hu, Li Lin, Mingqi Gao, Xunjian Yin, Xiaojun Wan

TL;DR
Themis is a flexible, reference-free language model designed for NLG evaluation, trained on a large dataset, providing interpretable assessments and outperforming existing models including GPT-4 across multiple tasks.
Contribution
The paper introduces Themis, a novel LLM for NLG evaluation that is reference-free, flexible, interpretable, and trained on a large, annotated corpus to improve evaluation performance.
Findings
Themis outperforms GPT-4 and other models on various NLG tasks.
It generalizes well to unseen tasks.
It provides interpretable evaluation results.
Abstract
The evaluation of natural language generation (NLG) tasks is a significant and longstanding research area. With the recent emergence of powerful large language models (LLMs), some studies have turned to LLM-based automatic evaluation methods, which demonstrate great potential to become a new evaluation paradigm following traditional string-based and model-based metrics. However, despite the improved performance of existing methods, they still possess some deficiencies, such as dependency on references and limited evaluation flexibility. Therefore, in this paper, we meticulously construct a large-scale NLG evaluation corpus NLG-Eval with annotations from both human and GPT-4 to alleviate the lack of relevant data in this field. Furthermore, we propose Themis, an LLM dedicated to NLG evaluation, which has been trained with our designed multi-perspective consistency verification and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
Taxonomy
TopicsSpeech and dialogue systems · Fuzzy Logic and Control Systems · AI-based Problem Solving and Planning
MethodsAttention Is All You Need · Softmax · Layer Normalization · Absolute Position Encodings · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Dropout · Adam · Linear Layer
