Themis: A Reference-free NLG Evaluation Language Model with Flexibility and Interpretability

Xinyu Hu; Li Lin; Mingqi Gao; Xunjian Yin; Xiaojun Wan

arXiv:2406.18365·cs.CL·October 10, 2024·1 cites

TL;DR

Themis is a flexible, reference-free language model for NLG evaluation. Trained on a large-scale corpus annotated by both humans and GPT-4, it gives interpretable assessments and is reported to outperform existing evaluators, including GPT-4, across multiple NLG tasks.

Contribution

The paper introduces Themis, an LLM dedicated to NLG evaluation that is reference-free, flexible, and interpretable. It is trained on NLG-Eval, a large-scale evaluation corpus the authors construct with human and GPT-4 annotations.

Findings

01 Themis outperforms GPT-4 and other models on various NLG tasks.

02 It generalizes well to unseen tasks.

03 It provides interpretable evaluation results.

Abstract

The evaluation of natural language generation (NLG) tasks is a significant and longstanding research area. With the recent emergence of powerful large language models (LLMs), some studies have turned to LLM-based automatic evaluation methods, which demonstrate great potential to become a new evaluation paradigm following traditional string-based and model-based metrics. However, despite the improved performance of existing methods, they still possess some deficiencies, such as dependency on references and limited evaluation flexibility. Therefore, in this paper, we meticulously construct a large-scale NLG evaluation corpus NLG-Eval with annotations from both human and GPT-4 to alleviate the lack of relevant data in this field. Furthermore, we propose Themis, an LLM dedicated to NLG evaluation, which has been trained with our designed multi-perspective consistency verification and…

Code & Models

Repositories

PKU-ONELab/Themis (official)

Models

🤗 RichardErkhov/PKU-ONELab_-_Themis-gguf
model · 59 downloads

Videos

Themis: A Reference-free NLG Evaluation Language Model with Flexibility and Interpretability · Underline

Taxonomy

Topics: Speech and dialogue systems · Fuzzy Logic and Control Systems · AI-based Problem Solving and Planning

Methods: Attention Is All You Need · Softmax · Layer Normalization · Absolute Position Encodings · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Dropout · Adam · Linear Layer