TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks
Dongfu Jiang, Yishan Li, Ge Zhang, Wenhao Huang, Bill Yuchen Lin, Wenhu Chen

TL;DR
TIGERScore is an instruction-guided, explainable, reference-free evaluation metric for text generation that correlates strongly with human judgments and provides error analysis across diverse tasks.
Contribution
Introduces TIGERScore, an instruction-guided metric that offers explainable, reference-free evaluation across text generation tasks and is trained on MetricInstruct, a curated instruction-tuning dataset covering 6 tasks and 23 datasets.
Findings
- Achieves state-of-the-art correlation with human ratings
- Provides accurate, human-like error explanations
- Surpasses existing reference-based metrics in correlation
Abstract
We present TIGERScore, a Trained metric that follows Instruction Guidance to perform Explainable and Reference-free evaluation over a wide spectrum of text generation tasks. Unlike other automatic evaluation methods that only provide arcane scores, TIGERScore is guided by natural language instructions to provide an error analysis that pinpoints the mistakes in the generated text. Our metric is based on LLaMA-2 and trained on our meticulously curated instruction-tuning dataset, MetricInstruct, which covers 6 text generation tasks and 23 text generation datasets. The dataset consists of 42K quadruples of the form (instruction, input, system output, error analysis). We collected the "system outputs" from a large variety of models to cover different types of errors. To quantitatively assess our metric, we evaluate its…
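The quadruple format described in the abstract can be pictured as a small record type. The following is a minimal sketch only: the class name, field names, prompt template, and toy example are all hypothetical and are not taken from the released MetricInstruct dataset or the TIGERScore code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricInstructExample:
    """One (instruction, input, system output, error analysis) quadruple.

    Field names are hypothetical; the released dataset may use different keys.
    """
    instruction: str
    input_context: str
    system_output: str
    error_analysis: str


def to_prompt(example: MetricInstructExample) -> str:
    """Join the first three fields into a reference-free evaluation prompt.

    The template is illustrative only, not the one used to train TIGERScore.
    The error analysis is the training target, so it is left out of the prompt.
    """
    return (
        f"Instruction: {example.instruction}\n"
        f"Input: {example.input_context}\n"
        f"System output: {example.system_output}\n"
        "Error analysis:"
    )


# Toy example written for this sketch (not a real dataset entry).
example = MetricInstructExample(
    instruction="Translate the sentence into English.",
    input_context="Le chat dort.",
    system_output="The dog sleeps.",
    error_analysis="'dog' mistranslates 'chat' (cat).",
)
prompt = to_prompt(example)
```

No reference text appears anywhere in the record, which is what makes the setup reference-free: the metric sees only the instruction, the source input, and the system output.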
Code & Models
- 🤗 TIGER-Lab/TIGERScore-7B · model · 32 dl · ♡ 2
- 🤗 TIGER-Lab/TIGERScore-13B · model · 747 dl · ♡ 18
- 🤗 LoneStriker/TIGERScore-7B-3.0bpw-h6-exl2 · model · 3 dl
- 🤗 LoneStriker/TIGERScore-7B-4.0bpw-h6-exl2 · model · 1 dl
- 🤗 LoneStriker/TIGERScore-7B-5.0bpw-h6-exl2 · model · 2 dl
- 🤗 LoneStriker/TIGERScore-7B-6.0bpw-h6-exl2 · model · 1 dl
- 🤗 LoneStriker/TIGERScore-7B-8.0bpw-h8-exl2 · model · 1 dl
- 🤗 LoneStriker/TIGERScore-13B-3.0bpw-h6-exl2 · model · 1 dl
- 🤗 LoneStriker/TIGERScore-13B-4.0bpw-h6-exl2 · model · 1 dl
- 🤗 LoneStriker/TIGERScore-13B-5.0bpw-h6-exl2 · model · 1 dl
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Residual Connection · Dropout · Absolute Position Encodings · Softmax · Layer Normalization · Adam
