Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Seungone Kim, Juyoung Suk, Shayne Longpre, Bill Yuchen Lin, Jamin Shin, Sean Welleck, Graham Neubig, Moontae Lee, Kyungjae Lee, Minjoon Seo

TL;DR
Prometheus 2 is an open-source language model designed for evaluating other language models, achieving high correlation with human and GPT-4 judgments across various assessment formats and benchmarks.
Contribution
It introduces a more powerful open evaluator LM capable of both direct and pairwise assessments with customizable criteria, outperforming existing open evaluators.
Findings
Highest correlation with human judgments among open evaluator LMs on the tested benchmarks
Outperforms previous open evaluators in assessment accuracy
Supports flexible evaluation formats and criteria
Abstract
Proprietary LMs such as GPT-4 are often employed to assess the quality of responses from various LMs. However, concerns including transparency, controllability, and affordability strongly motivate the development of open-source LMs specialized in evaluations. On the other hand, existing open evaluator LMs exhibit critical shortcomings: 1) they issue scores that significantly diverge from those assigned by humans, and 2) they lack the flexibility to perform both direct assessment and pairwise ranking, the two most prevalent forms of assessment. Additionally, they do not possess the ability to evaluate based on custom evaluation criteria, focusing instead on general attributes like helpfulness and harmlessness. To address these issues, we introduce Prometheus 2, a more powerful evaluator LM than its predecessor that closely mirrors human and GPT-4 judgements. Moreover, it is capable of…
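To make the two assessment formats concrete, here is a minimal sketch of how agreement between an evaluator LM and human judges might be measured: Pearson correlation for direct assessment (numeric scores) and simple agreement rate for pairwise ranking (choosing between two responses). The function names and the sample data are hypothetical and purely illustrative; they are not taken from the paper, and the paper's exact evaluation protocol may differ.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def pairwise_agreement(evaluator_choices, human_choices):
    """Fraction of pairs where the evaluator prefers the same response as the human."""
    if len(evaluator_choices) != len(human_choices) or not human_choices:
        raise ValueError("need two equal-length, non-empty lists")
    matches = sum(e == h for e, h in zip(evaluator_choices, human_choices))
    return matches / len(human_choices)


# Made-up data for illustration only (not results from the paper).
# Direct assessment: each response gets a 1-5 score.
human_scores = [1, 2, 3, 4, 5]
evaluator_scores = [2, 2, 3, 5, 5]

# Pairwise ranking: each item records which of two responses was preferred.
human_prefs = ["A", "B", "A", "B"]
evaluator_prefs = ["A", "B", "B", "B"]

print("direct assessment correlation:", pearson(evaluator_scores, human_scores))
print("pairwise agreement:", pairwise_agreement(evaluator_prefs, human_prefs))
```

A higher correlation or agreement rate means the evaluator's judgments track the human ones more closely, which is the sense in which an evaluator LM is said to mirror human judgment.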
Code & Models
- 🤗 prometheus-eval/prometheus-7b-v2.0 (model) · 55k dl · ♡ 103
- 🤗 prometheus-eval/prometheus-8x7b-v2.0 (model) · 1.1k dl · ♡ 49
- 🤗 AlekseiPravdin/prometheus-7b-v2_0-gguf (model) · 181 dl · ♡ 1
- 🤗 vsevolodl/prometheus-8x7b-v2.0-GGUF (model) · 77 dl · ♡ 3
- 🤗 vsevolodl/prometheus-7b-v2.0-GGUF (model) · 85 dl · ♡ 4
- 🤗 RichardErkhov/prometheus-eval_-_prometheus-7b-v2.0-4bits (model) · 2 dl
- 🤗 RichardErkhov/prometheus-eval_-_prometheus-7b-v2.0-gguf (model) · 61 dl
- 🤗 chargoddard/prometheus-2-llama-3-8b (model) · 22 dl · ♡ 2
- 🤗 thesven/prometheus-7b-v2.0-GPTQ (model) · 19 dl
- 🤗 prometheus-eval/prometheus-7b-v2.0-GGUF (model) · 26 dl · ♡ 3
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Attention Is All You Need · Dense Connections · Dropout · Label Smoothing · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Linear Layer
