Self-rationalization improves LLM as a fine-grained judge

Prapti Trivedi; Aditya Gulati; Oliver Molenschot; Meghana Arakkal Rajeev; Rajkumar Ramamurthy; Keith Stevens; Tanveesh Singh Chaudhery; Jahnavi Jambholkar; James Zou; Nazneen Rajani

arXiv:2410.05495 · cs.CL · October 10, 2024

TL;DR

This paper introduces Self-Rationalization, an iterative method where judge models improve their scoring and rationales by learning from their own judgments, leading to enhanced evaluation accuracy and rationale quality.

Contribution

The paper proposes Self-Rationalization, a novel iterative fine-tuning approach that enhances LLM judge models' calibration and scoring accuracy through self-generated rationales.

Findings

1. Judge models produce higher-quality rationales after two iterations.
2. Self-rationalized judge models outperform larger SFT-trained models on evaluation benchmarks.
3. Self-Rationalization improves scoring accuracy by 3-9% over existing methods.

Abstract

LLM-as-a-judge models have been used to evaluate both human- and AI-generated content, specifically by providing scores and rationales. Rationales, in addition to increasing transparency, help models learn to calibrate their judgments. Enhancing a model's rationales can therefore improve its calibration and, ultimately, its ability to score content. We introduce Self-Rationalization, an iterative process for improving the rationales of judge models, which in turn improves scoring on fine-grained, customizable criteria (i.e., Likert-scale scoring with arbitrary evaluation criteria). Self-Rationalization works by having the model generate multiple judgments with rationales for the same input, curating a preference-pair dataset from its own judgments, and iteratively fine-tuning the judge via DPO. Intuitively, this approach allows the judge model to self-improve…
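
The loop described in the abstract (sample several judgments per input, turn them into preference pairs, fine-tune with DPO, repeat) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the pair-selection rule (majority score as a pseudo-label, most-divergent score as the rejected judgment) is a hypothetical stand-in, since the paper's actual heuristic is not described here, and the names `Judgment`, `curate_pair`, `build_preference_dataset`, and `dpo_loss` are illustrative. The loss is the standard DPO objective computed on summed sequence log-probabilities, with `beta` set to an assumed 0.1.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Judgment:
    """One sampled judgment: a Likert-scale score plus its rationale."""
    score: int
    rationale: str


def curate_pair(judgments: list[Judgment]) -> tuple[Judgment, Judgment] | None:
    """Pick a (chosen, rejected) pair from judgments sampled for one input.

    HYPOTHETICAL heuristic (not the paper's): the most frequent score acts as a
    pseudo-label; the chosen judgment agrees with it and the rejected judgment
    is the one whose score lies farthest from it.
    """
    majority, _ = Counter(j.score for j in judgments).most_common(1)[0]
    agreeing = [j for j in judgments if j.score == majority]
    disagreeing = [j for j in judgments if j.score != majority]
    if not disagreeing:
        return None  # all samples agree, so there is no contrast to learn from
    rejected = max(disagreeing, key=lambda j: abs(j.score - majority))
    return agreeing[0], rejected


def build_preference_dataset(
    samples: dict[str, list[Judgment]],
) -> list[tuple[str, Judgment, Judgment]]:
    """Turn the judge's own sampled judgments into (prompt, chosen, rejected) triples."""
    dataset = []
    for prompt, judgments in samples.items():
        pair = curate_pair(judgments)
        if pair is not None:
            dataset.append((prompt, *pair))
    return dataset


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen: float,
    policy_rejected: float,
    ref_chosen: float,
    ref_rejected: float,
    beta: float = 0.1,  # assumed value; the paper's setting is not given here
) -> float:
    """Standard DPO loss for one pair, from summed sequence log-probabilities.

    loss = -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r)))
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    return softplus(-margin)  # -log(sigmoid(m)) == softplus(-m)


if __name__ == "__main__":
    samples = {
        "Rate response A for helpfulness (1-5)": [
            Judgment(4, "Accurate and mostly complete."),
            Judgment(4, "Correct answer with minor omissions."),
            Judgment(1, "Unhelpful."),
        ],
        "Rate response B for harmlessness (1-5)": [
            Judgment(5, "No harmful content."),
            Judgment(5, "Safe and appropriate."),
        ],
    }
    pairs = build_preference_dataset(samples)
    print(len(pairs))  # 1 (the second input has no disagreement)
    print(pairs[0][1].score, pairs[0][2].score)  # 4 1
    print(round(dpo_loss(-10.0, -12.0, -11.0, -11.0), 4))
```

In a full Self-Rationalization round, the log-probabilities would presumably come from the current judge (the policy) and a frozen reference copy, and the DPO-tuned judge would then resample judgments for the next iteration.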

Peer Reviews

Decision · Submitted to ICLR 2025

Reviewer 01 · Rating 5 · Confidence 4

Strengths

- Self-refinement of LLMs through RLAIF is a significant research topic.
- Generating rationales for LLM judges enhances both performance and interpretability.
- The proposed approach refines and improves baseline LLMs.

Weaknesses

**Limited technical novelty.** The concept of generating rationales for LLM judges (or reward models) has been explored in recent works [1,2]. Both papers were uploaded to arXiv in August 2024, making them concurrent studies, so it is acceptable not to include experimental comparisons with them. However, given these existing works, it is essential to identify the unique contributions of this paper. A discussion of the pros and cons of the different approaches would be valuable. [1] Ankn

Reviewer 02 · Rating 5 · Confidence 4

Strengths

- This paper introduces a practical framework for enhancing the judgment capabilities of large language models (LLMs) through Self-Rationalization, an iterative self-training process.
- The approach is both practical and resource-efficient, requiring no additional human-labeled data, making it broadly applicable to LLM-based evaluation tasks.

Weaknesses

- The baseline comparisons are insufficient: the authors should evaluate the proposed method against more iterative self-improvement approaches (e.g., Self-Taught Evaluators) to provide a broader perspective on its effectiveness.
- The paper does not convincingly demonstrate the impact of rationales on enhancing judgment capabilities. As shown in Table 3, the performance gain from including rationales is modest (an increase from 72.0 to 74.0 on RewardBench). Additionally, when compared to the

Reviewer 03 · Rating 6 · Confidence 4

Strengths

Originality: The paper introduces a new pipeline consisting of data curation and alignment training to enhance an LLM's ability as a judge. It builds on existing judge models like Prometheus and goes further by implementing RLAIF fine-tuning.
Quality: The paper considers multiple benchmarks, including RewardBench, BiGGen Bench, and Feedback Bench, and shows good performance against both comparable and larger models.
Significance: Overall, the paper has a potential impact for practitioners on the

Weaknesses

Clarity: The paper's presentation could be improved by including more details. Please see the questions below.
Originality: The training dataset and the SFT training method were proposed in the Prometheus paper, and the pairwise/pointwise scoring functions and fine-grained criteria are also considered there. This paper's methodological novelty lies in the heuristic used for preference selection.

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Direct Preference Optimization · Sparse Evolutionary Training · Shrink and Fine-Tune