A LLM-Based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation

Irune Zubiaga; Aitor Soroa; Rodrigo Agerri

arXiv:2406.15227 · cs.CL · November 5, 2024

TL;DR

This paper introduces an LLM-based evaluation method for Counter-Narrative (CN) generation that correlates well with human judgment, and compares how well different LLMs generate CNs.

Contribution

It presents a novel ranking evaluation pipeline using pairwise comparisons with LLMs and analyzes various LLMs as zero-shot CN generators, highlighting their strengths and limitations.
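The pairwise ranking idea can be illustrated with a minimal sketch. It assumes a round-robin tournament in which every pair of models is compared on every input and models are ranked by total wins; the source only says "tournament-style", so the exact pairing and scoring scheme here is an assumption. The names `rank_models`, `Judge`, and `longer_is_better` are hypothetical, and the toy judge merely stands in for an LLM call.

```python
from itertools import combinations
from typing import Callable, Dict, List

# Hypothetical judge signature (an assumption, not the paper's API):
# given an input text and two candidate counter-narratives, return 0 if the
# first candidate is preferred and 1 if the second is preferred.
Judge = Callable[[str, str, str], int]


def rank_models(inputs: List[str],
                outputs: Dict[str, List[str]],
                judge: Judge) -> List[str]:
    """Rank models by the number of pairwise comparisons they win.

    outputs maps each model name to its generated CNs, aligned with inputs.
    """
    wins = {name: 0 for name in outputs}
    # Round-robin: every pair of models meets once per input.
    for model_a, model_b in combinations(outputs, 2):
        for i, text in enumerate(inputs):
            preferred = judge(text, outputs[model_a][i], outputs[model_b][i])
            winner = model_a if preferred == 0 else model_b
            wins[winner] += 1
    # Most wins first; ties keep the original model order (sorted is stable).
    return sorted(wins, key=lambda name: wins[name], reverse=True)


def longer_is_better(text: str, cn_a: str, cn_b: str) -> int:
    """Toy stand-in for an LLM judge: prefers the longer candidate."""
    return 0 if len(cn_a) >= len(cn_b) else 1


if __name__ == "__main__":
    toy_inputs = ["input 1", "input 2"]
    toy_outputs = {
        "model_x": ["aa", "aa"],
        "model_y": ["a", "a"],
        "model_z": ["aaa", "aaa"],
    }
    print(rank_models(toy_inputs, toy_outputs, longer_is_better))
```

In practice the judge would wrap a prompt to an evaluator LLM, and it is common to query both candidate orders to reduce position bias; whether the paper does so is not stated in this summary.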

Findings

01. High correlation (ρ = 0.88) with human preferences.

02. Chat-aligned models excel in zero-shot CN generation.

03. Fine-tuning impacts model performance and responsiveness.

Abstract

This paper proposes a novel approach to evaluate Counter Narrative (CN) generation using a Large Language Model (LLM) as an evaluator. We show that traditional automatic metrics correlate poorly with human judgements and fail to capture the nuanced relationship between generated CNs and human perception. To alleviate this, we introduce a model ranking pipeline based on pairwise comparisons of generated CNs from different models, organized in a tournament-style format. The proposed evaluation method achieves a high correlation with human preference, with a $ρ$ score of 0.88. As an additional contribution, we leverage LLMs as zero-shot CN generators and provide a comparative analysis of chat, instruct, and base models, exploring their respective strengths and limitations. Through meticulous evaluation, including fine-tuning experiments, we elucidate the differences in performance and…
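To make the reported agreement score concrete, the sketch below computes a rank correlation between an automatic model ranking and a human one. It assumes that ρ denotes Spearman's rank correlation and that neither ranking contains ties; the abstract does not state either point. The function name `spearman_rho` and the example rankings are hypothetical and are not data from the paper.

```python
from typing import List


def spearman_rho(ranking_a: List[str], ranking_b: List[str]) -> float:
    """Spearman's rho between two best-to-worst orderings of the same items.

    Uses the no-ties formula: rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)),
    where d is the difference between an item's positions in the two rankings.
    """
    if set(ranking_a) != set(ranking_b) or len(ranking_a) != len(set(ranking_a)):
        raise ValueError("rankings must contain the same unique items")
    n = len(ranking_a)
    if n < 2:
        raise ValueError("need at least two items")
    position_b = {name: i for i, name in enumerate(ranking_b)}
    sum_d2 = sum((i - position_b[name]) ** 2 for i, name in enumerate(ranking_a))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


if __name__ == "__main__":
    # Illustrative rankings only.
    llm_ranking = ["model_a", "model_b", "model_c", "model_d"]
    human_ranking = ["model_a", "model_c", "model_b", "model_d"]
    print(spearman_rho(llm_ranking, human_ranking))
```

A value near 1 means the two rankings order the models almost identically, and a value near -1 means they are close to reversed.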

Taxonomy

Topics: Artificial Intelligence in Games · Digital Games and Media · Digital Storytelling and Education

Methods: Balanced Selection