A LLM-Based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation
Irune Zubiaga, Aitor Soroa, Rodrigo Agerri

TL;DR
This paper introduces an LLM-based method for evaluating Counter-Narrative (CN) generation that correlates well with human judgment, and compares how well different LLMs generate CNs.
Contribution
It presents a novel ranking evaluation pipeline using pairwise comparisons with LLMs and analyzes various LLMs as zero-shot CN generators, highlighting their strengths and limitations.
Findings
The LLM-based ranking correlates highly with human preferences (ρ=0.88).
Chat-aligned models excel in zero-shot CN generation.
Fine-tuning impacts model performance and responsiveness.
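To make the correlation figure concrete, here is a minimal sketch of how agreement between two model rankings can be measured. It assumes that ρ denotes Spearman's rank correlation and that the rankings contain no ties; the function name `spearman_rho` and the toy ranks are illustrative only and are not taken from the paper.

```python
from typing import Sequence


def spearman_rho(rank_x: Sequence[int], rank_y: Sequence[int]) -> float:
    """Spearman's rank correlation for two tie-free rankings of the same items.

    rank_x[i] and rank_y[i] are the positions (1 = best) that two judges,
    e.g. human annotators and an LLM evaluator, assign to item i.
    """
    if len(rank_x) != len(rank_y):
        raise ValueError("both rankings must cover the same items")
    n = len(rank_x)
    if n < 2:
        raise ValueError("need at least two items to correlate")
    # Sum of squared rank differences.
    d2 = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
    # Closed-form formula, valid only when there are no tied ranks.
    return 1 - 6 * d2 / (n * (n * n - 1))


# Toy example (made-up ranks, not the paper's data): five models,
# where the two judges disagree only on positions 2 and 3.
human_ranks = [1, 2, 3, 4, 5]
llm_ranks = [1, 3, 2, 4, 5]
rho = spearman_rho(human_ranks, llm_ranks)
```

A value of 1 means the two rankings are identical and -1 means they are exactly reversed. If ties are possible, a library routine that handles them, such as `scipy.stats.spearmanr`, is the safer choice.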
Abstract
This paper proposes a novel approach to evaluate Counter Narrative (CN) generation using a Large Language Model (LLM) as an evaluator. We show that traditional automatic metrics correlate poorly with human judgements and fail to capture the nuanced relationship between generated CNs and human perception. To alleviate this, we introduce a model ranking pipeline based on pairwise comparisons of generated CNs from different models, organized in a tournament-style format. The proposed evaluation method achieves a high correlation with human preference, with a score of 0.88. As an additional contribution, we leverage LLMs as zero-shot CN generators and provide a comparative analysis of chat, instruct, and base models, exploring their respective strengths and limitations. Through meticulous evaluation, including fine-tuning experiments, we elucidate the differences in performance and…
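The tournament-style ranking described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes a round-robin format in which every pair of models is compared on every input and each win earns one point. The names `rank_models` and `length_judge` are hypothetical, and the judge is a placeholder that simply prefers the longer text; in practice it would be replaced by a call to an LLM evaluator.

```python
from itertools import combinations
from typing import Callable, Dict, List

# A judge receives (hate_speech, cn_a, cn_b) and returns
# 0 if cn_a is preferred, 1 if cn_b is preferred, -1 for a tie.
Judge = Callable[[str, str, str], int]


def rank_models(
    hate_speech: List[str],
    outputs: Dict[str, List[str]],
    judge: Judge,
) -> List[str]:
    """Rank models by pairwise wins in a round-robin tournament.

    outputs[model][i] is that model's counter-narrative for hate_speech[i].
    """
    wins = {model: 0 for model in outputs}
    for model_a, model_b in combinations(outputs, 2):
        for i, hs in enumerate(hate_speech):
            verdict = judge(hs, outputs[model_a][i], outputs[model_b][i])
            if verdict == 0:
                wins[model_a] += 1
            elif verdict == 1:
                wins[model_b] += 1
            # Ties award no points to either model.
    # Most wins first; break equal scores alphabetically for determinism.
    return sorted(wins, key=lambda m: (-wins[m], m))


def length_judge(hs: str, cn_a: str, cn_b: str) -> int:
    """Placeholder judge (assumption): prefers the longer counter-narrative."""
    if len(cn_a) == len(cn_b):
        return -1
    return 0 if len(cn_a) > len(cn_b) else 1


# Toy data, invented for illustration only.
inputs = ["hs1", "hs2"]
generated = {
    "model_a": ["short", "tiny"],
    "model_b": ["a much longer reply", "another long reply"],
    "model_c": ["medium one", "medium two"],
}
ranking = rank_models(inputs, generated, length_judge)
```

With a real LLM judge, it is common to query each pair in both orders to reduce position bias; that refinement is omitted here to keep the sketch short.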
Taxonomy
Topics: Artificial Intelligence in Games · Digital Games and Media · Digital Storytelling and Education
Methods: Balanced Selection
