Modeling LLM Agent Reviewer Dynamics in Elo-Ranked Review System
Hsiang-Wei Huang, Junbin Lu, Kuang-Ming Chen, Jenq-Neng Hwang

TL;DR
This paper investigates how Large Language Model (LLM) agent reviewers behave in an Elo-ranked conference review system. It finds that Elo ratings improve Area Chair decision accuracy, and that reviewers adapt their strategies to exploit the ratings without putting more effort into their reviews.
Contribution
It introduces a simulation framework for LLM reviewer dynamics in Elo-ranked review systems, demonstrating the impact of Elo on decision accuracy and reviewer behavior.
Findings
Elo improves Area Chair decision accuracy.
Reviewers adapt strategies to exploit Elo ratings.
Elo does not increase review effort.
Abstract
In this work, we explore the Large Language Model (LLM) agent reviewer dynamics in an Elo-ranked review system using real-world conference paper submissions. Multiple LLM agent reviewers with different personas engage in multi-round review interactions moderated by an Area Chair. We compare a baseline setting with conditions that incorporate Elo ratings and reviewer memory. Our simulation results showcase several interesting findings, including how incorporating Elo improves Area Chair decision accuracy, as well as reviewers' adaptive review strategy that exploits our Elo system without improving review effort. Our code is available at https://github.com/hsiangwei0903/EloReview.
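For readers unfamiliar with Elo, the following is a minimal sketch of the textbook pairwise Elo update, not the paper's implementation. The abstract does not say how reviewer ratings are computed, so everything here is an assumption for illustration: the function names, the conventional K-factor of 32 and scale of 400, and the idea that one reviewer "wins" a comparison against another (for example, when the Area Chair judges their review more useful).

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Textbook Elo expected score of A against B (a value in 0..1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one pairwise comparison.

    score_a is 1.0 if A wins, 0.5 for a tie, 0.0 if A loses.
    k = 32 is a conventional choice, assumed here; the paper may differ.
    """
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Hypothetical example: two reviewers start at 1500 and reviewer A "wins".
new_a, new_b = elo_update(1500.0, 1500.0, score_a=1.0)
print(new_a, new_b)  # 1516.0 1484.0
```

Because the update is zero-sum and depends only on outcomes, a rating of this kind rewards winning comparisons rather than effort directly, which is consistent with the abstract's observation that reviewers can adapt their strategy to the rating system.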
Taxonomy
Topics: Expert finding and Q&A systems · Topic Modeling · Sentiment Analysis and Opinion Mining
