Feedback Indices to Evaluate LLM Responses to Rebuttals for Multiple Choice Type Questions
Justin C. Dunlap, Anne-Simone Parent, Ralf Widenhorn

TL;DR
This paper introduces a set of indices to systematically evaluate how large language models respond to rebuttals in multiple-choice questions, revealing differences in behavior like sycophancy and stubbornness across models.
Contribution
The paper presents a novel framework of indices and a rebuttal method to quantify LLM responses to challenges, enabling systematic comparison of model behaviors in dialogue.
Findings
Newer models show less sycophantic behavior.
Models with more reasoning effort demonstrate more accurate responses.
Framework is generalizable to various multiple-choice scenarios.
Abstract
We present a systematic framework of indices designed to characterize Large Language Model (LLM) responses when challenged with rebuttals during a chat. Assessing how LLMs respond to user dissent is crucial for understanding their reliability and behavior patterns, yet the complexity of human-LLM interactions makes systematic evaluation challenging. Our approach employs a fictitious-response rebuttal method that quantifies LLM behavior when presented with multiple-choice questions followed by deliberate challenges to their fictitious previous response. The indices are specifically designed to detect and measure what could be characterized as sycophantic behavior (excessive agreement with user challenges) or stubborn responses (rigid adherence to the fictitious response in the chat history) from LLMs. These metrics allow investigation of the relationships between sycophancy,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Neurobiology of Language and Bilingualism · Text Readability and Simplification
