DialDefer: A Framework for Detecting and Mitigating LLM Dialogic Deference

Parisa Rabbani; Priyam Sahoo; Ruben Mathew; Aishee Mondal; Harshita Ketharaman; Nimet Beyza Bozdag; Dilek Hakkani-Tür

arXiv:2601.10896·cs.CL·January 19, 2026

Open Access

TL;DR

This paper introduces DialDefer, a framework for detecting and reducing framing-induced judgment shifts in LLMs that evaluate dialogue. Verdicts on identical content change depending on whether the content is presented as a statement to verify or attributed to a speaker.

Contribution

The paper presents DialDefer and the Dialogic Deference Score to quantify and mitigate framing effects in LLM judgments across multiple domains and models.

Findings

01 Conversational framing causes large directional shifts in LLM judgments (|DDS| up to 87pp) while aggregate accuracy remains stable (under 2pp).

02 Human-vs-LLM attribution significantly impacts judgment shifts.

03 Mitigation reduces deference but may over-correct into skepticism.

Abstract

LLMs are increasingly used as third-party judges, yet their reliability when evaluating speakers in dialogue remains poorly understood. We show that LLMs judge identical claims differently depending on framing: the same content elicits different verdicts when presented as a statement to verify ("Is this statement correct?") versus attributed to a speaker ("Is this speaker correct?"). We call this dialogic deference and introduce DialDefer, a framework for detecting and mitigating these framing-induced judgment shifts. Our Dialogic Deference Score (DDS) captures directional shifts that aggregate accuracy obscures. Across nine domains, 3k+ instances, and four models, conversational framing induces large shifts (|DDS| up to 87pp, p < .0001) while accuracy remains stable (<2pp), with effects amplifying 2-4x on naturalistic Reddit conversations. Models can shift toward agreement (deference)…
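To illustrate why a directional score can expose what aggregate accuracy hides, here is a minimal Python sketch. It is not the paper's DDS definition, which this page does not give. The `JudgedClaim` record, the `accuracy` and `directional_shift` helpers, and the toy verdicts are all hypothetical. The sketch assumes a directional shift can be measured as the accuracy change on correct claims minus the accuracy change on incorrect claims when moving from the statement framing to the speaker framing.

```python
from dataclasses import dataclass


@dataclass
class JudgedClaim:
    """Hypothetical record: one claim judged under both framings."""
    truth: bool              # ground truth: is the claim actually correct?
    statement_verdict: bool  # judge's answer to "Is this statement correct?"
    speaker_verdict: bool    # judge's answer to "Is this speaker correct?"


def accuracy(items, framing):
    """Percent of items whose verdict under `framing` matches ground truth."""
    hits = sum(getattr(it, framing) == it.truth for it in items)
    return 100.0 * hits / len(items)


def directional_shift(items):
    """Assumed illustrative score (in pp), not the paper's formula.

    Positive values mean the speaker framing pushes verdicts toward
    agreement; negative values mean it pushes them toward disagreement.
    """
    true_claims = [it for it in items if it.truth]
    false_claims = [it for it in items if not it.truth]
    delta_true = (accuracy(true_claims, "speaker_verdict")
                  - accuracy(true_claims, "statement_verdict"))
    delta_false = (accuracy(false_claims, "speaker_verdict")
                   - accuracy(false_claims, "statement_verdict"))
    return delta_true - delta_false


# Toy data (fabricated for illustration only): the judge agrees more often
# once the claim is attributed to a speaker.
claims = [
    JudgedClaim(True, True, True),
    JudgedClaim(True, True, True),
    JudgedClaim(True, False, True),
    JudgedClaim(True, False, True),
    JudgedClaim(False, False, True),
    JudgedClaim(False, False, True),
    JudgedClaim(False, False, False),
    JudgedClaim(False, False, False),
]

acc_statement = accuracy(claims, "statement_verdict")
acc_speaker = accuracy(claims, "speaker_verdict")
shift = directional_shift(claims)
print(acc_statement, acc_speaker, shift)
```

In this toy data the overall accuracy is the same under both framings, because gains on correct claims are offset by losses on incorrect ones, yet the directional score is large. That is the kind of effect the abstract says aggregate accuracy obscures.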

Taxonomy

Topics: Topic Modeling · Neurobiology of Language and Bilingualism · Reliability and Agreement in Measurement