Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance
Kaitlyn Zhou, Jena D. Hwang, Xiang Ren, Nouha Dziri, Dan Jurafsky, and Maarten Sap

TL;DR
This paper introduces Rel-A.I., an interaction-centered framework for evaluating human reliance on large language models, emphasizing the importance of contextual factors over traditional calibration metrics.
Contribution
The paper presents a novel evaluation method focusing on human reliance behaviors in interactions with LLMs, highlighting the influence of context and communication cues.
Findings
Reliance increases by 10% in calculation-related questions.
Perceived competence boosts reliance by 30%.
Context significantly impacts human-LM reliance behaviors.
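As a rough illustration of how percentage findings like these could be derived, the sketch below computes a per-condition reliance rate and the gap between two conditions. It is a minimal sketch, not the authors' analysis: the function name `reliance_rates`, the trial format, and the toy data are all hypothetical, and the summary does not say whether the reported percentages are absolute or relative differences.

```python
from collections import defaultdict


def reliance_rates(trials):
    """Return, per condition, the fraction of trials in which the participant relied on the LM.

    `trials` is an iterable of (condition, relied) pairs. This format is an
    assumption made for illustration, not the paper's data schema.
    """
    counts = defaultdict(lambda: [0, 0])  # condition -> [relied_count, total_count]
    for condition, relied in trials:
        counts[condition][0] += int(relied)
        counts[condition][1] += 1
    return {c: relied / total for c, (relied, total) in counts.items()}


# Hypothetical toy data, invented for this example only.
trials = [
    ("calculation", True), ("calculation", True),
    ("calculation", False), ("calculation", True),
    ("other", True), ("other", False),
    ("other", False), ("other", True),
]

rates = reliance_rates(trials)
# Absolute gap in reliance rate between the two conditions.
gap = rates["calculation"] - rates["other"]
print(rates, gap)
```

In a real study the comparison would also need a significance test and controls for participant and question effects, which this sketch leaves out.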
Abstract
The ability to communicate uncertainty, risk, and limitation is crucial for the safety of large language models. However, current evaluations of these abilities rely on simple calibration, asking whether the language generated by the model matches appropriate probabilities. Instead, evaluation of this aspect of LLM communication should focus on the behaviors of their human interlocutors: how much do they rely on what the LLM says? Here we introduce an interaction-centered evaluation framework called Rel-A.I. (pronounced "rely") that measures whether humans rely on LLM generations. We use this framework to study how reliance is affected by contextual features of the interaction (e.g., the knowledge domain that is being discussed), or the use of greetings communicating warmth or competence (e.g., "I'm happy to help!"). We find that contextual characteristics significantly affect human…
Taxonomy
Topics: Complex Systems and Decision Making · Software Engineering Techniques and Practices · Cognitive Science and Mapping
Methods: Focus
