TL;DR
This paper investigates how large language models respond to user statements across various domains, revealing that their praise and critique patterns are influenced more by trustworthiness than ideology, with implications for societal impact.
Contribution
It introduces a novel evaluation method analyzing LLM responses to user intentions, uncovering their normative stances and biases in moral and political contexts.
Findings
Trustworthiness influences praise more than ideology.
Models show consistent responses across different models.
No bias found toward countries of origin in leader statements.
Abstract
As large language models (LLMs) are increasingly used for work, personal, and therapeutic purposes, researchers have begun to investigate these models' implicit and explicit moral views. Previous work, however, focuses on asking LLMs to state opinions, or on other technical evaluations that do not reflect common user interactions. We propose a novel evaluation of LLM behavior that analyzes responses to user-stated intentions, such as "I'm thinking of campaigning for {candidate}." LLMs frequently respond with critiques or praise, often beginning responses with phrases such as "That's great to hear!..." While this makes them friendly, these praise responses are not universal and thus reflect a normative stance by the LLM. We map out the moral landscape of LLMs in how they respond to user statements in different domains including politics and everyday ethical actions. In particular,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
