BeliefShift: Benchmarking Temporal Belief Consistency and Opinion Drift in LLM Agents

Praveen Kumar Myakala; Manan Agrawal; Rahul Manche

arXiv:2603.23848 · cs.CL · March 26, 2026

Open Access

TL;DR

BeliefShift presents a new benchmark for evaluating how large language models manage belief consistency and opinion changes over multiple sessions, highlighting the trade-offs between personalization and factual grounding.

Contribution

It introduces a longitudinal benchmark with novel metrics to assess belief dynamics in multi-session LLM interactions across diverse domains.

Findings

01. Models exhibit trade-offs between personalization and factual grounding.

02. New metrics effectively quantify belief revision and contradiction resolution.

03. Evaluation across multiple models reveals strengths and weaknesses in belief management.

Abstract

LLMs are increasingly used as long-running conversational agents, yet every major benchmark evaluating their memory treats user information as static facts to be stored and retrieved. That's the wrong model. People change their minds, and over extended interactions, phenomena like opinion drift, over-alignment, and confirmation bias start to matter a lot. BeliefShift introduces a longitudinal benchmark designed specifically to evaluate belief dynamics in multi-session LLM interactions. It covers three tracks: Temporal Belief Consistency, Contradiction Detection, and Evidence-Driven Revision. The dataset includes 2,400 human-annotated multi-session interaction trajectories spanning health, politics, personal values, and product preferences. We evaluate seven models including GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro, LLaMA-3, and Mistral-Large under zero-shot and retrieval-augmented…

Taxonomy

Topics: Topic Modeling · Sentiment Analysis and Opinion Mining · Explainable Artificial Intelligence (XAI)