Modeling and Predicting Multi-Turn Answer Instability in Large Language Models
Jiahang He, Rishi Ramachandran, Neel Ramachandran, Aryan Katakam, Kevin Zhu, Sunishchal Dev, Ashwinee Panda, Aryan Shrivastava

TL;DR
This paper investigates the robustness of large language models in multi-turn interactions, revealing their answer instability and proposing Markov chain models and linear probes for predicting accuracy changes over turns.
Contribution
It introduces methods to evaluate and predict answer stability in LLMs using Markov chains and linear probes, highlighting their fragility in multi-turn scenarios.
Findings
Answer accuracy drops significantly with repeated prompts.
Markov chains effectively model accuracy dynamics over turns.
Linear probes can predict future answer changes.
Abstract
As large language models (LLMs) are adopted in an increasingly wide range of applications, user-model interactions have grown in both frequency and scale. Consequently, research has focused on evaluating the robustness of LLMs, an essential quality for real-world tasks. In this paper, we employ simple multi-turn follow-up prompts to evaluate models' answer changes, model accuracy dynamics across turns with Markov chains, and examine whether linear probes can predict these changes. Our results show significant vulnerabilities in LLM robustness: a simple "Think again" prompt led to an approximate 10% accuracy drop for Gemini 1.5 Flash over nine turns, while combining this prompt with a semantically equivalent reworded question caused a 7.5% drop for Claude 3.5 Haiku. Additionally, we find that model accuracy across turns can be effectively modeled using Markov chains, enabling the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Expert finding and Q&A systems · Text Readability and Simplification
