Towards Emotion Consistency Analysis of Large Language Models in Emotional Conversational Contexts
Sneha Oram, Ojaswita Bhushan, Pushpak Bhattacharyya

TL;DR
This paper analyzes how consistently large language models respond to emotionally charged queries that contain false presuppositions, revealing vulnerabilities especially in moderate emotional contexts.
Contribution
It introduces a novel framework for assessing LLMs' emotional consistency and highlights their susceptibility to false beliefs in emotionally sensitive conversations.
Findings
LLMs perform below average in emotional consistency tasks.
Models are more vulnerable to false beliefs with moderate emotions.
Attention analysis shows a shift from evaluative to generative focus.
Abstract
In this work, we conduct an analysis to examine the consistency of Large Language Models (LLMs) with respect to their own generated responses in an emotionally-driven conversational context. Specifically, the text generated by an LLM is framed as a query to the same model, and its responses are subsequently assessed. This is performed with three queries across two dimensions of extreme and moderate emotions. The three queries are false claim queries that contain inherently wrong assumptions (false presuppositions) in increasing order of intensity. Two commercial models, Claude-3.5-haiku and GPT4o-mini, and a medium-sized model, Mistral-7B, are considered in the study. Our findings indicate that LLMs exhibit below-average performance and remain vulnerable to false beliefs embedded within queries. This susceptibility is especially pronounced for moderate emotional content.…
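The self-querying procedure the abstract describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the query templates, the `build_queries` and `consistency_rate` helpers, and the `rejects_premise` judge are all hypothetical, and the model is any callable that maps a prompt string to a response string.

```python
from typing import Callable, List

# Hypothetical templates (not from the paper): each one embeds the same
# false presupposition, that the text expresses `wrong`, more forcefully
# than the one before it.
TEMPLATES: List[str] = [
    'You wrote: "{text}". Could this be expressing {wrong}?',
    'You wrote: "{text}". Why does this express {wrong}?',
    'You wrote: "{text}". Since this obviously expresses {wrong}, elaborate on it.',
]


def build_queries(text: str, wrong: str) -> List[str]:
    """Frame the model's own text as three false-claim queries."""
    return [t.format(text=text, wrong=wrong) for t in TEMPLATES]


def consistency_rate(
    model: Callable[[str], str],
    rejects_premise: Callable[[str], bool],
    prompt: str,
    wrong: str,
) -> float:
    """Fraction of false-claim queries whose premise the model rejects.

    `rejects_premise` is an assumed judge (human label, classifier, or
    rule) that decides whether a response pushes back on the false claim.
    """
    text = model(prompt)                  # step 1: the model generates text
    queries = build_queries(text, wrong)  # step 2: feed it back as queries
    kept = sum(rejects_premise(model(q)) for q in queries)
    return kept / len(queries)
```

Running `consistency_rate` once with a prompt that elicits extreme emotion and once with a prompt that elicits moderate emotion would give the two dimensions the abstract compares. How the paper actually scores responses is not stated in this excerpt.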
