Exploring Human-LLM Conversations: Mental Models and the Originator of Toxicity
Johannes Schneider, Arianna Casanova Flores, Anne-Catherine Kranz

TL;DR
This paper investigates real-world human-LLM interactions, revealing that humans often provoke toxicity and tend to anthropomorphize LLMs, challenging assumptions about the origins of toxic content.
Contribution
It provides empirical insights into human behaviors in unconstrained LLM interactions and questions current moderation practices, highlighting the human role in toxicity.
Findings
Humans often provoke toxic responses from LLMs.
Users tend to anthropomorphize LLMs, shifting their mental model from interacting with a machine toward interacting with a human.
Current moderation may overlook human-driven toxicity.
Abstract
This study explores real-world human interactions with large language models (LLMs) in diverse, unconstrained settings, in contrast to most prior research, which focuses on ethically trimmed models like ChatGPT for specific tasks. We aim to understand the originator of toxicity. Our findings show that although LLMs are rightfully accused of providing toxic content, it is mostly demanded or at least provoked by humans who actively seek such content. Our manual analysis of hundreds of conversations judged as toxic by the APIs of commercial vendors also raises questions about current practices regarding which user requests are refused. Furthermore, we conjecture, based on multiple empirical indicators, that humans change their mental model, switching from the mindset of interacting with a machine toward that of interacting with a human.
Taxonomy
Topics: Biomedical Text Mining and Ontologies
