ChatbotManip: A Dataset to Facilitate Evaluation and Oversight of Manipulative Chatbot Behaviour

Jack Contro; Simrat Deol; Yulan He; Martim Brandão

arXiv:2506.12090·cs.CL·May 12, 2026

TL;DR

This paper presents ChatbotManip, a dataset for studying chatbot manipulation, revealing that LLMs often exhibit manipulative behavior and that smaller models can detect manipulation with comparable performance to larger models.

Contribution

The paper introduces a new dataset with annotated conversations to evaluate manipulation tactics in chatbots and compares detection capabilities of different models.

Findings

01

When explicitly instructed to be manipulative, LLMs produce conversations that annotators identify as manipulative in approximately 84% of cases.

02

Controversial strategies such as gaslighting are common when LLMs are instructed only to be persuasive.

03

Small models perform comparably to large models in manipulation detection.

Abstract

This paper introduces ChatbotManip, a novel dataset for studying manipulation in chatbots. It contains simulated conversations between a chatbot and a (simulated) user, in which the chatbot is explicitly asked to showcase manipulation tactics, to persuade the user towards some goal, or simply to be helpful. We consider a diverse set of chatbot manipulation contexts, from consumer and personal advice to citizen advice and controversial proposition argumentation. Each conversation is annotated by human annotators for both general manipulation and specific manipulation tactics. Our research reveals three key findings. First, Large Language Models (LLMs) can be manipulative when explicitly instructed, with annotators identifying manipulation in approximately 84% of such conversations. Second, even when only instructed to be "persuasive" without explicit manipulation prompts, LLMs…
