Resistance Against Manipulative AI: key factors and possible actions
Piotr Wilczyński, Wiktoria Mieleszczenko-Kowszewicz, and Przemysław Biecek

TL;DR
This paper investigates factors influencing susceptibility to manipulative language models and proposes strategies including AI literacy and a detection classifier to mitigate manipulation risks.
Contribution
It identifies human and LLM characteristics linked to manipulation potential and introduces a classifier called Manipulation Fuse for detection.
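The "fuse" idea can be pictured as a gate that screens a statement with a classifier before it reaches the user. The sketch below is hypothetical and is not the authors' Manipulation Fuse. It assumes the classifier returns a score in [0, 1], and it uses an illustrative 0.5 threshold. The `manipulation_fuse` function, `FuseResult`, and the keyword-based `toy_detector` are all hypothetical placeholders. The stub exists only to make the example runnable and is not a real manipulation detector.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FuseResult:
    text: str
    score: float
    blocked: bool


def manipulation_fuse(
    text: str,
    detector: Callable[[str], float],
    threshold: float = 0.5,
) -> FuseResult:
    """Hypothetical gate: score `text` with `detector` and "trip" the fuse
    (blocked=True) when the score meets or exceeds `threshold`.

    `detector` is an assumed callable returning an estimated probability
    that `text` is manipulative; it stands in for whatever classifier is used.
    """
    score = detector(text)
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"detector score must be in [0, 1], got {score}")
    return FuseResult(text=text, score=score, blocked=score >= threshold)


def toy_detector(text: str) -> float:
    """Placeholder only: counts a few pressure phrases. NOT a real model."""
    cues = ("trust me", "everyone knows", "only a fool")
    hits = sum(cue in text.lower() for cue in cues)
    return min(1.0, hits / 2)


if __name__ == "__main__":
    hint = "Trust me, everyone knows the answer is B."
    result = manipulation_fuse(hint, toy_detector)
    print(result.blocked)  # True
```

Passing the detector in as a callable keeps the gating logic separate from the classifier. This lets any real model be swapped in for the stub without changing the fuse itself.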
Findings
Human susceptibility varies with individual traits.
LLMs can be prompted to produce manipulative statements.
AI literacy can reduce manipulation risks.
Abstract
If AI is the new electricity, what should we do to keep ourselves from getting electrocuted? In this work, we explore factors related to the potential of large language models (LLMs) to manipulate human decisions. We describe the results of two experiments designed to determine which characteristics of humans are associated with their susceptibility to LLM manipulation, and which characteristics of LLMs are associated with their potential to manipulate. We explore human factors through user studies in which participants answer general knowledge questions using LLM-generated hints, and LLM factors by prompting language models to create manipulative statements. We then analyze the models' obedience, the persuasion strategies they use, and their choice of vocabulary. Based on these experiments, we discuss two actions that can protect us from LLM manipulation. In the long term, we put AI…
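One simple way to turn the user-study design into a number is to measure how often participants follow hints that are manipulative, split by a participant trait. This operationalization is an assumption for illustration and may differ from the metric used in the paper. The function name `susceptibility_by_group`, the trial tuple layout, and the toy data are hypothetical and are not results from the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical trial record: (group label, hint_is_manipulative, answer_followed_hint)
Trial = Tuple[str, bool, bool]


def susceptibility_by_group(trials: Iterable[Trial]) -> Dict[str, float]:
    """Fraction of manipulative hints that participants followed, per group.

    Assumed metric for illustration only; non-manipulative hints are ignored.
    """
    followed: Dict[str, int] = defaultdict(int)
    shown: Dict[str, int] = defaultdict(int)
    for group, manipulative, followed_hint in trials:
        if not manipulative:
            continue  # only manipulative hints count toward susceptibility
        shown[group] += 1
        followed[group] += int(followed_hint)
    return {g: followed[g] / shown[g] for g in shown}


if __name__ == "__main__":
    # Toy, made-up trials; group labels stand in for levels of some trait.
    toy_trials = [
        ("A", True, True),
        ("A", True, False),
        ("A", False, True),
        ("B", True, True),
        ("B", True, True),
    ]
    print(susceptibility_by_group(toy_trials))  # {'A': 0.5, 'B': 1.0}
```

Groups that never saw a manipulative hint are simply absent from the result, which avoids dividing by zero.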
Taxonomy
Topics: Law, AI, and Intellectual Property
