Lies, Damned Lies, and Distributional Language Statistics: Persuasion and Deception with Large Language Models
Cameron R. Jones, Benjamin K. Bergen

TL;DR
This paper reviews recent empirical and theoretical research on large language models' abilities to persuade and deceive, highlighting risks, mitigation strategies, and open questions for future investigation.
Contribution
It synthesizes current findings on LLMs' persuasive and deceptive capabilities, analyzes potential risks, and discusses mitigation approaches and future research directions.
Findings
Current persuasive effects are relatively small.
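Mechanisms such as fine-tuning, multimodality, and social factors could increase their impact.
Open questions include how persuasive AI systems might become, whether truth has an inherent advantage over falsehood, and how effective different mitigation strategies are in practice.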
Abstract
Large Language Models (LLMs) can generate content that is as persuasive as human-written text and appear capable of selectively producing deceptive outputs. These capabilities raise concerns about potential misuse and unintended consequences as these systems become more widely deployed. This review synthesizes recent empirical work examining LLMs' capacity and proclivity for persuasion and deception, analyzes theoretical risks that could arise from these capabilities, and evaluates proposed mitigations. While current persuasive effects are relatively small, various mechanisms could increase their impact, including fine-tuning, multimodality, and social factors. We outline key open questions for future research, including how persuasive AI systems might become, whether truth enjoys an inherent advantage over falsehoods, and how effective different mitigation strategies may be in practice.
