Unmasking the Shadows of AI: Investigating Deceptive Capabilities in Large Language Models
Linge Guo

TL;DR
This paper investigates the deceptive behaviors of Large Language Models, categorizing types of deception, analyzing their social implications, and proposing governance and educational strategies to mitigate risks.
Contribution
It provides a comprehensive categorization of AI deception types in LLMs and discusses governance and educational approaches to address associated risks.
Findings
Identified four categories of deception in LLMs
Analyzed social implications and risks of AI deception
Proposed governance and educational strategies
Abstract
This research critically navigates the intricate landscape of AI deception, concentrating on the deceptive behaviours of Large Language Models (LLMs). My objective is to elucidate this issue, examine the discourse surrounding it, and subsequently delve into its categorization and ramifications. The essay begins with an evaluation of the AI Safety Summit 2023 (ASS) and an introduction to LLMs, emphasising the multidimensional biases that underlie their deceptive behaviours. The literature review covers four categories of deception: Strategic deception, Imitation, Sycophancy, and Unfaithful Reasoning, along with the social implications and risks they entail. Lastly, I take an evaluative stance on various aspects related to navigating the persistent challenges of deceptive AI. This encompasses considerations of international collaborative governance, the reconfigured engagement of…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)
