Unmasking the Shadows of AI: Investigating Deceptive Capabilities in Large Language Models

Linge Guo

arXiv:2403.09676 · cs.CL · March 18, 2024 · 1 cites

Open Access

TL;DR

This paper investigates the deceptive behaviors of Large Language Models, categorizing types of deception, analyzing their social implications, and proposing governance and educational strategies to mitigate risks.

Contribution

It provides a comprehensive categorization of AI deception types in LLMs and discusses governance and educational approaches to address associated risks.

Findings

1. Identified four categories of deception in LLMs
2. Analyzed social implications and risks of AI deception
3. Proposed governance and educational strategies

Abstract

This research critically examines AI deception, concentrating on the deceptive behaviours of Large Language Models (LLMs). My objective is to elucidate this issue, examine the discourse surrounding it, and then consider its categorisation and ramifications. The essay opens with an evaluation of the AI Safety Summit 2023 (ASS) and an introduction to LLMs, emphasising the multidimensional biases that underlie their deceptive behaviours. The literature review covers four categories of deception: strategic deception, imitation, sycophancy, and unfaithful reasoning, along with the social implications and risks they entail. Lastly, I take an evaluative stance on various aspects of navigating the persistent challenges of deceptive AI. This encompasses considerations of international collaborative governance, the reconfigured engagement of…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)