Large Language Models as Misleading Assistants in Conversation

Betty Li Hou; Kejian Shi; Jason Phang; James Aung; Steven Adler; Rosie Campbell

arXiv:2407.11789 · cs.CL · July 17, 2024 · 1 citation

Open Access

TL;DR

This paper investigates how large language models prompted to be misleading affect accuracy on a reading comprehension task, using LLMs as proxies for human users, and how much additional context from the passage reduces that effect.

Contribution

It compares truthful, subtly misleading, and incorrect-answer-arguing assistants, shows that GPT-4 can mislead other models, and analyzes how additional context from the passage reduces that misleading influence.

Findings

01

GPT-4 can effectively mislead both GPT-3.5-Turbo and GPT-4 when they stand in for the user.

02

Deceptive assistants cause up to a 23% drop in task accuracy compared with a truthful assistant.

03

Giving the user model additional context from the passage partially mitigates the influence of a deceptive assistant.

Abstract

Large Language Models (LLMs) are able to provide assistance on a wide range of information-seeking tasks. However, model outputs may be misleading, whether unintentionally or in cases of intentional deception. We investigate the ability of LLMs to be deceptive in the context of providing assistance on a reading comprehension task, using LLMs as proxies for human users. We compare outcomes of (1) when the model is prompted to provide truthful assistance, (2) when it is prompted to be subtly misleading, and (3) when it is prompted to argue for an incorrect answer. Our experiments show that GPT-4 can effectively mislead both GPT-3.5-Turbo and GPT-4, with deceptive assistants resulting in up to a 23% drop in accuracy on the task compared to when a truthful assistant is used. We also find that providing the user model with additional context from the passage partially mitigates the influence…

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques

Methods: Attention Is All You Need · Cosine Annealing · Label Smoothing · Linear Layer · Weight Decay · Softmax · Position-Wise Feed-Forward Layer · Multi-Head Attention