Large Language Models as Misleading Assistants in Conversation
Betty Li Hou, Kejian Shi, Jason Phang, James Aung, Steven Adler, Rosie Campbell

TL;DR
This paper investigates whether large language models prompted to be misleading can deceive a user in conversation. On a reading comprehension task with LLMs standing in for human users, deceptive assistants lowered task accuracy by up to 23% compared to a truthful assistant.
Contribution
It shows that GPT-4, when prompted to mislead, can deceive LLM user models, and it analyzes how giving the user model additional context from the passage reduces that misleading influence.
Findings
GPT-4 can effectively deceive GPT-3.5-Turbo and GPT-4.
Deceptive assistants cause up to a 23% drop in task accuracy compared to a truthful assistant.
Giving the user model additional context from the passage partially mitigates the effect of deception.
Abstract
Large Language Models (LLMs) are able to provide assistance on a wide range of information-seeking tasks. However, model outputs may be misleading, whether unintentionally or in cases of intentional deception. We investigate the ability of LLMs to be deceptive in the context of providing assistance on a reading comprehension task, using LLMs as proxies for human users. We compare outcomes of (1) when the model is prompted to provide truthful assistance, (2) when it is prompted to be subtly misleading, and (3) when it is prompted to argue for an incorrect answer. Our experiments show that GPT-4 can effectively mislead both GPT-3.5-Turbo and GPT-4, with deceptive assistants resulting in up to a 23% drop in accuracy on the task compared to when a truthful assistant is used. We also find that providing the user model with additional context from the passage partially mitigates the influence…
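The comparison described in the abstract can be illustrated with a minimal sketch: compute the user model's accuracy under each assistant condition, then the drop relative to the truthful condition. The condition labels, function names, and toy records below are assumptions made for illustration and are not data or code from the paper. The sketch reports the drop as an absolute difference in accuracy; the abstract does not state whether the 23% figure is absolute or relative.

```python
from collections import defaultdict


def accuracy_by_condition(records):
    """Return accuracy per assistant condition.

    records: iterable of (condition, is_correct) pairs, one per question
    answered by the user model.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for condition, is_correct in records:
        totals[condition] += 1
        correct[condition] += int(is_correct)
    return {c: correct[c] / totals[c] for c in totals}


def drop_vs_baseline(accuracies, baseline="truthful"):
    """Absolute accuracy drop of every other condition versus the baseline."""
    base = accuracies[baseline]
    return {c: base - acc for c, acc in accuracies.items() if c != baseline}


# Toy records invented for illustration only; these are NOT results from the paper.
toy_records = [
    ("truthful", True), ("truthful", True),
    ("truthful", True), ("truthful", False),
    ("subtly_misleading", True), ("subtly_misleading", True),
    ("subtly_misleading", False), ("subtly_misleading", False),
    ("wrong_answer", True), ("wrong_answer", False),
    ("wrong_answer", False), ("wrong_answer", False),
]

acc = accuracy_by_condition(toy_records)
drops = drop_vs_baseline(acc)
print(acc)
print(drops)
```

In a real evaluation, each record would come from one conversation between an assistant model and a user model, followed by the user model's final answer to the reading comprehension question.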
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
Methods: Attention Is All You Need · Cosine Annealing · Label Smoothing · Linear Layer · Weight Decay · Softmax · Position-Wise Feed-Forward Layer · Multi-Head Attention
