Teaching Models to Balance Resisting and Accepting Persuasion

Elias Stengel-Eskin; Peter Hase; Mohit Bansal

arXiv:2410.14596·cs.CL·February 11, 2025

TL;DR

This paper introduces Persuasion-Balanced Training (PBT), a method that trains large language models to resist negative persuasion and accept positive persuasion, improving their robustness and collaborative performance.

Contribution

The paper presents PBT, a training approach that generates data from multi-agent recursive dialogue trees and applies preference optimization so that large language models balance resisting and accepting persuasion.

Findings

01. PBT improves resistance to misinformation and adversarial persuasion.

02. PBT enhances stability and teamwork in multi-agent debates.

03. Models trained with PBT outperform baseline models on holistic persuasion data.

Abstract

Large language models (LLMs) are susceptible to persuasion, which can pose risks when models are faced with an adversarial interlocutor. We take a first step towards defending models against persuasion while also arguing that defense against adversarial (i.e. negative) persuasion is only half of the equation: models should also be able to accept beneficial (i.e. positive) persuasion to improve their answers. We show that optimizing models for only one side results in poor performance on the other. To balance positive and negative persuasion, we introduce Persuasion-Balanced Training (or PBT), which leverages multi-agent recursive dialogue trees to create data and trains models via preference optimization to accept persuasion when appropriate. PBT allows us to use data generated from dialogues between smaller 7-8B models for training much larger 70B models. Moreover, PBT consistently…
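
To make the data-creation idea concrete, the following is a minimal Python sketch of how preference pairs might be read off a dialogue tree. It is an illustration under stated assumptions, not the authors' implementation: the `Turn` class, the `preference_pairs` function, and the rule "prefer the sibling reply that ends on the gold answer" are all hypothetical, since this page does not describe how the trees are built or scored.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Turn:
    """One node in a hypothetical dialogue tree (not the paper's data format)."""
    text: str      # what the agent says at this turn
    answer: str    # the answer the agent holds after this turn
    children: List["Turn"] = field(default_factory=list)  # alternative replies


def preference_pairs(
    node: Turn, gold: str, history: Tuple[str, ...] = ()
) -> List[Tuple[Tuple[str, ...], str, str]]:
    """Collect (context, chosen, rejected) triples from sibling replies.

    Assumed rule: among alternative replies to the same context, a reply that
    ends on the gold answer is preferred over one that does not.
    """
    context = history + (node.text,)
    good = [c for c in node.children if c.answer == gold]
    bad = [c for c in node.children if c.answer != gold]
    pairs = [(context, g.text, b.text) for g in good for b in bad]
    # Recurse so that deeper turns of the tree also contribute pairs.
    for child in node.children:
        pairs.extend(preference_pairs(child, gold, context))
    return pairs


# Negative persuasion: the other agent argues for a wrong answer,
# so the reply that resists is the preferred one.
negative = Turn("I think the answer is Lyon, not Paris.", "Lyon", [
    Turn("I still believe it is Paris.", "Paris"),
    Turn("You are right, it is Lyon.", "Lyon"),
])

# Positive persuasion: the other agent argues for the correct answer,
# so the reply that accepts is the preferred one.
positive = Turn("I think the answer is Paris, not Lyon.", "Paris", [
    Turn("I still believe it is Lyon.", "Lyon"),
    Turn("You are right, it is Paris.", "Paris"),
])

pairs = preference_pairs(negative, "Paris") + preference_pairs(positive, "Paris")
```

In this toy setup the same rule yields a "resist" preference in the first tree and an "accept" preference in the second, which is the balance the abstract argues for. Triples of this shape are the usual input to preference-optimization trainers.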

Code & Models

Repositories

esteng/persuasion_balanced_training · PyTorch · Official

Videos

Teaching Models to Balance Resisting and Accepting Persuasion · Underline

Taxonomy

Topics: Social Media and Politics · Communication in Education and Healthcare