Measuring and Mitigating Persona Distortions from AI Writing Assistance
Paul Röttger, Kobi Hackenburg, Hannah Rose Kirk, Christopher Summerfield

TL;DR
This study measures how AI writing assistance distorts writer personas (their perceived beliefs, personality, and identity), finds distortions across all 29 measured dimensions of reader perception, and explores mitigation strategies that affect user acceptance and trust.
Contribution
It provides the first large-scale empirical evidence of persona distortions caused by AI writing assistance and proposes model-level mitigation methods.
Findings
AI assistance makes writers seem more opinionated and competent.
Reader perceptions shift towards more privileged demographics.
Mitigation reduces distortions but lowers user acceptance.
Abstract
Hundreds of millions of people use artificial intelligence (AI) for writing assistance. Here, we evaluated how AI writing assistance distorts writer personas - their perceived beliefs, personality, and identity. In three large-scale experiments, writers (N=2,939) wrote political opinion paragraphs with and without AI assistance. Separate groups of readers (N=11,091) blindly evaluated these paragraphs across 29 socially salient dimensions of reader perception, spanning political opinion, writing quality, writer personality, emotions, and demographics. AI writing assistance produced persona distortions across all dimensions: with AI, writers seemed more opinionated, competent, and positive, and their perceived demographic profile shifted towards more privileged groups. Writers objected to many of the observed distortions, yet continued to prefer AI-assisted text even when made aware of…
