Robust Persona-Aware Toxicity Detection with Prompt Optimization and Learned Ensembling
Berk Atil, Rebecca J. Passonneau, Ninareh Mehrabi

TL;DR
This paper systematically evaluates persona-aware toxicity detection with LLM prompting and proposes a learned SVM ensemble over four prompting variants that outperforms both individual prompting methods and traditional voting.
Contribution
The paper contributes a systematic comparison of persona-conditioned prompting techniques and a lightweight SVM meta-ensemble over their predictions, aimed at improving toxicity detection across diverse social perspectives.
Findings
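Ensembling four prompting variants exploits their complementary errors and improves performance over individual prompts.
No single prompting method, including the proposed automated prompt optimization, dominates across all model-persona pairs.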
The SVM ensemble achieves the best overall results.
Abstract
Toxicity detection is inherently subjective, shaped by the diverse perspectives and social priors of different demographic groups. While "pluralistic" modeling as used in economics and the social sciences aims to capture perspective differences across contexts, current Large Language Model (LLM) prompting techniques have different results across different personas and base models. In this work, we conduct a systematic evaluation of persona-aware toxicity detection, showing that no single prompting method, including our proposed automated prompt optimization strategy, uniformly dominates across all model-persona pairs. To exploit complementary errors, we explore ensembling four prompting variants and propose a lightweight meta-ensemble: an SVM over the 4-bit vector of prompt predictions. Our results demonstrate that the proposed SVM ensemble consistently outperforms individual…
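The meta-ensemble described in the abstract, an SVM over the 4-bit vector of prompt predictions, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the kernel, features, or training procedure, so the sketch assumes a linear SVM trained by full-batch subgradient descent on the hinge loss, written in plain Python. The function names (`to_pm1`, `train_linear_svm`, `svm_predict`, `majority_vote`) and the toy data, in which the first prompt is always correct and the other three carry no signal, are hypothetical and chosen only to show how a learned combiner can differ from strict majority voting.

```python
from itertools import product


def to_pm1(bits):
    """Map a 0/1 vote vector to -1/+1 features."""
    return [2 * b - 1 for b in bits]


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularised hinge loss.

    X: list of -1/+1 feature vectors, y: list of -1/+1 labels.
    Hyperparameters are illustrative assumptions, not values from the paper.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active for this example
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_predict(w, b, bits):
    """Return 1 (toxic) or 0 (non-toxic) for one 4-bit vote vector."""
    score = sum(wj * xj for wj, xj in zip(w, to_pm1(bits))) + b
    return 1 if score > 0 else 0


def majority_vote(bits):
    """Strict majority baseline; ties count as non-toxic."""
    return 1 if sum(bits) > len(bits) / 2 else 0


# Toy data (assumption): every possible 4-bit vote pattern, where the
# gold label happens to equal the first prompt's prediction.
votes = [list(v) for v in product([0, 1], repeat=4)]
labels = [v[0] for v in votes]

X = [to_pm1(v) for v in votes]
y = [2 * label - 1 for label in labels]
w, b = train_linear_svm(X, y)

example = [1, 0, 0, 0]  # only the first prompt flags the text
print("majority vote:", majority_vote(example))
print("SVM ensemble: ", svm_predict(w, b, example))
```

On this contrived data the learned weights concentrate on the first prompt, so the vote pattern [1, 0, 0, 0] is labeled toxic even though a strict majority vote would label it non-toxic. Real prompt predictions are noisier, and a library SVM would normally replace the hand-written training loop.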
Taxonomy
Topics: Machine Learning in Healthcare · Persona Design and Applications · Mobile Crowdsensing and Crowdsourcing
