Red Teaming LLMs as Socio-Technical Practice: From Exploration and Data Creation to Evaluation
Adriana Alvarado Garcia, Ruyuan Wan, Ozioma C. Oguine, Karla Badillo-Urquiola

TL;DR
This paper uses 22 practitioner interviews to examine red teaming of large language models as a socio-technical practice, showing how practitioners define, create, and evaluate datasets to assess model risks and suggesting avenues for HCI research to improve these practices.
Contribution
It provides empirical insights into how practitioners conceptualize and develop red teaming datasets, emphasizing the socio-technical aspects often overlooked in technical evaluations.
Findings
Practitioners see red teaming as a socio-technical practice.
Current risk conceptualizations often overlook context and user interaction.
The study identifies opportunities for HCI to enhance red teaming data practices.
Abstract
Recently, red teaming, with roots in security, has become a key evaluative approach to ensure the safety and reliability of Generative Artificial Intelligence. However, most existing work emphasizes technical benchmarks and attack success rates, leaving the socio-technical practices of how red teaming datasets are defined, created, and evaluated under-examined. Drawing on 22 interviews with practitioners who design and evaluate red teaming datasets, we examine the data practices and standards that underpin this work. Because adversarial datasets determine the scope and accuracy of model evaluations, they are critical artifacts for assessing potential harms from large language models. Our contributions are, first, empirical evidence of how practitioners conceptualize red teaming and how they develop and evaluate red teaming datasets. Second, we reflect on how practitioners' conceptualization of…
Taxonomy
Topics: Advanced Malware Detection Techniques · Ethics and Social Impacts of AI · Hate Speech and Cyberbullying Detection
