RICoTA: Red-teaming of In-the-wild Conversation with Test Attempts

Eujeong Choi; Younghun Jeong; Soomin Kim; Won Ik Cho

arXiv:2501.17715·cs.CL·January 30, 2025

TL;DR

RICoTA is a Korean red-teaming dataset of 609 prompts, drawn from in-the-wild user-chatbot conversations that capture jailbreak attempts, for testing how well large language models hold up against such attempts and for informing safer chatbot design.

Contribution

The paper introduces RICoTA, a novel dataset capturing real-world jailbreak and testing prompts from Korean users, to evaluate and improve LLM safety measures.

Findings

01. RICoTA enables effective evaluation of jailbreak resistance.

02. User prompts reveal common testing strategies against LLMs.

03. Dataset supports development of safer conversational AI systems.

Abstract

User interactions with conversational agents (CAs) evolve in the era of heavily guardrailed large language models (LLMs). As users push beyond programmed boundaries to explore and build relationships with these systems, there is a growing concern regarding the potential for unauthorized access or manipulation, commonly referred to as "jailbreaking." Moreover, with CAs that possess highly human-like qualities, users show a tendency toward initiating intimate sexual interactions or attempting to tame their chatbots. To capture and reflect these in-the-wild interactions into chatbot designs, we propose RICoTA, a Korean red teaming dataset that consists of 609 prompts challenging LLMs with in-the-wild user-made dialogues capturing jailbreak attempts. We utilize user-chatbot conversations that were self-posted on a Korean Reddit-like community, containing specific testing and gaming…

Code & Models

Repositories

boychaboy/ricota (Official)

Taxonomy

Topics: Software Engineering Research · Software Testing and Debugging Techniques · Advanced Malware Detection Techniques