PIPPA: A Partially Synthetic Conversational Dataset
Tear Gosling, Alpin Dale, Yinhe Zheng

TL;DR
PIPPA is a large, partially synthetic conversational dataset created through crowdsourcing. It is designed to improve role-play and casual conversation models by providing diverse, nuanced interactions for AI training.
Contribution
The paper introduces PIPPA, a novel large-scale dataset of over 1 million utterances from role-play scenarios, developed via community-driven crowdsourcing efforts.
Findings
- Contains over 1 million utterances across 26,000 conversation sessions
- Expands the resources available for training conversational AI in role-play contexts
- Supports exploration of nuanced human-AI interactions
Abstract
With the emergence of increasingly powerful large language models, there is a burgeoning interest in leveraging these models for casual conversation and role-play applications. However, existing conversational and role-playing datasets often fail to capture the diverse and nuanced interactions typically exhibited by real-world role-play participants. To address this limitation and contribute to the rapidly growing field, we introduce a partially-synthetic dataset named PIPPA (Personal Interaction Pairs between People and AI). PIPPA is a result of a community-driven crowdsourcing effort involving a group of role-play enthusiasts. The dataset comprises over 1 million utterances that are distributed across 26,000 conversation sessions and provides a rich resource for researchers and AI developers to explore and refine conversational AI systems in the context of role-play scenarios.
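The abstract describes the dataset as conversation sessions made up of utterances exchanged between people and AI characters. The sketch below shows one way to represent such a session and turn it into human-to-AI interaction pairs. It is illustrative only: the names `Session`, `Utterance`, `bot_name`, `is_human`, `interaction_pairs`, and `mean_utterances_per_session` are hypothetical and are not necessarily the field names or tooling of the released dataset.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Utterance:
    # Hypothetical schema for illustration; not necessarily PIPPA's released field names.
    message: str
    is_human: bool  # True if written by the human participant, False if by the AI character


@dataclass
class Session:
    bot_name: str  # hypothetical: the role-play character this session is with
    conversation: List[Utterance] = field(default_factory=list)


def interaction_pairs(session: Session) -> List[Tuple[str, str]]:
    """Pair each human utterance with the AI utterance that immediately follows it."""
    turns = session.conversation
    return [
        (prev.message, nxt.message)
        for prev, nxt in zip(turns, turns[1:])
        if prev.is_human and not nxt.is_human
    ]


def mean_utterances_per_session(sessions: List[Session]) -> float:
    """Average session length in utterances (0.0 for an empty list)."""
    if not sessions:
        return 0.0
    return sum(len(s.conversation) for s in sessions) / len(sessions)


# Usage with a toy, made-up session (not drawn from the dataset).
demo = Session(
    bot_name="Guide",
    conversation=[
        Utterance("Welcome, traveler.", is_human=False),
        Utterance("Where am I?", is_human=True),
        Utterance("The edge of the old forest.", is_human=False),
        Utterance("Can you lead me out?", is_human=True),
        Utterance("Follow the lantern light.", is_human=False),
    ],
)
print(interaction_pairs(demo))
print(mean_utterances_per_session([demo]))  # 5.0
```

Pairing only adjacent human-then-AI turns means an opening AI greeting, or consecutive turns by the same speaker, produce no pair; other pairing rules are equally plausible. Applied to the paper's headline figures, the same average works out to at least roughly 1,000,000 / 26,000 ≈ 38 utterances per session.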
Code & Models
- 🤗 DS-Archive/mistral-v0.1-7b-pippa-metharme-lora (model)
- 🤗 PJMixers-Dev/Gemma-3-Earthen-v0.1-4B-QLoRA (model)
- 🤗 PJMixers-Dev/Gemma-3-Earthen-v0.1-4B (model)
- 🤗 PJMixers-Dev/Gemma-3-Earthen-v0.2-4B-QLoRA (model)
- 🤗 PJMixers-Dev/Gemma-3-Earthen-v0.2-4B (model)
- 🤗 PJMixers-Dev/Granite-3.1-Earthen-v0.3-3B-A800M-QLoRA (model)
- 🤗 PJMixers-Dev/Granite-3.1-Earthen-v0.3-3B-A800M (model)
- 🤗 PJMixers-Dev/Granite-3.1-Earthen-v0.3-3B-A800M-GGUF (model)
- 🤗 PJMixers-Dev/Granite-3.1-Earthen-v0.3-1B-A400M-QLoRA (model)
- 🤗 PJMixers-Dev/Granite-3.1-Earthen-v0.3-1B-A400M (model)
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · AI in Service Interactions
