PIPPA: A Partially Synthetic Conversational Dataset

Tear Gosling; Alpin Dale; Yinhe Zheng

arXiv:2308.05884·cs.CL·August 14, 2023

Open Access · 10 Models · 5 Datasets

TL;DR

PIPPA is a large, partially-synthetic conversational dataset created through crowdsourcing, designed to improve role-play and casual conversation models by providing diverse, nuanced interactions for AI training.

Contribution

The paper introduces PIPPA, a novel large-scale dataset of over 1 million utterances from role-play scenarios, developed via community-driven crowdsourcing efforts.

Findings

01 Contains over 1 million utterances across 26,000 sessions

02 Enhances resources for training conversational AI in role-play contexts

03 Facilitates exploration of nuanced human-AI interactions

Abstract

With the emergence of increasingly powerful large language models, there is a burgeoning interest in leveraging these models for casual conversation and role-play applications. However, existing conversational and role-playing datasets often fail to capture the diverse and nuanced interactions typically exhibited by real-world role-play participants. To address this limitation and contribute to the rapidly growing field, we introduce a partially-synthetic dataset named PIPPA (Personal Interaction Pairs between People and AI). PIPPA is a result of a community-driven crowdsourcing effort involving a group of role-play enthusiasts. The dataset comprises over 1 million utterances that are distributed across 26,000 conversation sessions and provides a rich resource for researchers and AI developers to explore and refine conversational AI systems in the context of role-play scenarios.
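To give a sense of scale, the abstract's figures imply an average of roughly 38 utterances per session. The sketch below shows how that per-session average could be computed. The `Utterance` and `Session` types, the speaker labels, and the toy conversations are hypothetical and invented for illustration; they are not the actual PIPPA schema or data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Utterance:
    speaker: str  # hypothetical label, e.g. "human" or "bot"
    text: str


@dataclass
class Session:
    utterances: List[Utterance] = field(default_factory=list)


def mean_utterances_per_session(sessions: List[Session]) -> float:
    """Average number of utterances per session (0.0 for an empty list)."""
    if not sessions:
        return 0.0
    total = sum(len(s.utterances) for s in sessions)
    return total / len(sessions)


# Toy sessions, invented for illustration only (not taken from PIPPA).
toy_sessions = [
    Session([
        Utterance("human", "Hello!"),
        Utterance("bot", "Greetings, traveler."),
    ]),
    Session([
        Utterance("human", "Who are you?"),
        Utterance("bot", "A humble innkeeper."),
        Utterance("human", "A room, please."),
        Utterance("bot", "Right this way."),
    ]),
]

toy_mean = mean_utterances_per_session(toy_sessions)

# Rough estimate from the abstract's rounded figures
# ("over 1 million" utterances, 26,000 sessions).
approx_reported_mean = 1_000_000 / 26_000
```

Because both reported figures are rounded, the estimate is only an approximation of the true average session length.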

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · AI in Service Interactions
