ChatPipe: Orchestrating Data Preparation Program by Optimizing Human-ChatGPT Interactions

Sibei Chen; Hanbing Liu; Weiting Jin; Xiangyu Sun; Xiaoyao Feng; Ju Fan; Xiaoyong Du; Nan Tang

arXiv:2304.03540 · cs.DB · April 10, 2023 · 1 citation

TL;DR

ChatPipe is a system that enhances user interaction with ChatGPT for data preparation in machine learning, enabling easier guidance, version control, and efficient experimentation to improve data quality.

Contribution

It introduces a novel system that optimizes human-ChatGPT interactions for data preparation, including recommendations and version rollback features.

Findings

01 Effective recommendation of next data operations

02 Seamless version control for data programs

03 Rapid orchestration of data preparation workflows

Abstract

Orchestrating a high-quality data preparation program is essential for successful machine learning (ML), but it is known to be time-consuming and labor-intensive. Large language models such as ChatGPT are impressively capable of generating programs by interacting with users through natural language prompts, yet limitations remain. Specifically, a user must provide specific prompts to iteratively guide ChatGPT in improving a data preparation program, which requires a certain level of expertise in programming, the dataset used, and the ML task. Moreover, once a program has been generated, it is non-trivial to revisit a previous version or change the program without starting the process over again. In this paper, we present ChatPipe, a novel system designed to facilitate seamless interaction between users and ChatGPT. ChatPipe provides users with effective recommendation on…
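To make the version-revisiting problem concrete, the following is a minimal sketch of a linear version history for generated data preparation programs. It is not ChatPipe's implementation, which the abstract does not describe; the `ProgramHistory` class, its method names, and the pandas snippets stored as strings are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ProgramHistory:
    """Linear history of program versions (hypothetical helper, not ChatPipe's API)."""

    versions: list[str] = field(default_factory=list)
    current: int = -1  # index of the active version; -1 means nothing committed yet

    def commit(self, program: str) -> int:
        """Store a new version after the active one, dropping any versions beyond it."""
        self.versions = self.versions[: self.current + 1]
        self.versions.append(program)
        self.current = len(self.versions) - 1
        return self.current

    def rollback(self, version: int) -> str:
        """Make a stored version active again and return its source text."""
        if not 0 <= version < len(self.versions):
            raise IndexError(f"no such version: {version}")
        self.current = version
        return self.versions[version]

    def active(self) -> str:
        """Return the source text of the active version."""
        if self.current < 0:
            raise LookupError("no version has been committed yet")
        return self.versions[self.current]


# Illustrative session: two generated versions, a rollback, then a new branch.
history = ProgramHistory()
v0 = history.commit("df = df.dropna()")
v1 = history.commit("df = df.dropna()\ndf = df.drop_duplicates()")
history.rollback(v0)  # revisit the first version without regenerating it
v2 = history.commit("df = df.dropna()\ndf['age'] = df['age'].clip(0, 120)")
```

In this sketch, committing after a rollback discards the versions that followed the restored one, which keeps the history linear. A tree-shaped history that preserves abandoned branches would be an equally reasonable design.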

Taxonomy

Topics: Machine Learning and Data Classification · Scientific Computing and Data Management · Software Engineering Research