ChatPipe: Orchestrating Data Preparation Program by Optimizing Human-ChatGPT Interactions
Sibei Chen, Hanbing Liu, Weiting Jin, Xiangyu Sun, Xiaoyao Feng, Ju Fan, Xiaoyong Du, Nan Tang

TL;DR
ChatPipe is a system that improves how users interact with ChatGPT when preparing data for machine learning. It makes it easier to guide program generation, manage program versions, and experiment efficiently to improve data quality.
Contribution
The paper introduces ChatPipe, a system that optimizes human-ChatGPT interactions for data preparation by recommending next data operations and supporting rollback to earlier program versions.
Findings
Effective recommendation of next data operations
Seamless version control for data programs
Rapid orchestration of data preparation workflows
Abstract
Orchestrating a high-quality data preparation program is essential for successful machine learning (ML), but it is known to be time and effort consuming. Despite the impressive capabilities of large language models like ChatGPT in generating programs by interacting with users through natural language prompts, there are still limitations. Specifically, a user must provide specific prompts to iteratively guide ChatGPT in improving data preparation programs, which requires a certain level of expertise in programming, the dataset used and the ML task. Moreover, once a program has been generated, it is non-trivial to revisit a previous version or make changes to the program without starting the process over again. In this paper, we present ChatPipe, a novel system designed to facilitate seamless interaction between users and ChatGPT. ChatPipe provides users with effective recommendation on…
Taxonomy
Topics: Machine Learning and Data Classification · Scientific Computing and Data Management · Software Engineering Research
