Revealing the Semantics of Data Wrangling Scripts With COMANTICS
Kai Xiong, Zhongsu Luo, Siwei Fu, Yongheng Wang, Mingliang Xu, Yingcai Wu

TL;DR
COMANTICS is a novel pipeline that automatically infers the semantics of data wrangling scripts by analyzing table differences and employing neural networks, aiding understanding and reuse of such scripts.
Contribution
The paper introduces COMANTICS, a three-step method combining table difference analysis and neural networks to detect data transformation semantics automatically.
Findings
High accuracy in detecting transformation types
Effective across multiple domains
Improves understanding of data wrangling scripts
Abstract
Data workers usually seek to understand the semantics of data wrangling scripts in various scenarios, such as code debugging, reusing, and maintaining. However, the understanding is challenging for novice data workers due to the variety of programming languages, functions, and parameters. Based on the observation that differences between input and output tables highly relate to the type of data transformation, we outline a design space including 103 characteristics to describe table differences. Then, we develop COMANTICS, a three-step pipeline that automatically detects the semantics of data transformation scripts. The first step focuses on the detection of table differences for each line of wrangling code. Second, we incorporate a characteristic-based component and a Siamese convolutional neural network-based component for the detection of transformation types. Third, we derive the…
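The observation in the abstract, that differences between input and output tables relate to the type of transformation, can be illustrated with a small sketch. This is not the COMANTICS implementation: the three difference features, the function names (`columns`, `table_diff`, `guess_transformation`), and the type labels are all hypothetical stand-ins chosen for illustration. They are far simpler than the 103 characteristics the paper describes, and no neural component is included.

```python
from typing import Dict, List

# Assumption: a table is modelled as a list of row dicts (column name -> value).
Table = List[Dict[str, object]]


def columns(table: Table) -> List[str]:
    """Return column names in first-seen order.

    Column names are read from the rows, so an empty table reports no columns.
    """
    seen: List[str] = []
    for row in table:
        for name in row:
            if name not in seen:
                seen.append(name)
    return seen


def table_diff(before: Table, after: Table) -> Dict[str, object]:
    """Compute a few hypothetical difference features between two tables."""
    cols_before, cols_after = columns(before), columns(after)
    return {
        "row_delta": len(after) - len(before),
        "added_columns": [c for c in cols_after if c not in cols_before],
        "removed_columns": [c for c in cols_before if c not in cols_after],
    }


def guess_transformation(diff: Dict[str, object]) -> str:
    """Map difference features to a coarse, illustrative transformation label."""
    if diff["added_columns"] and diff["removed_columns"]:
        return "transform columns"
    if diff["added_columns"]:
        return "create columns"
    if diff["removed_columns"]:
        return "delete columns"
    if diff["row_delta"] < 0:
        return "delete rows"
    if diff["row_delta"] > 0:
        return "create rows"
    return "no structural change"


# Input and output of one hypothetical line of wrangling code
# that drops rows containing missing values.
before = [{"name": "Ann", "age": 31}, {"name": "Bob", "age": None}]
after = [{"name": "Ann", "age": 31}]

diff = table_diff(before, after)
print(diff)
print(guess_transformation(diff))
```

Running the differ once per line of a script, on the tables before and after that line, mirrors the first step the abstract describes. The rule-based mapping here only stands in for the second step, which in the paper combines a characteristic-based component with a Siamese convolutional neural network.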
