EDAssistant: Supporting Exploratory Data Analysis in Computational Notebooks with In-Situ Code Search and Recommendation
Xingjun Li, Yizhi Zhang, Justin Leung, Chengnian Sun, Jian Zhao

TL;DR
EDAssistant is a JupyterLab extension that enhances exploratory data analysis by providing in-situ code search and API recommendations through interactive visualizations, leveraging machine learning models trained on extensive online notebooks.
Contribution
The paper introduces EDAssistant, a tool that integrates in-situ code search and API recommendation into Jupyter notebooks, backed by machine learning models, to support EDA, especially for novices.
Findings
User study shows improved EDA efficiency
Participants preferred in-context support over external search
Tool effectively aids novices in understanding datasets
Abstract
Using computational notebooks (e.g., Jupyter Notebook), data scientists rationalize their exploratory data analysis (EDA) based on their prior experience and external knowledge such as online examples. For novices or data scientists who lack specific knowledge about the dataset or problem to investigate, effectively obtaining and understanding the external information is critical to carry out EDA. This paper presents EDAssistant, a JupyterLab extension that supports EDA with in-situ search of example notebooks and recommendation of useful APIs, powered by novel interactive visualization of search results. The code search and recommendation are enabled by state-of-the-art machine learning models, trained on a large corpus of EDA notebooks collected online. A user study is conducted to investigate both EDAssistant and data scientists' current practice (i.e., using external search…
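
To make the idea of searching example notebooks concrete, here is a minimal sketch of ranking code snippets against a query typed in a notebook cell. It is not the paper's method: EDAssistant relies on trained machine learning models, whereas this sketch substitutes a plain bag-of-tokens cosine similarity. The corpus `EXAMPLES` and the helper names `tokenize`, `cosine`, and `search` are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(code: str) -> Counter:
    """Count identifier-like tokens in a code string."""
    return Counter(re.findall(r"[A-Za-z_]\w*", code))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, corpus: dict[str, str], k: int = 2) -> list[tuple[str, float]]:
    """Return the k corpus entries most similar to the query, best first."""
    q = tokenize(query)
    scored = [(name, cosine(q, tokenize(src))) for name, src in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical stand-in for a corpus of EDA notebook cells (assumption).
EXAMPLES = {
    "titanic_eda": "df = pd.read_csv('train.csv')\ndf.describe()\ndf.isnull().sum()",
    "housing_plots": "sns.heatmap(df.corr())\nplt.show()",
    "text_cleaning": "tokens = text.lower().split()",
}

# The query plays the role of the code in the user's current cell.
results = search("df.isnull().sum()", EXAMPLES)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

A learned code-embedding model would replace `tokenize` and `cosine` in a real system; the ranking loop in `search` would stay much the same.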
Taxonomy
Topics: Data Visualization and Analytics · Scientific Computing and Data Management · Data Analysis with R
