NotebookRAG: Retrieving Multiple Notebooks to Augment the Generation of EDA Notebooks for Crowd-Wisdom
Yi Shan, Yixuan He, Zekai Shao, Kai Xu, Siming Chen

TL;DR
NotebookRAG leverages retrieved notebooks and user intent to automate and improve exploratory data analysis (EDA) notebook generation, enhancing accuracy and relevance over existing methods.
Contribution
The paper introduces a retrieval-augmented approach that transforms notebook code cells into executable components, enabling dynamic, data-aware EDA notebook creation.
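To make the idea of treating code cells as retrievable components more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function names (`extract_code_components`, `tokenize`, `rank_components`), the syntax check used as a rough stand-in for "executable", and the token-overlap score are all assumptions chosen for illustration. The only thing taken as given is the standard Jupyter notebook JSON layout, where each cell has a `cell_type` and a `source` that is either a string or a list of lines.

```python
import ast
import json
import re


def extract_code_components(notebook_json: str) -> list[str]:
    """Return the source of each non-empty, syntactically valid code cell."""
    notebook = json.loads(notebook_json)
    components = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format allows source as one string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        try:
            ast.parse(source)
        except SyntaxError:
            # Assumption: cells that do not parse as plain Python
            # (e.g. IPython magics) are skipped rather than repaired.
            continue
        if source.strip():
            components.append(source)
    return components


def tokenize(text: str) -> set[str]:
    """Lowercase alphabetic tokens; a deliberately crude tokenizer."""
    return set(re.findall(r"[a-z]+", text.lower()))


def rank_components(intent: str, components: list[str]) -> list[tuple[float, str]]:
    """Rank components by Jaccard overlap with the user intent (illustrative only)."""
    intent_tokens = tokenize(intent)
    scored = []
    for component in components:
        tokens = tokenize(component)
        union = intent_tokens | tokens
        score = len(intent_tokens & tokens) / len(union) if union else 0.0
        scored.append((score, component))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

A real system would likely use learned embeddings and would actually execute the components against the user's dataset; the lexical score here only shows where intent-aware ranking would plug in.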
Findings
NotebookRAG outperforms baseline methods in producing high-quality EDA notebooks.
Transforming code cells into executable components improves retrieval quality.
A user study confirms that the generated notebooks are more relevant and accurate.
Abstract
High-quality exploratory data analysis (EDA) is essential in the data science pipeline, but remains highly dependent on analysts' expertise and effort. While recent LLM-based approaches partially reduce this burden, they struggle to generate effective analysis plans and appropriate insights and visualizations when user intent is abstract. Meanwhile, a vast collection of analysis notebooks produced across platforms and organizations contains rich analytical knowledge that can potentially guide automated EDA. Retrieval-augmented generation (RAG) provides a natural way to leverage such corpora, but general methods often treat notebooks as static documents and fail to fully exploit their potential knowledge for automating EDA. To address these limitations, we propose NotebookRAG, a method that takes user intent, datasets, and existing notebooks as input to retrieve, enhance, and reuse…
Taxonomy
Topics: Data Visualization and Analytics · Scientific Computing and Data Management · Computational and Text Analysis Methods
