Semantically Aligned Question and Code Generation for Automated Insight Generation

Ananya Singha; Bhavya Chopra; Anirudh Khatry; Sumit Gulwani; Austin Z. Henley; Vu Le; Chris Parnin; Mukul Singh; Gust Verbruggen

arXiv:2405.01556·cs.SE·May 6, 2024

Open Access

TL;DR

This paper presents a method using large language models to generate semantically aligned questions and code for automated insight generation, improving the relevance and diversity of insights for data analysis.

Contribution

It introduces a semantic filtering approach using embeddings to ensure question-code alignment and demonstrates that joint question and code generation enhances diversity.

Findings

1. Embedding-based filtering effectively removes unaligned question-code pairs
2. Joint question and code generation increases diversity of insights
3. Empirical results on Open-WikiTable data validate the approach
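
One way to make the diversity finding concrete is to score a set of generated questions by their mean pairwise cosine distance. The sketch below is illustrative only: the `diversity` function, the `toy_embed` bag-of-words stand-in for a real embedding model, and the sample questions are all assumptions, and the source does not state which diversity measure the authors used.

```python
import math
import re
from collections import Counter
from itertools import combinations


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def diversity(questions, embed=toy_embed):
    """Mean pairwise cosine distance; higher means more varied questions."""
    vectors = [embed(q) for q in questions]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - cosine(a, b) for a, b in pairs) / len(pairs)


# Made-up example questions, not taken from the paper's data.
similar = ["What is the average price?", "What is the mean price?"]
varied = ["What is the average price?", "Which city has most stores?"]

print(diversity(similar), diversity(varied))
```

Under this measure, a generation strategy yields "more diverse" questions when its score is higher on the same table; swapping `toy_embed` for a real sentence-embedding model would be the natural next step.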

Abstract

Automated insight generation is a common tactic for helping knowledge workers, such as data scientists, to quickly understand the potential value of new and unfamiliar data. Unfortunately, when large language models produce automated insights, the generated code may not correctly correspond (or align) to the insight. In this paper, we leverage the semantic knowledge of large language models to generate targeted and insightful questions about data and the corresponding code to answer those questions. Then, through an empirical study on data from Open-WikiTable, we show that embeddings can be effectively used for filtering out semantically unaligned pairs of question and code. Additionally, we find that generating questions and code together yields more diverse questions.
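
The filtering idea described in the abstract can be sketched as follows: embed each question and its code, compare the two vectors, and keep only pairs whose similarity clears a threshold. This is a minimal sketch under stated assumptions, not the authors' implementation. The `toy_embed` function is a bag-of-words stand-in for a real embedding model, and the `filter_aligned` name, the `0.3` threshold, and the sample pairs are hypothetical.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_aligned(pairs, threshold=0.3, embed=toy_embed):
    """Keep (question, code) pairs whose embeddings are similar enough.

    The threshold value is an assumption chosen for this toy example.
    """
    kept = []
    for question, code in pairs:
        if cosine(embed(question), embed(code)) >= threshold:
            kept.append((question, code))
    return kept


# Made-up pairs: the first is aligned, the second is not.
pairs = [
    ("What is the average population per country?",
     "df.groupby('country')['population'].mean()"),
    ("Which year had the highest revenue?",
     "df['name'].str.upper()"),
]

aligned = filter_aligned(pairs)
print(len(aligned))
```

With a real embedding model, the threshold would need to be tuned on labeled aligned and unaligned pairs rather than fixed by hand.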

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems · Educational Technology and Assessment