GPT in Data Science: A Practical Exploration of Model Selection
Nathalia Nascimento, Cristina Tavares, Paulo Alencar, Donald Cowan

TL;DR
This paper explores how GPT-4 can assist in data science model selection, analyzing its decision-making process, effectiveness, and transparency through experiments with toy datasets and comparison with other heuristics.
Contribution
It provides a detailed analysis of GPT-4's model selection heuristics and introduces a variability model to understand AI decision-making in data science.
Findings
GPT-4's model selection heuristics differ from other platforms.
The variability model helps visualize factors influencing GPT-4's decisions.
Experimental results evaluate the effectiveness of GPT-4's recommendations.
Abstract
There is increasing interest in leveraging Large Language Models (LLMs) for managing structured data and enhancing data science processes. Despite the potential benefits, this integration raises significant questions about their reliability and decision-making methodologies. These questions highlight the importance of the many factors involved in model selection, including the nature of the data, problem type, performance metrics, computational resources, interpretability versus accuracy, assumptions about the data, and ethical considerations. Our objective is to elucidate and express the factors and assumptions guiding GPT-4's model selection recommendations. We employ a variability model to depict these factors and use toy datasets to evaluate both the model and the implementation of the identified heuristics. By contrasting these outcomes with heuristics from other platforms, our aim is to…
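To make the idea of a variability model concrete, the factors listed in the abstract can be treated as variation points, each with a small set of variants. The sketch below is illustrative only and is not the paper's model: the factor names follow the abstract, while the variant values, the `VariationPoint` class, and both helper functions are hypothetical placeholders chosen for this example.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class VariationPoint:
    """One factor that can vary in a model selection decision."""
    name: str
    variants: tuple


# Factor names come from the abstract; every variant below is a
# hypothetical placeholder, not a value reported in the paper.
VARIATION_POINTS = (
    VariationPoint("nature_of_data", ("tabular", "text", "image")),
    VariationPoint("problem_type", ("classification", "regression", "clustering")),
    VariationPoint("performance_metric", ("accuracy", "f1", "rmse")),
    VariationPoint("computational_resources", ("low", "high")),
    VariationPoint("interpretability_vs_accuracy", ("interpretability", "accuracy")),
    VariationPoint("data_assumptions", ("linear", "nonlinear")),
    VariationPoint("ethical_considerations", ("none_flagged", "fairness_sensitive")),
)


def configurations(points):
    """Yield every combination of variants as a dict keyed by factor name."""
    names = [p.name for p in points]
    for combo in product(*(p.variants for p in points)):
        yield dict(zip(names, combo))


def count_configurations(points):
    """Count the combinations without enumerating them."""
    total = 1
    for p in points:
        total *= len(p.variants)
    return total
```

Enumerating the configurations shows how quickly the decision space grows even with a handful of factors, which is one reason an explicit model of the factors can help when inspecting a recommender's choices.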
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Data Quality and Management
