Is GPT-4 a Good Data Analyst?
Liying Cheng, Xingxuan Li, Lidong Bing

TL;DR
This study evaluates GPT-4's effectiveness as a data analyst by comparing its performance with that of human analysts on end-to-end data analysis tasks, using a systematic framework and dedicated evaluation metrics.
Contribution
The paper introduces a comprehensive framework and evaluation metrics for assessing GPT-4's data analysis capabilities against those of professional human data analysts.
Findings
GPT-4 achieves performance comparable to human data analysts.
The study provides insights into GPT-4's strengths and limitations in data analysis.
Results suggest potential for GPT-4 to assist or augment data analysis tasks.
Abstract
As large language models (LLMs) have demonstrated powerful capabilities across many domains and tasks, including context understanding, code generation, language generation, and data storytelling, many data analysts may worry that their jobs will be replaced by artificial intelligence (AI). This controversial topic has drawn great public attention. However, opinions remain divided and no definitive conclusion has been reached. Motivated by this, we raise the research question "Is GPT-4 a good data analyst?" in this work and aim to answer it by conducting head-to-head comparative studies. In detail, we regard GPT-4 as a data analyst that performs end-to-end data analysis with databases from a wide range of domains. We propose a framework to tackle the problem by carefully designing the prompts GPT-4 uses to conduct the experiments. We also design several…
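The abstract describes treating GPT-4 as a data analyst that works end to end over databases, driven by carefully designed prompts. The sketch below illustrates one way such a loop could be wired up, assuming a three-step flow: the model writes a query, the query is executed, and the model then writes an analysis of the returned data. The prompt wording, the function names (`build_sql_prompt`, `build_analysis_prompt`, `analyze`, `fake_llm`), and the use of SQLite are hypothetical illustrations, not the authors' prompts or code. Chart generation is omitted, and a canned stand-in replaces the real GPT-4 call so the example runs offline.

```python
import sqlite3
from typing import Callable

# Hypothetical LLM interface: takes a prompt string, returns generated text.
LLM = Callable[[str], str]


def build_sql_prompt(schema: str, question: str) -> str:
    """Assumed step 1: ask the model to turn a question into a SQLite query."""
    return (
        "You are a data analyst. Given the database schema below, "
        "write one SQLite query that answers the question.\n"
        f"Schema:\n{schema}\nQuestion: {question}\nSQL:"
    )


def build_analysis_prompt(question: str, rows: list[tuple]) -> str:
    """Assumed step 3: ask the model to interpret the extracted data."""
    table = "\n".join(", ".join(map(str, row)) for row in rows)
    return (
        f"Question: {question}\nQuery result:\n{table}\n"
        "Write a few bullet points of analysis grounded in these numbers."
    )


def analyze(conn: sqlite3.Connection, schema: str, question: str, llm: LLM) -> dict:
    """Hypothetical question -> SQL -> data -> written analysis loop."""
    sql = llm(build_sql_prompt(schema, question)).strip()
    rows = conn.execute(sql).fetchall()  # Assumed step 2: run the generated query
    analysis = llm(build_analysis_prompt(question, rows))
    return {"sql": sql, "rows": rows, "analysis": analysis}


def fake_llm(prompt: str) -> str:
    """Offline stand-in for GPT-4 that returns canned text (hypothetical)."""
    if prompt.rstrip().endswith("SQL:"):
        return "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region;"
    return "- North has higher total sales than South."


# Toy database used only to exercise the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 30.0)],
)
result = analyze(conn, "sales(region TEXT, amount REAL)", "Which region sells more?", fake_llm)
print(result["rows"])
```

Swapping `fake_llm` for a function that calls a real model would turn this into a working harness; the generated SQL would then need validation before execution, since model-written queries can fail or return unexpected shapes.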
Taxonomy
Topics: Machine Learning and Data Classification · Topic Modeling · Data Quality and Management
Methods: Attention Is All You Need · Softmax · Layer Normalization · Byte Pair Encoding · Dropout · Linear Layer · Label Smoothing · Adam · Dense Connections · Residual Connection
