The Accuracy of Domain Specific and Descriptive Analysis Generated by Large Language Models
Denish Omondi Otieno, Faranak Abri, Sima Siami-Namini, Akbar Siami Namin

TL;DR
This paper evaluates the ability of large language models to perform domain-specific and descriptive data analysis, comparing their accuracy with that of human analysts in a case study involving phishing emails.
Contribution
It provides an empirical assessment of LLMs' capabilities in statistical and domain-specific analysis tasks, highlighting their strengths and limitations.
Findings
GPT-4 and LangChain excel in numerical reasoning tasks.
LLMs achieve competitive correlation with human judgments on feature engineering.
LLMs struggle to perform well on domain-specific knowledge reasoning.
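The finding on correlation with human judgments can be made concrete with a short sketch. The summary does not state which coefficient the authors used, so the function below assumes Pearson's r computed over paired numeric ratings (one LLM rating and one human rating per item). The function name `pearson_r` and the example ratings are hypothetical illustrations, not the paper's method or data.

```python
import math


def pearson_r(llm_scores, human_scores):
    """Pearson correlation between paired LLM and human ratings.

    Assumption: both lists hold numeric scores for the same items in the
    same order; the source does not specify the actual metric or scale.
    """
    if len(llm_scores) != len(human_scores) or len(llm_scores) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(llm_scores)
    mean_x = sum(llm_scores) / n
    mean_y = sum(human_scores) / n
    # Covariance term: sum of products of deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(llm_scores, human_scores))
    # Variance terms for normalization.
    var_x = sum((x - mean_x) ** 2 for x in llm_scores)
    var_y = sum((y - mean_y) ** 2 for y in human_scores)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)
```

For example, `pearson_r([1, 2, 3, 4], [2, 4, 6, 8])` returns `1.0`, because the two hypothetical rating lists are perfectly linearly related; values near 0 would indicate little linear agreement between the LLM and the human analysts.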
Abstract
Large language models (LLMs) have attracted considerable attention because they can generate high-quality responses comparable to those produced by humans. LLMs can compose not only textual scripts, such as emails and essays, but also executable programming code. In contrast, the automated reasoning capability of these LLMs in performing statistically driven descriptive analysis has not yet been fully explored, particularly on user-specific data and when LLMs act as personal assistants to users who have limited background knowledge in an application domain but would like to carry out basic as well as advanced statistical and domain-specific analysis. More importantly, the performance of these LLMs has not been compared and discussed in detail when domain-specific data analysis tasks are needed. Consequently, this study explores whether LLMs can be used as generative AI-based…
Taxonomy
Topics: Computational and Text Analysis Methods
