Evaluating the Efficacy of ChatGPT-4 in Providing Scientific References across Diverse Disciplines
Zhi Cao

TL;DR
This study assesses ChatGPT-4's ability to provide accurate scientific references across various disciplines, revealing significant performance variability and highlighting the need for model improvements and human validation in academic research.
Contribution
It offers a comprehensive evaluation of ChatGPT-4's reference accuracy across multiple fields, identifying strengths and limitations for scholarly use.
Findings
Higher validity in CS, BME, and Medicine references (>65%)
No verified articles in ME and EE fields
References tend to match broader themes rather than specific topics
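The validity rates quoted above are per-discipline fractions of suggested references that could be verified. A minimal sketch of that calculation follows, assuming each checked reference is recorded as a (discipline, is_valid) pair. The function name `validity_rates` and the sample records are hypothetical placeholders for illustration and are not data or code from the paper.

```python
from collections import defaultdict


def validity_rates(records):
    """Return {discipline: fraction of references marked valid}.

    `records` is an iterable of (discipline, is_valid) pairs, one per
    reference that a human checker has looked up (assumed format).
    """
    totals = defaultdict(int)
    valid = defaultdict(int)
    for discipline, is_valid in records:
        totals[discipline] += 1
        if is_valid:
            valid[discipline] += 1
    return {d: valid[d] / totals[d] for d in totals}


# Hypothetical placeholder labels, NOT results from the paper.
sample = [
    ("CS", True), ("CS", True), ("CS", False),
    ("ME", False), ("ME", False),
]
rates = validity_rates(sample)
for discipline, rate in sorted(rates.items()):
    print(f"{discipline}: {rate:.0%} valid")
```

Keeping the human verification step separate from the tally reflects the TL;DR's point that references still need manual validation; the function only aggregates labels that a checker has already assigned.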
Abstract
This work examines how well OpenAI's ChatGPT-4 sources scientific references across a range of research disciplines. Our analysis covers Computer Science (CS), Mechanical Engineering (ME), Electrical Engineering (EE), Biomedical Engineering (BME), and Medicine, as well as their more specialized sub-domains. Our empirical findings indicate significant variance in ChatGPT-4's performance across these disciplines. Notably, the validity rate of suggested articles in CS, BME, and Medicine surpasses 65%, whereas in ME and EE none of the articles suggested by the model could be verified as valid. Further, when asked to retrieve articles on niche research topics, ChatGPT-4 tends to yield references that align with the broader thematic areas as opposed to the narrowly defined topics of…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · Topic Modeling
