Evaluating ChatGPT on Nuclear Domain-Specific Data

Muhammad Anwar; Mischa de Costa; Issam Hammad; Daniel Lau

arXiv:2409.00090 · cs.CL · September 4, 2024

TL;DR

This study evaluates ChatGPT's effectiveness in answering nuclear domain-specific questions, demonstrating that integrating Retrieval Augmented Generation (RAG) improves accuracy and contextual relevance over standalone responses.

Contribution

The paper introduces and assesses a RAG-based approach for improving LLM performance in specialized, high-stakes domains such as nuclear data.

Findings

1. RAG improves answer accuracy on nuclear-domain questions.
2. Both human and LLM-based evaluations indicate better relevance with RAG.
3. Standalone ChatGPT shows limitations in specialized domains.

Abstract

This paper examines the application of ChatGPT, a large language model (LLM), to question-and-answer (Q&A) tasks in the highly specialized field of nuclear data. The primary focus is on evaluating ChatGPT's performance on a curated test dataset, comparing the outputs of a standalone LLM with those generated through a Retrieval Augmented Generation (RAG) approach. Despite recent advances, LLMs are prone to generating incorrect or 'hallucinated' information, which is a significant limitation in applications requiring high accuracy and reliability. This study explores the potential of RAG, a method that integrates external knowledge bases and sophisticated retrieval techniques into LLMs to enhance the accuracy and relevance of generated outputs. In this context, the paper evaluates ChatGPT's ability to answer domain-specific questions, employing two methodologies: A)…
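The abstract describes RAG only at a high level, and this excerpt does not specify the authors' retriever, knowledge base, or prompts. As an illustration only, the retrieve-then-prompt idea can be sketched with a simple bag-of-words cosine-similarity retriever. The function names (`retrieve`, `build_prompt`), the three example passages, and the prompt wording are all hypothetical; a production system would more likely use dense embeddings and would send the assembled prompt to the LLM, a step omitted here.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits, e.g. "Cobalt-60" -> ["cobalt", "60"]
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    # Hypothetical retriever: rank passages by lexical similarity to the question
    q = Counter(tokenize(question))
    ranked = sorted(passages, key=lambda p: cosine(q, Counter(tokenize(p))), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    # Hypothetical prompt template that grounds the answer in retrieved context
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below. If the context is insufficient, say so.\n"
        f"Context:\n{joined}\n\nQuestion: {question}\nAnswer:"
    )


# Illustrative knowledge base (placeholder passages, not the paper's data)
knowledge_base = [
    "Cobalt-60 decays by beta emission with a half-life of about 5.27 years.",
    "Uranium-235 is a fissile isotope used as fuel in thermal reactors.",
    "Heavy water is used as a moderator in CANDU reactors.",
]

question = "What is the half-life of cobalt-60?"
top = retrieve(question, knowledge_base, k=1)
prompt = build_prompt(question, top)
print(prompt)  # this string would be sent to the LLM in a RAG setup
```

In the standalone setting the model would receive only the question; the RAG variant prepends retrieved passages so the answer can be grounded in an external source, which is the contrast the paper evaluates.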
