Evaluating ChatGPT on Nuclear Domain-Specific Data
Muhammad Anwar, Mischa de Costa, Issam Hammad, Daniel Lau

TL;DR
This study evaluates ChatGPT's effectiveness in answering nuclear domain-specific questions, demonstrating that integrating Retrieval Augmented Generation (RAG) improves accuracy and contextual relevance over standalone responses.
Contribution
It introduces and assesses a RAG-based approach for enhancing LLM performance in specialized, high-stakes domains like nuclear data.
Findings
RAG improves answer accuracy on nuclear domain questions
Human and LLM evaluations confirm better relevance with RAG
Standalone ChatGPT shows limitations in specialized domains
Abstract
This paper examines the application of ChatGPT, a large language model (LLM), for question-and-answer (Q&A) tasks in the highly specialized field of nuclear data. The primary focus is on evaluating ChatGPT's performance on a curated test dataset, comparing the outcomes of a standalone LLM with those generated through a Retrieval Augmented Generation (RAG) approach. LLMs, despite their recent advancements, are prone to generating incorrect or 'hallucinated' information, which is a significant limitation in applications requiring high accuracy and reliability. This study explores the potential of utilizing RAG in LLMs, a method that integrates external knowledge bases and sophisticated retrieval techniques to enhance the accuracy and relevance of generated outputs. In this context, the paper evaluates ChatGPT's ability to answer domain-specific questions, employing two methodologies: A)…
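The contrast the abstract draws between a standalone LLM and a RAG pipeline can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the three-sentence knowledge base, the bag-of-words cosine retriever, the prompt wording, and the function names `retrieve`, `standalone_prompt`, and `rag_prompt`. No model is called; the sketch only shows how the two prompts would differ before being sent to an LLM.

```python
import math
import re
from collections import Counter

# Hypothetical mini knowledge base for illustration; NOT the paper's dataset.
DOCS = [
    "Control rods absorb neutrons to regulate the fission chain reaction.",
    "CANDU reactors use heavy water as both moderator and coolant.",
    "Uranium-235 is a fissile isotope of uranium.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, docs, k=1):
    """Return the k passages most similar to the question.

    A simple stand-in retriever; the paper's retrieval method may differ.
    """
    q_vec = Counter(tokenize(question))
    ranked = sorted(
        docs,
        key=lambda d: cosine(q_vec, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def standalone_prompt(question):
    """Prompt for the standalone setting: the question only, no context."""
    return f"Question: {question}\nAnswer:"


def rag_prompt(question, docs, k=1):
    """Prompt for the RAG setting: retrieved passages precede the question."""
    context = "\n".join(f"- {p}" for p in retrieve(question, docs, k))
    return (
        "Use only the context below to answer.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    q = "What moderator do CANDU reactors use?"
    print(standalone_prompt(q))
    print()
    print(rag_prompt(q, DOCS))
```

In the standalone case the model must answer from its training data alone, whereas the RAG prompt grounds it in a retrieved passage. A production system would likely replace the bag-of-words retriever with embedding-based search over a curated corpus.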
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · COVID-19 diagnosis using AI · Machine Learning in Healthcare
