Unlocking the Potential of Large Language Models in the Nuclear Industry with Synthetic Data
Muhammad Anwar, Daniel Lau, Mishca de Costa, Issam Hammad

TL;DR
This paper demonstrates how synthetic data generation enables the use of large language models in the nuclear industry by transforming unstructured text into structured question-answer pairs, addressing data scarcity and privacy issues.
Contribution
It introduces a method for generating synthetic question-answer data from unstructured nuclear industry texts to facilitate LLM applications, overcoming data scarcity and privacy challenges.
Findings
Synthetic data improves LLM training in nuclear domain
Enhanced information retrieval from unstructured texts
Supports privacy-preserving data sharing
Abstract
The nuclear industry possesses a wealth of valuable information locked away in unstructured text data. This data, however, is not readily usable for advanced Large Language Model (LLM) applications that require clean, structured question-answer pairs for tasks like model training, fine-tuning, and evaluation. This paper explores how synthetic data generation can bridge this gap, enabling the development of robust LLMs for the nuclear domain. We discuss the challenges of data scarcity and privacy concerns inherent in the nuclear industry and how synthetic data provides a solution by transforming existing text data into usable Q&A pairs. This approach leverages LLMs to analyze text, extract key information, generate relevant questions, and evaluate the quality of the resulting synthetic dataset. By unlocking the potential of LLMs in the nuclear industry, synthetic data can pave the way…
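The abstract's pipeline (analyze text, extract key information, generate questions, evaluate the resulting pairs) can be illustrated with a minimal sketch. This is not the authors' implementation: every name below (`QAPair`, `split_into_chunks`, `generate_qa_pairs`, `is_grounded`, `fake_llm`) is hypothetical, the prompts are placeholders, and `fake_llm` is a deterministic stand-in for a real model so the example runs offline. The abstract describes LLM-based quality evaluation; a simple token-overlap check is swapped in here for brevity.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QAPair:
    question: str
    answer: str
    source: str  # the passage the pair was derived from


def split_into_chunks(text: str, max_sentences: int = 2) -> List[str]:
    """Split unstructured text into small passages of a few sentences each."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i:i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]


def generate_qa_pairs(
    text: str, llm: Callable[[str], str], max_sentences: int = 2
) -> List[QAPair]:
    """For each passage, ask the model for a key fact, then for a matching question."""
    pairs = []
    for chunk in split_into_chunks(text, max_sentences):
        answer = llm(f"Extract the key fact from this passage:\n{chunk}").strip()
        question = llm(f"Write a question answered by this fact:\n{answer}").strip()
        pairs.append(QAPair(question=question, answer=answer, source=chunk))
    return pairs


def is_grounded(pair: QAPair, min_overlap: float = 0.5) -> bool:
    """Heuristic quality check (an assumption, not the paper's evaluator):
    keep a pair only if enough of the answer's words appear in its source."""
    answer_tokens = set(re.findall(r"\w+", pair.answer.lower()))
    source_tokens = set(re.findall(r"\w+", pair.source.lower()))
    if not answer_tokens:
        return False
    return len(answer_tokens & source_tokens) / len(answer_tokens) >= min_overlap


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real LLM call, for demonstration only."""
    instruction, _, body = prompt.partition("\n")
    if instruction.startswith("Extract"):
        # Pretend the first sentence of the passage is the key fact.
        return body.split(". ")[0].rstrip(".") + "."
    return "What does the source state about: " + body


if __name__ == "__main__":
    sample = (
        "The pump must be inspected monthly. "
        "Records are kept for five years. "
        "Valves are tested annually."
    )
    for pair in generate_qa_pairs(sample, fake_llm):
        if is_grounded(pair):
            print(pair.question, "->", pair.answer)
```

In practice `fake_llm` would be replaced by a call to whichever model is approved for the data in question, and the filtering step would likely combine automated checks with review by domain experts.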
Taxonomy
Topics: Computational and Text Analysis Methods · Topic Modeling · Big Data and Digital Economy
