Retrieval-Augmented Question Answering over Scientific Literature for the Electron-Ion Collider
Tina. J. Jat, T. Ghosh, Karthik Suresh

TL;DR
This paper presents a locally deployed, cost-effective RAG-based Q&A system for the Electron-Ion Collider domain. It uses an in-house database of arXiv articles and open-source LLaMA models to answer domain-specific questions while keeping data private.
Contribution
It introduces a resource-efficient, privacy-preserving RAG system tailored for nuclear physics, extending an earlier application built on a proprietary model and a cloud-hosted knowledge base by using open-source components and locally stored data.
Findings
Achieved domain-specific question answering with local data and open-source models.
Ensured data privacy by avoiding external data transmission.
Provided a cost-effective alternative to cloud-based RAG systems.
Abstract
Retrieval-Augmented Generation (RAG) is widely used to harness the power of language models in answering specialized, domain-specific technical questions. In this work, we have developed a RAG-inspired Q&A application that comprises an in-house database indexed on arXiv articles related to the Electron-Ion Collider (EIC) experiment, one of the largest international scientific collaborations, and incorporates an open-source LLaMA model for answer generation. This is an extension of its preceding application, which was built on a proprietary model and a cloud-hosted external knowledge base for the EIC experiment. This locally deployed RAG system offers a cost-effective alternative, suited to resource-constrained settings, for building a RAG-assisted Q&A application that answers domain-specific queries in the field of experimental nuclear physics. This set-up…
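The retrieve-then-generate flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the retriever, index, or prompt format, so the term-frequency cosine retriever, the function names (`retrieve`, `build_prompt`), and the three placeholder passages below are all assumptions made for illustration. A real deployment would presumably index arXiv article chunks locally and pass the assembled prompt to a locally hosted LLaMA model; that generation call is deliberately left out here.

```python
import math
import re
from collections import Counter

# Placeholder passages standing in for chunks of locally indexed articles.
# These are illustrative only and are not taken from the paper's database.
CORPUS = [
    {"id": "doc-1", "text": "The detector tracking system measures charged particle momentum."},
    {"id": "doc-2", "text": "Calorimeters measure the energy of electrons and photons."},
    {"id": "doc-3", "text": "The accelerator collides polarized electrons with ions."},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k=2):
    """Return the k passages most similar to the query (assumed toy retriever)."""
    q_vec = Counter(tokenize(query))
    ranked = sorted(
        corpus,
        key=lambda doc: cosine(q_vec, Counter(tokenize(doc["text"]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, passages):
    """Assemble the retrieved context and the question into one prompt string."""
    context = "\n".join(f"[{p['id']}] {p['text']}" for p in passages)
    return (
        "Answer using only the context below and cite passage ids.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


if __name__ == "__main__":
    question = "What measures the energy of photons?"
    top = retrieve(question, CORPUS, k=2)
    # The prompt would then be sent to a locally hosted model (not shown).
    print(build_prompt(question, top))
```

Because both retrieval and prompt assembly run locally in this sketch, no query or document text leaves the machine, which mirrors the privacy argument made in the summary above.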
