HiPerRAG: High-Performance Retrieval Augmented Generation for Scientific Insights

Ozan Gokdemir; Carlo Siebenschuh; Alexander Brace; Azton Wells; Brian Hsu; Kyle Hippe; Priyanka V. Setty; Aswathy Ajith; J. Gregory Pauloski; Varuni Sastry; Sam Foreman; Huihuo Zheng; Heng Ma; Bharat Kale; Nicholas Chia; Thomas Gibbs; Michael E. Papka; Thomas Brettin; Francis J. Alexander; Anima Anandkumar; Ian Foster; Rick Stevens; Venkatram Vishwanath; Arvind Ramanathan

arXiv:2505.04846·cs.IR·May 9, 2025

TL;DR

HiPerRAG leverages high-performance computing to enable scalable retrieval-augmented generation from over 3.6 million scientific articles, significantly improving scientific question answering accuracy and facilitating cross-disciplinary research.

Contribution

The paper introduces HiPerRAG, a novel HPC-powered RAG workflow with new models Oreo and ColTrast, designed to handle large-scale scientific knowledge retrieval efficiently.

Findings

01. Achieves 90% accuracy on the SciQ benchmark

02. Attains 76% accuracy on the PubMedQA benchmark

03. Outperforms domain-specific models and GPT-4 in scientific QA

Abstract

The volume of scientific literature is growing exponentially, leading to underutilized discoveries, duplicated efforts, and limited cross-disciplinary collaboration. Retrieval Augmented Generation (RAG) offers a way to assist scientists by improving the factuality of Large Language Models (LLMs) in processing this influx of information. However, scaling RAG to handle millions of articles introduces significant challenges, including the high computational costs associated with parsing documents and embedding scientific knowledge, as well as the algorithmic complexity of aligning these representations with the nuanced semantics of scientific content. To address these issues, we introduce HiPerRAG, a RAG workflow powered by high performance computing (HPC) to index and retrieve knowledge from more than 3.6 million scientific articles. At its core are Oreo, a high-throughput model for…

Taxonomy

Methods

Attention Is All You Need · Linear Warmup With Linear Decay · Dropout · Layer Normalization · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Attention Dropout · Softmax · Absolute Position Encodings