Query pipeline optimization for cancer patient question answering systems
Maolin He, Rena Gao, Mike Conway, Brian E. Chapman

TL;DR
This paper introduces a three-aspect optimization framework for RAG query pipelines in cancer patient QA systems, covering document retrieval, passage retrieval, and semantic representation over public biomedical databases such as PubMed and PubMed Central.
Contribution
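It presents a domain-specific optimization approach for cancer-related question answering, introducing Hybrid Semantic Real-time Document Retrieval (HSRDR) for document retrieval and Semantic Enhanced Overlap Segmentation (SEOS) for semantic representation.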
Findings
Optimized RAG improved answer accuracy by 5.24% over chain-of-thought prompting.
Hybrid Semantic Real-time Document Retrieval (HSRDR) outperformed existing document retrieval methods.
Semantic Enhanced Overlap Segmentation (SEOS) enhanced contextual understanding in biomedical QA.
Abstract
Retrieval-augmented generation (RAG) mitigates hallucination in Large Language Models (LLMs) by using query pipelines to retrieve relevant external information and grounding responses in retrieved knowledge. However, query pipeline optimization for cancer patient question-answering (CPQA) systems requires separately optimizing multiple components with domain-specific considerations. We propose a novel three-aspect optimization approach for the RAG query pipeline in CPQA systems, utilizing public biomedical databases like PubMed and PubMed Central. Our optimization includes: (1) document retrieval, utilizing a comparative analysis of NCBI resources and introducing Hybrid Semantic Real-time Document Retrieval (HSRDR); (2) passage retrieval, identifying optimal pairings of dense retrievers and rerankers; and (3) semantic representation, introducing Semantic Enhanced Overlap Segmentation…
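The abstract is cut off before it describes how SEOS works, so the following is only a generic sketch of overlap-based segmentation, not the authors' method. It assumes a naive regex sentence splitter and two hypothetical parameters, `window` (sentences per segment) and `overlap` (sentences shared between neighbouring segments); none of these names come from the paper.

```python
import re


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace.

    Abbreviations such as 'e.g.' will be split incorrectly; a real biomedical
    pipeline would likely use a dedicated sentence tokenizer.
    """
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def overlap_segments(text, window=3, overlap=1):
    """Group sentences into segments of `window` sentences, where consecutive
    segments share `overlap` sentences so context is not lost at boundaries."""
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("require window > 0 and 0 <= overlap < window")

    sentences = split_sentences(text)
    step = window - overlap  # how far the window advances each time
    segments = []
    for start in range(0, len(sentences), step):
        segments.append(" ".join(sentences[start:start + window]))
        # Stop once the current window has reached the final sentence.
        if start + window >= len(sentences):
            break
    return segments


if __name__ == "__main__":
    sample = "A one. B two. C three. D four. E five."
    for segment in overlap_segments(sample, window=3, overlap=1):
        print(segment)
```

With `window=3` and `overlap=1`, the third sentence of the sample appears in both segments, which is the property that lets a retriever match a passage even when the relevant context straddles a segment boundary.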
