Ingest-And-Ground: Dispelling Hallucinations from Continually-Pretrained LLMs with RAG
Chenhao Fang, Derek Larson, Shitong Zhu, Sophie Zeng, Wendy Summer, Yanqing Peng, Yuriy Hulovatyy, Rajeev Rao, Gabriel Forgues, Arya Pudota, Alex Goncalves, Hervé Robert

TL;DR
This paper introduces a method combining continual pre-training with a privacy-specific knowledge base and a semantic RAG layer to significantly reduce hallucinations in LLMs, improving accuracy on privacy-related queries.
Contribution
It proposes a novel approach that enhances LLM factual grounding and reduces hallucinations using continual pre-training and RAG, specifically for privacy-related tasks.
Findings
Some metrics on privacy queries as much as doubled compared to the out-of-box LLM
Grounded responses reduce hallucinations
Enhanced factual accuracy in LLMs
Abstract
This paper presents new methods that have the potential to improve privacy process efficiency with LLMs and RAG. To reduce hallucination, we continually pre-train the base LLM on a privacy-specific knowledge base and then augment it with a semantic RAG layer. Our evaluations demonstrate that this approach enhances model performance in handling privacy-related queries (with some metrics as much as doubled compared to the out-of-box LLM) by grounding responses in factual information, which reduces inaccuracies.
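The retrieval-and-grounding step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for a real semantic embedding model, the `KNOWLEDGE_BASE` passages are invented for illustration, and the function names and prompt wording are assumptions. Continual pre-training is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical privacy knowledge base; these passages are invented for illustration.
KNOWLEDGE_BASE = [
    "User data must be deleted within 30 days of an account deletion request.",
    "Access logs are retained for 90 days for security auditing.",
    "Marketing emails require explicit opt-in consent.",
]

def embed(text):
    """Toy stand-in for a semantic embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query, passages, k=2):
    """Return the k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

def build_grounded_prompt(query, passages, k=2):
    """Assemble a prompt that asks the model to answer only from retrieved context."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, k))
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
    )

if __name__ == "__main__":
    print(build_grounded_prompt("Do marketing emails need consent?", KNOWLEDGE_BASE, k=1))
```

In a real system the resulting prompt would be sent to the continually pre-trained model, and `embed` would be replaced by a learned embedding model so that retrieval matches on meaning rather than on shared words.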
Taxonomy
Topics: Cell Image Analysis Techniques · Traditional Chinese Medicine Studies · Big Data and Digital Economy
Methods: Attention Is All You Need · WordPiece · Attention Dropout · Linear Layer · Weight Decay · Linear Warmup With Linear Decay · Dropout · Byte Pair Encoding · BERT
