Ingest-And-Ground: Dispelling Hallucinations from Continually-Pretrained LLMs with RAG

Chenhao Fang; Derek Larson; Shitong Zhu; Sophie Zeng; Wendy Summer; Yanqing Peng; Yuriy Hulovatyy; Rajeev Rao; Gabriel Forgues; Arya Pudota; Alex Goncalves; Hervé Robert

arXiv:2410.02825 · cs.CL · October 15, 2024

TL;DR

The paper continually pre-trains a base LLM on a privacy-specific knowledge base and then adds a semantic RAG layer, reducing hallucinations and improving accuracy on privacy-related queries.

Contribution

The paper proposes a two-stage approach for privacy-related tasks: continual pre-training on a privacy-specific knowledge base, followed by retrieval-augmented generation (RAG), to improve factual grounding and reduce hallucinations.

Findings

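01. Metrics on privacy-related queries as much as doubled compared to the out-of-box LLM

02. Grounding responses in factual information reduces hallucinations

03. Continual pre-training combined with RAG improves factual accuracy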

Abstract

This paper presents new methods that have the potential to improve privacy process efficiency with LLMs and RAG. To reduce hallucination, we continually pre-train the base LLM on a privacy-specific knowledge base and then augment it with a semantic RAG layer. Our evaluations show that this approach improves model performance on privacy-related queries (metrics as much as doubled compared to the out-of-box LLM) by grounding responses in factual information, which reduces inaccuracies.
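
The retrieval-and-grounding step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not specify the encoder, the retriever, or the prompt format, so the bag-of-words `embed` function is a toy stand-in for a semantic encoder, and `KNOWLEDGE_BASE`, `retrieve`, and `build_grounded_prompt` are hypothetical names with invented example passages. The continual pre-training stage is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical privacy knowledge-base passages, invented for illustration only.
KNOWLEDGE_BASE = [
    "Data retention: user logs must be deleted after the retention period ends.",
    "Consent: personal data may be processed only after the user gives consent.",
    "Access requests: users may request a copy of the personal data held about them.",
]


def embed(text):
    """Toy stand-in for a semantic encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, k=1):
    """Return the k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_grounded_prompt(query, docs, k=1):
    """Prepend retrieved passages so the model answers from them."""
    context = "\n".join(f"- {p}" for p in retrieve(query, docs, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
    )


if __name__ == "__main__":
    print(build_grounded_prompt("When must user logs be deleted?", KNOWLEDGE_BASE))
```

In a real system the prompt would be passed to the continually pre-trained model, and `embed` would be replaced by a learned sentence encoder; the sketch only shows how retrieved passages end up in front of the question.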

Taxonomy

Topics: Cell Image Analysis Techniques · Traditional Chinese Medicine Studies · Big Data and Digital Economy

Methods: Attention Is All You Need · WordPiece · Attention Dropout · Linear Layer · Weight Decay · Linear Warmup With Linear Decay · Dropout · Byte Pair Encoding · BERT