Local Hybrid Retrieval-Augmented Document QA
Paolo Astrino

TL;DR
This paper introduces a local question-answering system that combines semantic and keyword retrieval to provide accurate, privacy-preserving document QA on sensitive data without internet access.
Contribution
It presents a novel local QA system that balances semantic understanding with keyword precision, enabling secure, high-accuracy question answering over enterprise documents.
Findings
Achieves competitive accuracy on legal, scientific, and conversational documents.
Operates entirely on local infrastructure, using consumer-grade hardware acceleration.
Maintains data privacy while delivering reliable answers.
Abstract
Organizations handling sensitive documents face a critical dilemma: adopt cloud-based AI systems that offer powerful question-answering capabilities but compromise data privacy, or maintain local processing that ensures security but delivers poor accuracy. We present a question-answering system that resolves this trade-off by combining semantic understanding with keyword precision, operating entirely on local infrastructure without internet access. Our approach demonstrates that organizations can achieve competitive accuracy on complex queries across legal, scientific, and conversational documents while keeping all data on their machines. By balancing two complementary retrieval strategies and using consumer-grade hardware acceleration, the system delivers reliable answers with minimal errors, letting banks, hospitals, and law firms adopt conversational document AI without transmitting…
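This summary does not say how the two retrieval strategies are combined or weighted. Purely as an illustration, here is a minimal Python sketch of one common hybrid scheme. It scores each document with Okapi BM25 for keyword precision and with cosine similarity over precomputed embedding vectors for semantic matching. It then min-max normalizes both score lists and blends them with a weight `alpha`. The function names, the `alpha` blend, the BM25 parameters `k1` and `b`, and the toy 3-d vectors are all assumptions, not the paper's design. In a fully local deployment, the vectors would come from an embedding model running on the same machine.

```python
import math
from collections import Counter

# Hypothetical hybrid retriever: NOT the paper's implementation, just one
# common way to balance keyword (BM25) and semantic (embedding) signals.


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 keyword score of every document for the query."""
    n = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n
    doc_sets = [set(d) for d in docs_tokens]
    # Document frequency: how many documents contain each query term.
    df = {t: sum(t in s for s in doc_sets) for t in set(query_tokens)}
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for t in query_tokens:
            if tf[t] == 0:
                continue
            # Smoothed IDF; the "+ 1.0" keeps it positive for common terms.
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            norm = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def min_max(xs):
    """Rescale scores to [0, 1] so the two signals are comparable."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def hybrid_rank(query_tokens, query_vec, docs_tokens, doc_vecs, alpha=0.5):
    """Return document indices, best first.

    alpha = 1.0 is pure semantic ranking; alpha = 0.0 is pure keyword ranking.
    """
    keyword = min_max(bm25_scores(query_tokens, docs_tokens))
    semantic = min_max([cosine(query_vec, v) for v in doc_vecs])
    fused = [alpha * s + (1 - alpha) * k for s, k in zip(semantic, keyword)]
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)


if __name__ == "__main__":
    docs = [
        "the contract terminates after thirty days notice",
        "patients consented to data sharing",
        "termination clause requires written notice",
    ]
    docs_tokens = [d.split() for d in docs]
    # Toy 3-d "embeddings"; a real system would use a local embedding model.
    doc_vecs = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.8, 0.3, 0.1]]
    query = "contract termination notice".split()
    query_vec = [1.0, 0.2, 0.0]
    print(hybrid_rank(query, query_vec, docs_tokens, doc_vecs))  # [2, 0, 1]
```

In this toy example, pure semantic ranking (`alpha=1.0`) puts document 0 first. The keyword signal favors document 2, which contains the exact query term "termination". The default blend lets that exact-term match tip the balance. Tuning `alpha` is the knob that trades semantic recall against keyword precision.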
Taxonomy
Topics: Topic Modeling · Data Quality and Management · Advanced Graph Neural Networks
