eSapiens's DEREK Module: Deep Extraction & Reasoning Engine for Knowledge with LLMs
Isaac Shi, Zeyuan Li, Fan Liu, Wenli Wang, Lewei He, Yang Yang, Tianyu Shi

TL;DR
The paper introduces the DEREK module, a secure, scalable retrieval-augmented generation system for enterprise document question answering, combining advanced retrieval, reranking, and verification to ensure accurate and traceable answers.
Contribution
It presents a novel, end-to-end pipeline integrating heterogeneous content ingestion, hybrid retrieval, reranking, and a verifier for enterprise-grade document QA with minimal operational overhead.
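The summary does not say how the vector and BM25 result lists are merged before reranking. As a minimal sketch of one common option, the snippet below uses reciprocal rank fusion (RRF); this is a swapped-in technique chosen for illustration, not necessarily what DEREK does. The function name, the constant `k=60`, and the chunk ids are all assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of chunk ids into one ranked list.

    Assumption: RRF stands in for the unspecified fusion step.
    Each list contributes 1 / (k + rank) to a chunk's score.
    """
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id so the order is stable.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical result lists from a vector index and a BM25 index.
vector_hits = ["c2", "c5", "c1"]
bm25_hits = ["c5", "c9", "c2"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)  # ['c5', 'c2', 'c9', 'c1']
```

Chunks that appear in both lists rise to the top, and the fused list would then be passed to a reranker.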
Findings
1,000-token chunks improved Recall@50 by ~1 percentage point on four LegalBench subsets
Hybrid retrieval plus reranking boosted Precision@10 by ~7 percentage points
Verifier increased TRACe Utilization above 0.50 and reduced unsupported statements below 3%
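For readers unfamiliar with the retrieval metrics named above, the following sketch shows the standard definitions of Precision@k and Recall@k. The function names and the toy chunk ids are illustrative assumptions; the paper's own evaluation code is not shown in this summary.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved ids that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = set(ranked_ids[:k])
    return len(top_k & set(relevant_ids)) / k


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant ids that appear in the top-k results."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant) / len(relevant)


# Toy example (made-up ids, not data from the paper).
ranked = ["c3", "c7", "c1", "c9", "c4"]
relevant = {"c1", "c4", "c8"}
print(precision_at_k(ranked, relevant, 5))  # 0.4
print(recall_at_k(ranked, relevant, 5))     # 2 of 3 relevant ids found
```

A gain reported in percentage points is an absolute difference between two such values, not a relative change.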
Abstract
We present the DEREK (Deep Extraction & Reasoning Engine for Knowledge) Module, a secure and scalable Retrieval-Augmented Generation pipeline designed specifically for enterprise document question answering. Designed and implemented by eSapiens, the system ingests heterogeneous content (PDF, Office, web), splits it into 1,000-token overlapping chunks, and indexes them in a hybrid HNSW+BM25 store. User queries are refined by GPT-4o, retrieved via combined vector+BM25 search, reranked with Cohere, and answered by an LLM using CO-STAR prompt engineering. A LangGraph verifier enforces citation overlap, regenerating answers until every claim is grounded. On four LegalBench subsets, 1,000-token chunks improve Recall@50 by approximately 1 pp and hybrid+rerank boosts Precision@10 by approximately 7 pp; the verifier raises TRACe Utilization above 0.50 and limits unsupported statements to less…
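The chunking step described in the abstract can be sketched as a sliding window over a token sequence. The abstract gives the chunk size (1,000 tokens) but not the overlap, so the `overlap=200` default below is an assumption, as is the function name; a real pipeline would also use a model tokenizer rather than a plain list of tokens.

```python
def chunk_tokens(tokens, chunk_size=1000, overlap=200):
    """Split a token list into fixed-size chunks that share `overlap` tokens.

    Assumption: the overlap size is not stated in the abstract;
    200 is a placeholder value.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a chunk has reached the end of the document.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# A 2,500-token stand-in document yields chunks starting at 0, 800, and 1600.
document = list(range(2500))
chunks = chunk_tokens(document)
print([len(c) for c in chunks])  # [1000, 1000, 900]
```

Each chunk repeats the last 200 tokens of the previous one, so a passage that straddles a boundary still appears intact in at least one chunk.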
Taxonomy
Topics: Semantic Web and Ontologies · Natural Language Processing Techniques · Mathematics, Computing, and Information Processing
