HySemRAG: A Hybrid Semantic Retrieval-Augmented Generation Framework for Automated Literature Synthesis and Methodological Gap Analysis
Alejandro Godinez

TL;DR
HySemRAG is a comprehensive framework that automates large-scale literature synthesis and gap analysis by integrating advanced retrieval, document processing, and knowledge graph techniques, improving accuracy and traceability in scientific review.
Contribution
The paper introduces HySemRAG, a novel hybrid retrieval-augmented generation system with multi-layered retrieval, agentic self-correction, and knowledge graph integration for automated literature synthesis.
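The summary does not describe how the agentic self-correction step is built. As a minimal sketch of the general idea (generate a draft, check it, retry with feedback), assuming hypothetical `generate` and `check` callables that are not taken from the paper, the loop might look like this:

```python
from typing import Callable, List, Tuple

def answer_with_self_correction(
    question: str,
    generate: Callable[[str, List[str]], str],  # hypothetical: drafts an answer from question + feedback
    check: Callable[[str], List[str]],          # hypothetical: returns a list of problems, empty if OK
    max_attempts: int = 3,
) -> Tuple[str, int, bool]:
    """Return (final draft, attempts used, whether the draft passed the check)."""
    feedback: List[str] = []
    draft = ""
    for attempt in range(1, max_attempts + 1):
        draft = generate(question, feedback)
        problems = check(draft)
        if not problems:
            # A pass on attempt 1 is a single-pass response.
            return draft, attempt, True
        # Feed the detected problems into the next generation round.
        feedback = problems
    return draft, max_attempts, False


# Stub components for illustration only (assumptions, not the paper's models).
def stub_generate(question: str, feedback: List[str]) -> str:
    return "Claim about the topic [1]" if feedback else "Claim about the topic"

def stub_check(draft: str) -> List[str]:
    return [] if "[" in draft else ["missing citation"]
```

With these stubs, the first draft fails the citation check and the second passes, so the loop ends after two attempts. In a real system the checker would be the quality-assurance component and the attempt cap would bound cost.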
Findings
Similarity scores for semantic field extraction improved by 35.1%.
Agentic quality assurance achieved a 68.3% success rate in single-pass responses.
The system demonstrated effective gap analysis on geospatial epidemiology literature.
Abstract
We present HySemRAG, a framework that combines Extract, Transform, Load (ETL) pipelines with Retrieval-Augmented Generation (RAG) to automate large-scale literature synthesis and identify methodological research gaps. The system addresses limitations in existing RAG architectures through a multi-layered approach: hybrid retrieval combining semantic search, keyword filtering, and knowledge graph traversal; an agentic self-correction framework with iterative quality assurance; and post-hoc citation verification ensuring complete traceability. Our implementation processes scholarly literature through eight integrated stages: multi-source metadata acquisition, asynchronous PDF retrieval, custom document layout analysis using a modified Docling architecture, bibliographic management, LLM-based field extraction, topic modeling, semantic unification, and knowledge graph construction. The system…
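The abstract names three retrieval signals (semantic search, keyword filtering, knowledge graph traversal) but not how they are combined. The following is a minimal sketch of one plausible arrangement, assuming a keyword filter first, cosine-similarity ranking second, and one-hop graph expansion last. The `Doc` type, the `hybrid_retrieve` function, and the ordering of the stages are illustrative assumptions, not the paper's implementation.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class Doc:
    doc_id: str
    text: str
    embedding: List[float]
    neighbors: Set[str] = field(default_factory=set)  # knowledge graph edges (doc ids)

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def hybrid_retrieve(query_embedding: List[float], keywords: List[str],
                    docs: List[Doc], top_k: int = 2) -> List[str]:
    # 1. Keyword filter: keep documents containing every keyword (case-insensitive).
    lowered = [k.lower() for k in keywords]
    candidates = [d for d in docs if all(k in d.text.lower() for k in lowered)]

    # 2. Semantic search: rank the surviving candidates by cosine similarity.
    ranked = sorted(candidates,
                    key=lambda d: cosine(query_embedding, d.embedding),
                    reverse=True)
    hits = ranked[:top_k]

    # 3. Graph traversal: append one-hop neighbors of the hits, even if
    #    they did not match the keywords themselves.
    by_id: Dict[str, Doc] = {d.doc_id: d for d in docs}
    seen = {d.doc_id for d in hits}
    result = [d.doc_id for d in hits]
    for d in hits:
        for neighbor_id in sorted(d.neighbors):
            if neighbor_id in by_id and neighbor_id not in seen:
                seen.add(neighbor_id)
                result.append(neighbor_id)
    return result
```

The graph step is what lets a related document surface when it shares no keywords with the query. A production system would more likely use a vector index and a graph database than in-memory lists.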
Taxonomy
Topics: Semantic Web and Ontologies · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
