Retrieval Augmented Generation of Literature-derived Polymer Knowledge: The Example of a Biodegradable Polymer Expert System
Sonakshi Gupta, Akhlak Mahmood, Wei Xiong, Rampi Ramprasad

TL;DR
This paper develops two retrieval pipelines, VectorRAG and GraphRAG, to improve knowledge extraction from the polymer literature, enabling better retrieval, reasoning, and validation in scientific research.
Contribution
It introduces domain-specific retrieval pipelines tailored to polymer literature, one built on semantic embeddings and one on knowledge graphs, which improve accuracy and interpretability over generic models.
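The graph-based side (GraphRAG) can be illustrated with a minimal sketch. It assumes that knowledge is stored as (subject, relation, object) triples, that entity names are canonicalized through an alias table, and that retrieval returns the one-hop neighborhood of an entity named in the query. The alias table, the relation names, and the `TripleGraph` class are hypothetical illustrations. They are not the authors' schema or implementation.

```python
from collections import defaultdict

# Hypothetical alias table: several surface forms map to one canonical entity name.
ALIASES = {
    "phb": "poly(3-hydroxybutyrate)",
    "p3hb": "poly(3-hydroxybutyrate)",
    "poly(3-hydroxybutyrate)": "poly(3-hydroxybutyrate)",
    "phbv": "poly(3-hydroxybutyrate-co-3-hydroxyvalerate)",
}


def canonicalize(name: str) -> str:
    """Map a surface form to its canonical entity name; unknown names pass through stripped."""
    cleaned = name.strip()
    return ALIASES.get(cleaned.lower(), cleaned)


class TripleGraph:
    """Tiny in-memory knowledge graph of (subject, relation, object) triples (illustrative only)."""

    def __init__(self) -> None:
        # Adjacency list: canonical subject -> list of (relation, canonical object).
        self.edges: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add(self, subj: str, rel: str, obj: str) -> None:
        # Canonicalize both endpoints so that aliases collapse onto a single node.
        self.edges[canonicalize(subj)].append((rel, canonicalize(obj)))

    def neighborhood(self, entity: str) -> list[tuple[str, str, str]]:
        """Return every outgoing triple of the canonicalized entity (one-hop retrieval)."""
        node = canonicalize(entity)
        # .get() avoids inserting empty entries into the defaultdict on a miss.
        return [(node, rel, obj) for rel, obj in self.edges.get(node, [])]


g = TripleGraph()
g.add("PHB", "degrades_in", "soil")
g.add("P3HB", "produced_by", "Cupriavidus necator")
facts = g.neighborhood("phb")
```

Because "PHB" and "P3HB" map to the same canonical node, a query using either spelling retrieves facts extracted under both. This is one reason canonicalization matters when papers use inconsistent terminology.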
Findings
GraphRAG achieves higher precision and interpretability.
VectorRAG offers broader recall, complementing GraphRAG.
Expert validation confirms the relevance and reliability of the responses.
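The precision/recall contrast between the two pipelines rests on the standard set-based definitions. Precision is the fraction of retrieved items that are relevant, and recall is the fraction of relevant items that were retrieved. The sketch below is minimal. The paragraph ids and toy numbers are hypothetical and are not results from the paper, whose evaluation protocol may differ.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Set-based precision and recall of retrieved items against a relevant set."""
    hits = len(retrieved & relevant)
    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy example (hypothetical ids): four paragraphs are actually relevant.
relevant = {"p1", "p2", "p3", "p4"}
narrow = precision_recall({"p1", "p2"}, relevant)                        # few, all correct
broad = precision_recall({"p1", "p2", "p3", "p4", "p5", "p6"}, relevant)  # many, some noise
```

In this toy case, the narrow retriever scores precision 1.0 and recall 0.5. The broad one scores precision 4/6 and recall 1.0, which mirrors in simplified form the complementary behavior described in the findings.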
Abstract
Polymer literature contains a large and growing body of experimental knowledge, yet much of it is buried in unstructured text and inconsistent terminology, making systematic retrieval and reasoning difficult. Existing tools typically extract narrow, study-specific facts in isolation, failing to preserve the cross-study context required to answer broader scientific questions. Retrieval-augmented generation (RAG) offers a promising way to overcome this limitation by combining large language models (LLMs) with external retrieval, but its effectiveness depends strongly on how domain knowledge is represented. In this work, we develop two retrieval pipelines: a dense semantic vector-based approach (VectorRAG) and a graph-based approach (GraphRAG). Using over 1,000 polyhydroxyalkanoate (PHA) papers, we construct context-preserving paragraph embeddings and a canonicalized structured knowledge…
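The dense vector-based approach (VectorRAG) can be sketched as a top-k cosine-similarity search over paragraph embeddings, with the retrieved paragraphs placed into an LLM prompt. This minimal sketch assumes the embeddings were already computed by some embedding model. That model is not shown and is not necessarily the one the authors used. The functions `top_k` and `build_prompt`, the prompt wording, and the toy 3-dimensional vectors are hypothetical.

```python
import math

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_vec: Vector, index: dict[str, Vector], k: int = 3) -> list[tuple[str, float]]:
    """Return the k paragraph ids whose embeddings are most similar to the query, best first."""
    scored = [(pid, cosine(query_vec, vec)) for pid, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def build_prompt(question: str, hits: list[tuple[str, float]], paragraphs: dict[str, str]) -> str:
    """Place the retrieved paragraphs into an LLM prompt (the generation call itself is not shown)."""
    context = "\n\n".join(f"[{pid}] {paragraphs[pid]}" for pid, _ in hits)
    return f"Answer using only the context below.\n\n{context}\n\nQuestion: {question}"


# Toy precomputed embeddings (hypothetical); a real index would hold model outputs.
index = {"p1": [1.0, 0.0, 0.0], "p2": [0.7, 0.7, 0.0], "p3": [0.0, 0.0, 1.0]}
paragraphs = {"p1": "Paragraph A.", "p2": "Paragraph B.", "p3": "Paragraph C."}
hits = top_k([1.0, 0.1, 0.0], index, k=2)
prompt = build_prompt("Which PHA degrades fastest in soil?", hits, paragraphs)
```

A brute-force scan like this is fine for a small demo. A corpus of many thousands of paragraphs would more typically use an approximate nearest-neighbor index.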
Taxonomy
Topics: Machine Learning in Materials Science · Biomedical Text Mining and Ontologies · Advanced Graph Neural Networks
