Grounding Large Language Models in Reaction Knowledge Graphs for Synthesis Retrieval
Olga Bunkova, Lorenzo Di Fruscia, Sophia Rupprecht, Artur M. Schweidtmann, Marcel J.T. Reinders, Jana M. Weber

TL;DR
This paper studies how large language models (LLMs) can query reaction knowledge graphs to retrieve chemical synthesis routes, with an emphasis on prompting strategies and self-correction for improving query validity and retrieval accuracy.
Contribution
The paper introduces a framework that casts reaction path retrieval as Text2Cypher (natural language to graph query) generation with LLMs, compares prompting methods and a self-correction loop, and provides a reproducible evaluation setup.
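The prompting variants described in the abstract (zero-shot, and one-shot with static, random, or embedding-based exemplar selection) can be sketched as follows. This is not the authors' implementation. The graph schema (`Molecule`, `Reaction`, `REACTANT_IN`, `PRODUCES`), the exemplar pool, the prompt template, and the helper names are all hypothetical. A toy bag-of-words vector stands in for a real sentence-embedding model.

```python
import math
import random
from collections import Counter

# Hypothetical exemplar pool of (question, Cypher) pairs over an assumed schema:
# (:Molecule)-[:REACTANT_IN]->(:Reaction)-[:PRODUCES]->(:Molecule)
EXEMPLARS = [
    ("Which reactions produce aspirin?",
     "MATCH (r:Reaction)-[:PRODUCES]->(m:Molecule {name: 'aspirin'}) RETURN r"),
    ("Which molecules are reactants in reactions that produce paracetamol?",
     "MATCH (s:Molecule)-[:REACTANT_IN]->(r:Reaction)-[:PRODUCES]->"
     "(m:Molecule {name: 'paracetamol'}) RETURN s"),
    ("Find a two-step route to ibuprofen.",
     "MATCH p = (:Molecule)-[:REACTANT_IN]->(:Reaction)-[:PRODUCES]->(:Molecule)"
     "-[:REACTANT_IN]->(:Reaction)-[:PRODUCES]->(:Molecule {name: 'ibuprofen'}) RETURN p"),
]


def embed(text):
    """Toy bag-of-words 'embedding' (stand-in for a real embedding model)."""
    return Counter(text.lower().replace("?", " ").replace(".", " ").split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing tokens
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_exemplar(question, strategy, rng=None):
    """Return one (question, cypher) exemplar, or None for zero-shot prompting."""
    if strategy == "zero-shot":
        return None
    if strategy == "static":
        return EXEMPLARS[0]  # the same fixed exemplar for every question
    if strategy == "random":
        return (rng or random.Random(0)).choice(EXEMPLARS)
    if strategy == "embedding":
        q = embed(question)  # pick the exemplar whose question is most similar
        return max(EXEMPLARS, key=lambda ex: cosine(q, embed(ex[0])))
    raise ValueError(f"unknown strategy: {strategy}")


def build_prompt(question, schema, strategy, rng=None):
    """Assemble a (hypothetical) Text2Cypher prompt with an optional exemplar."""
    parts = [f"Graph schema:\n{schema}",
             "Translate the question into a single Cypher query."]
    ex = select_exemplar(question, strategy, rng)
    if ex is not None:
        parts.append(f"Example question: {ex[0]}\nExample Cypher: {ex[1]}")
    parts.append(f"Question: {question}\nCypher:")
    return "\n\n".join(parts)
```

With the embedding strategy, a multi-step question such as "Find a two-step route to naproxen." retrieves the two-step exemplar, so the model sees a query with the right path structure. That is one plausible reading of the "aligned exemplars" that the findings report as performing best.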
Findings
One-shot prompting with aligned exemplars performs best.
Self-correction improves executability mainly in zero-shot settings.
The evaluation framework and code are publicly available.
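A checklist-driven validator/corrector loop of the kind the paper assesses might look like the sketch below. The checklist items shown here are assumptions, since this summary does not list the paper's actual items. The `correct` callback stands in for an LLM call, and `toy_corrector` is a hypothetical rule-based substitute used only for demonstration.

```python
import re

# Hypothetical schema vocabulary; a real validator would read node labels and
# relationship types from the reaction knowledge graph itself.
KNOWN_LABELS = {"Molecule", "Reaction"}
KNOWN_RELS = {"REACTANT_IN", "PRODUCES"}


def checklist(query):
    """Return the failed checklist items; an empty list means the query passes."""
    problems = []
    # Simplified: real Cypher may also start with OPTIONAL MATCH, WITH, CALL, ...
    if not re.match(r"\s*MATCH\b", query, re.IGNORECASE):
        problems.append("query should start with MATCH")
    if not re.search(r"\bRETURN\b", query, re.IGNORECASE):
        problems.append("query has no RETURN clause")
    # Bracket counts only; brackets inside string literals are not handled.
    for open_c, close_c in ("()", "[]", "{}"):
        if query.count(open_c) != query.count(close_c):
            problems.append(f"unbalanced {open_c}{close_c}")
    # Node labels follow "(" and relationship types follow "[" (optional variable first).
    for label in re.findall(r"\(\s*\w*\s*:\s*(\w+)", query):
        if label not in KNOWN_LABELS:
            problems.append(f"unknown label: {label}")
    for rel in re.findall(r"\[\s*\w*\s*:\s*(\w+)", query):
        if rel not in KNOWN_RELS:
            problems.append(f"unknown relationship type: {rel}")
    return problems


def validate_and_correct(query, correct, max_rounds=3):
    """Validate, then repair, until the query passes or max_rounds corrections are used.

    `correct(query, problems)` is a hypothetical callback standing in for an LLM
    that rewrites the query from the failed checklist items.
    Returns (final_query, passed, corrections_used).
    """
    for used in range(max_rounds):
        problems = checklist(query)
        if not problems:
            return query, True, used
        query = correct(query, problems)
    return query, not checklist(query), max_rounds


def toy_corrector(query, problems):
    """Rule-based stand-in for the LLM corrector, for demonstration only."""
    if "query has no RETURN clause" in problems:
        query += " RETURN r"
    return query.replace(":Reactions", ":Reaction")


bad = "MATCH (r:Reactions)-[:PRODUCES]->(m:Molecule {name: 'aspirin'})"
fixed, passed, used = validate_and_correct(bad, toy_corrector)
print(fixed, passed, used)
```

A checklist like this can only catch syntactic and schema problems. It cannot tell whether a well-formed query retrieves the intended reactions.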
Abstract
Large Language Models (LLMs) can aid synthesis planning in chemistry, but standard prompting methods often yield hallucinated or outdated suggestions. We study LLM interactions with a reaction knowledge graph by casting reaction path retrieval as a Text2Cypher (natural language to graph query) generation problem, and define single- and multi-step retrieval tasks. We compare zero-shot prompting to one-shot variants using static, random, and embedding-based exemplar selection, and assess a checklist-driven validator/corrector loop. To evaluate our framework, we consider query validity and retrieval accuracy. We find that one-shot prompting with aligned exemplars consistently performs best. Our checklist-style self-correction loop mainly improves executability in zero-shot settings and offers limited additional retrieval gains once a good exemplar is present. We provide a reproducible…
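The two evaluation criteria named in the abstract, query validity and retrieval accuracy, could be computed roughly as below. This sketch assumes that validity means "the generated query executes without error" and that accuracy means "the generated query returns the same result set as a gold query, ignoring row order". The paper's exact definitions may differ. The `execute` callback is a hypothetical stand-in for a graph-database client, and the toy results are made up for illustration.

```python
def evaluate(pairs, execute):
    """Score generated Cypher queries against gold queries.

    pairs:   iterable of (generated_query, gold_query) strings.
    execute: hypothetical callback that runs a query and returns an iterable of
             hashable rows, raising an exception if the query fails.
             Gold queries are assumed to execute successfully.
    Returns the executability rate and an execution-based retrieval accuracy.
    """
    n = executable = correct = 0
    for generated, gold in pairs:
        n += 1
        try:
            predicted = set(execute(generated))
        except Exception:
            continue  # a non-executable query counts against both metrics
        executable += 1
        # Compare result sets, so equivalent but differently written queries count.
        if predicted == set(execute(gold)):
            correct += 1
    if n == 0:
        return {"executability": 0.0, "accuracy": 0.0}
    return {"executability": executable / n, "accuracy": correct / n}


# Toy executor over precomputed results; unknown queries raise KeyError.
RESULTS = {"q1": ["rxn_1"], "gold1": ["rxn_1"], "gold2": ["rxn_2"],
           "q3": ["rxn_3"], "gold3": ["rxn_4"]}
scores = evaluate([("q1", "gold1"), ("bad", "gold2"), ("q3", "gold3")],
                  lambda q: RESULTS[q])
print(scores)
```

Here two of the three generated queries execute, and only one of them returns the gold result set.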
Taxonomy
Topics: Machine Learning in Materials Science · Advanced Graph Neural Networks · Scientific Computing and Data Management
