Literature Mining System for Nutraceutical Biosynthesis: From AI Framework to Biological Insight
Xinyang Sun, Nipon Sarmah, Miao Guo

TL;DR
This paper introduces an AI-powered literature mining system using large language models to identify microbial strains involved in nutraceutical biosynthesis, providing valuable biological insights and supporting synthetic biology applications.
Contribution
The study develops a domain-specific LLM-based system with advanced prompt engineering for extracting microbial-nutraceutical associations from scientific literature; in the reported comparison, DeepSeekV3 outperforms LLaMA2.
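As a rough illustration of the few-shot prompting mentioned in the abstract, the sketch below assembles a prompt from worked examples, optionally adds domain-specific strain names, and parses a JSON answer. It is not the authors' implementation: the names `Example`, `FEW_SHOT`, `build_prompt`, and `parse_associations`, the prompt wording, the JSON answer format, and the two example passages are all assumptions made for illustration, and no model is called.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One worked example shown to the model (hypothetical structure)."""
    passage: str
    associations: list  # list of {"nutraceutical": ..., "strain": ...}


# Illustrative examples only; they are NOT taken from the paper's dataset.
FEW_SHOT = [
    Example(
        passage="Corynebacterium glutamicum was engineered to overproduce L-lysine.",
        associations=[{"nutraceutical": "L-lysine",
                       "strain": "Corynebacterium glutamicum"}],
    ),
    Example(
        passage="Riboflavin titers improved in a Bacillus subtilis production strain.",
        associations=[{"nutraceutical": "riboflavin",
                       "strain": "Bacillus subtilis"}],
    ),
]


def build_prompt(passage, examples=FEW_SHOT, strain_hints=None):
    """Assemble a few-shot prompt; the wording is an assumption, not the paper's."""
    parts = [
        "Extract every (nutraceutical, microbial strain) pair from the passage.",
        "Answer with a JSON list of objects with keys 'nutraceutical' and 'strain'.",
    ]
    # Optional domain-specific strain information, as a plain list of names.
    if strain_hints:
        parts.append("Known strain names that may appear: " + ", ".join(strain_hints))
    # Each worked example is shown as a passage followed by its expected answer.
    for ex in examples:
        parts.append(f"Passage: {ex.passage}\nAnswer: {json.dumps(ex.associations)}")
    # The target passage comes last, with the answer left open for the model.
    parts.append(f"Passage: {passage}\nAnswer:")
    return "\n\n".join(parts)


def parse_associations(response):
    """Parse a model response into (nutraceutical, strain) tuples.

    Returns an empty list when the response is not a valid JSON list.
    """
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    pairs = []
    for item in data:
        if isinstance(item, dict) and "nutraceutical" in item and "strain" in item:
            pairs.append((str(item["nutraceutical"]), str(item["strain"])))
    return pairs
```

The string returned by `build_prompt` would be sent to whichever LLM is under evaluation, and the raw reply passed to `parse_associations`; leaving `strain_hints` unset gives the configuration without domain-specific strain information.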
Findings
DeepSeekV3 achieves higher accuracy than LLaMA2, especially when domain-specific strain information is included in the prompt.
Generated a validated dataset of 35 nutraceutical-strain associations.
Revealed microbial diversity and key strains in nutraceutical biosynthesis.
Abstract
The extraction of structured knowledge from scientific literature remains a major bottleneck in nutraceutical research, particularly when identifying microbial strains involved in compound biosynthesis. This study presents a domain-adapted system powered by large language models (LLMs) and guided by advanced prompt engineering techniques to automate the identification of nutraceutical-producing microbes from unstructured scientific text. By leveraging few-shot prompting and tailored query designs, the system demonstrates robust performance across multiple configurations, with DeepSeekV3 outperforming LLaMA2 in accuracy, especially when domain-specific strain information is included. A structured and validated dataset comprising 35 nutraceutical-strain associations was generated, spanning amino acids, fibers, phytochemicals, and vitamins. The results reveal significant microbial…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Machine Learning in Materials Science · Microbial Metabolic Engineering and Bioproduction
