A unified foundational framework for knowledge injection and evaluation of Large Language Models in Combustion Science
Zonglin Yang, Runze Mao, Tianhao Wu, Han Li, QingGuo Zhou, and Zhi X. Chen

TL;DR
This paper introduces a comprehensive framework for developing and evaluating domain-specific Large Language Models in combustion science, including a knowledge base, evaluation benchmark, and knowledge-injection methods.
Contribution
It presents the first end-to-end domain-specific LLM framework with a multimodal knowledge base, evaluation benchmark, and a multi-stage knowledge-injection pathway.
Findings
Standard (naive) RAG accuracy peaks at 60%, well above zero-shot performance (23%).
Context contamination limits RAG performance.
Continued pretraining improves model knowledge.
Abstract
To advance foundation Large Language Models (LLMs) for combustion science, this study presents the first end-to-end framework for developing domain-specialized models for the combustion community. The framework comprises an AI-ready multimodal knowledge base at the 3.5 billion-token scale, extracted from over 200,000 peer-reviewed articles, 8,000 theses and dissertations, and approximately 400,000 lines of combustion CFD code; a rigorous and largely automated evaluation benchmark (CombustionQA, 436 questions across eight subfields); and a three-stage knowledge-injection pathway that progresses from lightweight retrieval-augmented generation (RAG) to knowledge-graph-enhanced retrieval and continued pretraining. We first quantitatively validate Stage 1 (naive RAG) and find a hard ceiling: standard RAG accuracy peaks at 60%, far surpassing zero-shot performance (23%) yet well below the…
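To make the Stage 1 idea concrete, the following is a minimal sketch of naive RAG and of accuracy scoring. It is not the authors' implementation: the bag-of-words cosine retriever, the function names (`retrieve`, `build_prompt`, `accuracy`), and the toy passages are all assumptions made for illustration, and a real system would use a learned embedding model and an actual LLM call.

```python
import math
import re
from collections import Counter

# Toy passages standing in for a knowledge base (hypothetical, not from the paper).
PASSAGES = [
    "Laminar flame speed depends on the equivalence ratio of the mixture.",
    "Turbulence models close the Reynolds-averaged equations.",
    "Soot forms in fuel-rich regions of a flame.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, passages, k=2):
    """Return the k passages most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        passages, key=lambda p: cosine(q, Counter(tokenize(p))), reverse=True
    )
    return ranked[:k]


def build_prompt(question, passages, k=2):
    """Prepend retrieved passages to the question; an LLM would answer this prompt."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def accuracy(predictions, answers):
    """Fraction of predictions that exactly match the reference answers."""
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


if __name__ == "__main__":
    print(build_prompt("What does laminar flame speed depend on?", PASSAGES, k=1))
```

In this sketch, retrieval quality directly bounds answer quality: if an off-topic passage ranks highly, it enters the prompt alongside the relevant one, which is one plausible way the context contamination noted in the findings could arise.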
Taxonomy
Topics: Machine Learning in Materials Science · Catalysis and Oxidation Reactions · Advanced Combustion Engine Technologies
