Knowledge-Guided Explainable Recommendation Tool for Cancer Risk Prediction Models Using Retrieval-Augmented Large Language Models: Development and Validation Study
Shumin Ren, Xin Zheng, Jing Zhao, Jiale Du, Yuxin Zhang, Cheng Bi, Jie Song, Jinyi Zhang, Hongmei Lang, Fan Zhang, Bairong Shen

TL;DR
CanRisk-RAG is a new system that helps find cancer risk prediction models more accurately and transparently than existing tools.
Contribution
Development of a retrieval-augmented, knowledge-guided system for recommending cancer risk prediction models using LLMs and structured metadata.
Findings
CanRisk-RAG outperformed baseline tools in relevance and reliability scores for cancer risk model queries.
The system provides structured, accurate recommendations based on validated evidence and multifactor ranking.
Experts rated CanRisk-RAG higher than PubMed, ChatGPT-4o, ScholarAI, and Gemini 1.5 Flash.
Abstract
Cancer risk prediction models are vital for precision prevention, enabling individualized assessment of cancer susceptibility based on genetic, clinical, environmental, and lifestyle factors. However, the practical use of these models is hindered by fragmented resources, heterogeneous reporting, and the absence of transparent, structured systems for systematic discovery and comparison. This study aimed to develop a retrieval-augmented, knowledge-guided system that provides accurate recommendations for cancer risk prediction models. We developed CanRisk-RAG, a recommendation platform underpinned by a carefully constructed knowledge base comprising more than 800 peer-reviewed cancer risk prediction models spanning diverse cancer types, modeling approaches, and predictive variables. The system integrates (1) large language model (LLM)–based semantic tag extraction, (2) embedding…
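The abstract (truncated above) names LLM-based tag extraction, an embedding-based component, and elsewhere a "multifactor ranking" of validated evidence. As a rough illustration only, the sketch below ranks knowledge-base entries by a weighted sum of query-to-model embedding similarity and two evidence factors. The record fields (`externally_validated`, `auc`), the toy 3-dimensional embeddings, and the weights are all hypothetical assumptions, not the authors' schema or scoring function.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class ModelRecord:
    """Hypothetical knowledge-base entry for one cancer risk prediction model."""
    name: str
    embedding: list[float]      # precomputed text embedding (assumed available)
    externally_validated: bool  # hypothetical evidence flag
    auc: float                  # hypothetical reported discrimination, 0 to 1


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_models(query_emb, records, w_sim=0.6, w_val=0.25, w_auc=0.15, top_k=3):
    """Score each record as a weighted sum of semantic similarity and evidence
    factors, sort descending, and return the top_k (name, rounded score) pairs.
    The default weights are illustrative assumptions, not values from the study."""
    scored = []
    for r in records:
        score = (w_sim * cosine(query_emb, r.embedding)
                 + w_val * (1.0 if r.externally_validated else 0.0)
                 + w_auc * r.auc)
        scored.append((r.name, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(name, round(score, 3)) for name, score in scored[:top_k]]


# Toy knowledge base and query; names and numbers are invented for illustration.
kb = [
    ModelRecord("breast_model_A", [0.9, 0.1, 0.0], True, 0.72),
    ModelRecord("breast_model_B", [0.8, 0.2, 0.1], False, 0.80),
    ModelRecord("lung_model_C", [0.0, 0.1, 0.95], True, 0.78),
]
query = [1.0, 0.0, 0.0]  # stand-in embedding for a breast cancer risk query
print(rank_models(query, kb, top_k=2))
# → [('breast_model_A', 0.954), ('breast_model_B', 0.698)]
```

In this toy setup the externally validated model outranks a slightly higher-AUC but unvalidated one, which shows how evidence factors can reorder results that similarity alone would leave close together.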
Figure 1
Figure 2
Figure 3
Figure 4
Taxonomy
Topics: Machine Learning in Healthcare · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education
