# SKiM-GPT: combining biomedical literature-based discovery with large language model hypothesis evaluation

**Authors:** Jack Freeman, Robert J. Millikin, Leo Xu, Ishaan Sharma, Bethany Moore, Cannon Lock, Kevin Shine George, Aviral Bal, Chitrasen Mohanty, Ron Stewart

PMC · DOI: 10.1186/s12859-025-06350-7 · 2025-12-17

## TL;DR

SKiM-GPT combines literature-based discovery with large language models to efficiently evaluate biomedical hypotheses using retrieved evidence.

## Contribution

Introduces SKiM-GPT, a transparent RAG system that evaluates user-defined hypotheses by combining SKiM co-occurrence search with LLMs, and reports human-verifiable justifications grounded in the retrieved abstracts.

## Key findings

- SKiM-GPT achieves strong ordinal agreement with four expert biologists on a benchmark of 14 disease-gene-drug hypotheses (Cohen’s κ = 0.84).
- The system retrieves relevant PubMed abstracts, filters out irrelevant ones with a fine-tuned relevance model, and provides hypothesis scores with natural-language justifications.
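
To make the agreement statistic concrete, the following is a minimal sketch of Cohen's κ for two raters, with an optional linear weighting for ordinal scores. The function name, the score categories, and the ratings are all hypothetical; this summary does not say which weighting the authors used or how the four experts' ratings were combined, so the weighted variant is an assumption.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b, categories, weighted=False):
    """Cohen's kappa for two raters; linear disagreement weights if weighted=True."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if len(categories) < 2:
        raise ValueError("need at least two categories")
    n = len(rater_a)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}  # ordinal position of each category

    def weight(i, j):
        # Disagreement penalty: 0/1 for nominal kappa, scaled distance for ordinal kappa.
        if weighted:
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    observed = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    marg_a = Counter(index[a] for a in rater_a)
    marg_b = Counter(index[b] for b in rater_b)

    # Observed disagreement vs. disagreement expected from the marginals alone.
    disagree_obs = sum(weight(i, j) * c / n for (i, j), c in observed.items())
    disagree_exp = sum(
        weight(i, j) * (marg_a[i] / n) * (marg_b[j] / n)
        for i in range(k)
        for j in range(k)
    )
    if disagree_exp == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - disagree_obs / disagree_exp


# Hypothetical ordinal scores (not data from the paper).
system_scores = [0, 1, 2, 2]
expert_scores = [0, 1, 2, 1]
kappa = cohens_kappa(system_scores, expert_scores, categories=[0, 1, 2], weighted=True)
print(f"linear-weighted kappa = {kappa:.3f}")
```

With linear weights, a disagreement between adjacent scores is penalized less than one between the extremes, which is why a weighted κ is a common choice for ordinal scales.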

## Abstract

Generating and testing hypotheses is a critical aspect of biomedical science. Typically, researchers form hypotheses by carefully analyzing available information and making logical connections, and then test them. The accelerating growth of biomedical literature makes it increasingly difficult to keep pace with the connections between biological entities that emerge across biomedical research. Recently developed automated methods can generate many more hypotheses than can be easily tested. One such approach involves literature-based discovery (LBD) systems such as Serial KinderMiner (SKiM), which surfaces putative A-B-C links derived from term co-occurrence. However, LBD systems leave three critical gaps: (i) they find statistical associations, not biological relationships; (ii) they can produce false-positive leads; and (iii) they do not assess agreement with a hypothesis in question. As a result, LBD search results often require costly manual curation to be of practical utility to the researcher. Large language models (LLMs) have the potential to automate much of this curation step, but standalone LLMs are hampered by hallucinations, lack of transparency in information sources, and the inability to reference data not included in the training corpus.
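
As an illustration of how A-B-C links can be derived from term co-occurrence, here is a minimal sketch over a toy corpus. It is not SKiM's implementation: the function names, the substring matching, the ranking rule, and the invented one-line "abstracts" are all assumptions made for the example, and a real system would also test whether the counts are statistically meaningful.

```python
def cooccurrence_count(abstracts, term_x, term_y):
    """Count abstracts that mention both terms (case-insensitive substring match)."""
    x, y = term_x.lower(), term_y.lower()
    return sum(1 for text in abstracts if x in text.lower() and y in text.lower())


def abc_links(abstracts, a_term, b_terms, c_term, min_count=1):
    """Return (b, n_ab, n_bc) for every B term that co-occurs with both A and C."""
    links = []
    for b in b_terms:
        n_ab = cooccurrence_count(abstracts, a_term, b)
        n_bc = cooccurrence_count(abstracts, b, c_term)
        if n_ab >= min_count and n_bc >= min_count:
            links.append((b, n_ab, n_bc))
    # Rank by the weaker of the two legs, so both A-B and B-C need support.
    links.sort(key=lambda link: min(link[1], link[2]), reverse=True)
    return links


# Invented toy sentences standing in for abstracts (not real PubMed text).
toy_abstracts = [
    "Dietary fish oil reduces blood viscosity in healthy volunteers.",
    "Fish oil supplementation lowers platelet aggregation.",
    "Raynaud's disease patients show elevated blood viscosity.",
    "Increased platelet aggregation was observed in Raynaud's disease.",
    "Fish oil intake and serum cholesterol in adults.",
]
links = abc_links(
    toy_abstracts,
    a_term="fish oil",
    b_terms=["blood viscosity", "platelet aggregation", "cholesterol"],
    c_term="Raynaud's disease",
)
for b, n_ab, n_bc in links:
    print(f"fish oil -> {b} -> Raynaud's disease (A-B: {n_ab}, B-C: {n_bc})")
```

The sketch also shows the first gap listed above: a surviving link only says the terms appear together, not what the biological relationship is or whether it supports a given hypothesis.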

We introduce SKiM-GPT, a retrieval-augmented generation (RAG) system that combines SKiM’s co-occurrence search and retrieval with frontier LLMs to evaluate user-defined hypotheses. For every chosen A-B-C SKiM hit, SKiM-GPT retrieves appropriate PubMed abstract texts, filters out irrelevant abstracts with a fine-tuned relevance model, and prompts an LLM to evaluate the user’s hypothesis, given the relevant abstracts. Importantly, the SKiM-GPT system is transparent and human-verifiable: it displays the retrieved abstracts, the hypothesis score, and a justification for the score grounded in the texts and written in natural language. On a benchmark consisting of 14 disease-gene-drug hypotheses, SKiM-GPT achieves strong ordinal agreement with four expert biologists (Cohen’s κ = 0.84), demonstrating its ability to replicate expert judgment.
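
The retrieve, filter, and evaluate steps described above can be sketched as a small pipeline. This is an illustrative outline only, not the code in the SKiM-GPT repository: `Evaluation`, `evaluate_hypothesis`, the prompt wording, and the integer score are hypothetical, and the relevance model and LLM are passed in as plain callables so that the example runs with stand-in stubs.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Evaluation:
    hypothesis: str
    score: int            # score scale is an assumption; not specified in this summary
    justification: str    # natural-language rationale returned by the model
    abstracts: List[str]  # evidence kept after filtering, shown for human verification


def evaluate_hypothesis(
    hypothesis: str,
    retrieved: List[str],
    is_relevant: Callable[[str, str], bool],
    ask_llm: Callable[[str], Tuple[int, str]],
) -> Evaluation:
    # Step 1: drop abstracts the relevance model judges irrelevant.
    relevant = [text for text in retrieved if is_relevant(hypothesis, text)]
    # Step 2: number the remaining abstracts so the justification can cite them.
    evidence = "\n\n".join(f"[{i}] {text}" for i, text in enumerate(relevant, start=1))
    prompt = (
        f"Hypothesis: {hypothesis}\n\n"
        f"Abstracts:\n{evidence}\n\n"
        "Score how well the abstracts support the hypothesis and justify the "
        "score using only the abstracts above."
    )
    # Step 3: ask the model for a score and a justification grounded in the text.
    score, justification = ask_llm(prompt)
    return Evaluation(hypothesis, score, justification, relevant)


# Hypothetical stand-ins for the fine-tuned relevance model and the LLM.
def keyword_relevance(hypothesis: str, abstract: str) -> bool:
    return "erlotinib" in abstract.lower()


def stub_llm(prompt: str) -> Tuple[int, str]:
    return 1, "Stub justification; a real model would cite the numbered abstracts."


result = evaluate_hypothesis(
    "Erlotinib treats pancreatic cancer by inhibiting EGFR.",
    [
        "Erlotinib inhibits EGFR signaling in pancreatic cancer cells.",  # toy text
        "Fish oil and blood viscosity in healthy volunteers.",            # toy text
    ],
    keyword_relevance,
    stub_llm,
)
print(result.score, len(result.abstracts))
```

Returning the filtered abstracts alongside the score and justification mirrors the transparency goal: a reader can check each claim in the justification against the text it was drawn from.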

SKiM-GPT is open-source (https://github.com/stewart-lab/skimgpt) and available through a web interface (https://skim.morgridge.org), enabling both wet-lab and computational researchers to systematically and efficiently evaluate biomedical hypotheses at scale.

The online version contains supplementary material available at 10.1186/s12859-025-06350-7.

## Full-text entities

- **Genes:** AHR (aryl hydrocarbon receptor) [NCBI Gene 196] {aka FVH3, RP85, bHLHe76}, EGFR (epidermal growth factor receptor) [NCBI Gene 1956] {aka ERBB, ERBB1, ERRP, HER1, NISBD2, NNCIS}, CCK (cholecystokinin) [NCBI Gene 885], SLN (sarcolipin) [NCBI Gene 6588]
- **Diseases:** cancers (MESH:D009369), Raynaud's disease (MESH:D011928), pancreatic cancer (MESH:D010190), Breast (MESH:D061325), Breast Cancer (MESH:D001943), node (MESH:D012804), diabetes (MESH:D003920), Alzheimer's disease (MESH:D000544)
- **Chemicals:** gant61 (MESH:C551027), Erlotinib (MESH:D000069347), fish oil (MESH:D005395)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12829140/full.md

---
Source: https://tomesphere.com/paper/PMC12829140