Large Language Model and Knowledge Graph-Driven AJCC Staging of Prostate Cancer Using Pathology Reports
Eunbeen Jo, Tae Il Noh, Hyung Joon Joo

TL;DR
This study shows how combining large language models and knowledge graphs can automatically and accurately stage prostate cancer using pathology reports.
Contribution
A novel system for automated AJCC staging using LLM-based information extraction and knowledge graph validation is developed and validated.
Findings
The system achieved high accuracy (0.973) and F1-score (0.986) in information extraction on the internal dataset.
AJCC staging classification had macro-averaged F1-scores of 0.930 and 0.833 for internal and external datasets, respectively.
Knowledge graph validation identified inconsistencies in 3.3% of cases.
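For readers unfamiliar with the metric, the macro-averaged F1-score reported for staging classification is the unweighted mean of the per-class F1-scores, so rare stage groups count as much as common ones. The following is a minimal sketch of that calculation; the function name `macro_f1` and the toy stage labels are illustrative assumptions, not the authors' evaluation code or data.

```python
def macro_f1(y_true, y_pred):
    """Return (macro-averaged F1, per-class F1 dict) for two label sequences."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        per_class[label] = (2 * tp / denom) if denom else 0.0
    macro = sum(per_class.values()) / len(per_class)
    return macro, per_class


# Toy example with made-up stage groups (not data from the study).
y_true = ["I", "I", "IIB", "IIIB"]
y_pred = ["I", "IIB", "IIB", "IIIB"]
macro, per_class = macro_f1(y_true, y_pred)
print(round(macro, 3), per_class)
```

Because every class contributes equally, a single misclassified case in a sparsely populated stage group can lower the macro average noticeably, which is one plausible reason an external dataset may score lower than an internal one.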
Abstract
Background/Objectives: To develop an automated American Joint Committee on Cancer (AJCC) staging system for radical prostatectomy pathology reports using large language model-based information extraction and knowledge graph validation. Methods: Pathology reports from 152 radical prostatectomy patients were used. Five additional parameters (prostate-specific antigen (PSA) level, metastasis stage (M-stage), extraprostatic extension, seminal vesicle invasion, and perineural invasion) were extracted using GPT-4.1 with zero-shot prompting. A knowledge graph was constructed to model pathological relationships and implement rule-based AJCC staging with consistency validation. Information extraction performance was evaluated using a local open-source large language model (LLM) (Mistral-Small-3.2-24B-Instruct) across 16 parameters. The LLM-extracted information was integrated into the knowledge…
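To make the idea of rule-based staging with consistency validation concrete, here is a hedged sketch in plain Python. It is not the authors' knowledge graph: the rules follow the AJCC 8th edition prognostic stage groups for pathologically staged prostate cancer as commonly summarized, and the source does not state which edition or rule encoding was used. The `Findings` class, the function names, and the two consistency rules are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Findings:
    """Hypothetical container for values extracted from one report."""
    pt: str            # pathological T category, e.g. "T2", "T3a", "T3b", "T4"
    n: str             # "N0" or "N1"
    m: str             # "M0", "M1", "M1a", ...
    psa: float         # PSA level in ng/mL
    grade_group: int   # Grade Group 1-5
    epe: bool          # extraprostatic extension present
    svi: bool          # seminal vesicle invasion present


def stage_group(f: Findings) -> str:
    """Assumed AJCC 8th edition stage grouping, checked from most to least advanced."""
    if f.m.startswith("M1"):
        return "IVB"
    if f.n == "N1":
        return "IVA"
    if f.grade_group == 5:
        return "IIIC"
    if f.pt.startswith(("T3", "T4")):
        return "IIIB"
    if f.psa >= 20:
        return "IIIA"
    if f.grade_group in (3, 4):
        return "IIC"
    if f.grade_group == 2:
        return "IIB"
    # Remaining case: organ-confined (T2), Grade Group 1, PSA below 20.
    return "I" if f.psa < 10 else "IIA"


def check_consistency(f: Findings) -> list[str]:
    """Flag extracted values that contradict each other (illustrative rules only)."""
    issues = []
    if f.svi and f.pt not in ("T3b", "T4"):
        issues.append("seminal vesicle invasion reported but T category below T3b")
    if f.pt == "T3b" and not f.svi:
        issues.append("T3b reported without seminal vesicle invasion")
    if f.epe and f.pt == "T2":
        issues.append("extraprostatic extension reported but T category is T2")
    return issues


report = Findings(pt="T2", n="N0", m="M0", psa=6.5, grade_group=1, epe=False, svi=True)
print(stage_group(report), check_consistency(report))
```

In this sketch a report is staged even when it is flagged, so the inconsistency list acts as a review queue rather than a hard failure; whether the published system behaves this way is not stated in the text above.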
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Natural Language Processing Techniques
