# AI-powered classification and network analysis for knowledge mapping in medicine: a century of neurosyphilis research

**Authors:** Justine Falciola, Myriam Lamrayah, François R. Herrmann, Alexandre Wenger, Laurence Toutous Trellu

PMC · DOI: 10.1186/s12874-025-02750-8 · BMC Medical Research Methodology · 2025-12-26

## TL;DR

This paper combines large language models, network analysis, and interrupted time series analysis to map a century of neurosyphilis research, linking shifts in publication activity to clinical and technological milestones.

## Contribution

A novel framework combining large language models (LLMs), network analysis, and interrupted time series analysis (ITSA) to automate the classification of biomedical literature and visualize research trends.

## Key findings

- LLM-based classification achieved high repeatability (99.67% agreement) in categorizing neurosyphilis literature.
- Network analysis showed a shift from discipline-specific to interdisciplinary research structures over time.
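- Publication activity increased significantly following milestones such as the introduction of penicillin G, the emergence of HIV, genome sequencing of Treponema pallidum, and the rise of digital dissemination platforms.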

## Abstract

Tracking the evolution of scientific knowledge is challenging due to the scale and complexity of the biomedical literature. Neurosyphilis is a clinically complex and historically stigmatized condition that remains difficult to diagnose and manage. Its underexplored literature offers an ideal test case to evaluate digital methods for mapping research trends and identifying knowledge gaps. We aim to assess how large language models (LLMs), network analysis, and interrupted time series analysis (ITSA) can be combined to automate literature classification and examine how knowledge of neurosyphilis has evolved.

We systematically searched Web of Science, Embase, PubMed Central, the Cochrane Library, and Lens for records on neurosyphilis published until December 31, 2024. We included records with available titles and abstracts that GPT-4o mini identified as being focused primarily on syphilis or neurosyphilis. Eligible records were classified into 23 research fields via LLM-based prompts. Network analysis visualized changes in research structures over time, and the ITSA assessed associations between publication trends and major clinical or technological milestones.
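
To make the ITSA step concrete, the following is a minimal sketch of a single-interruption segmented regression fitted by ordinary least squares. It is not the authors' implementation: the function name `fit_itsa`, the synthetic yearly counts, and the 1943 breakpoint are all illustrative assumptions, and a real analysis would also need to handle autocorrelation and count-data dispersion.

```python
import numpy as np

def fit_itsa(years, counts, interruption_year):
    """Fit count = b0 + b1*t + b2*post + b3*t_since by ordinary least squares.

    t        : years elapsed since the first observation
    post     : 1 from the interruption year onward, else 0
    t_since  : years elapsed since the interruption (0 before it)
    """
    years = np.asarray(years, dtype=float)
    counts = np.asarray(counts, dtype=float)
    t = years - years.min()
    post = (years >= interruption_year).astype(float)
    t_since = np.where(post == 1.0, years - interruption_year, 0.0)
    X = np.column_stack([np.ones_like(t), t, post, t_since])
    coef, *_ = np.linalg.lstsq(X, counts, rcond=None)
    names = ["baseline_level", "baseline_trend", "level_change", "trend_change"]
    return dict(zip(names, coef))

# Synthetic, noise-free data (NOT from the paper): a baseline trend of 0.5
# records/year, then a jump of 5 and an extra 2 records/year after 1943.
years = np.arange(1930, 1960)
baseline = 10 + 0.5 * (years - 1930)
counts = np.where(years < 1943, baseline, baseline + 5 + 2.0 * (years - 1943))

result = fit_itsa(years, counts, interruption_year=1943)
```

Here `level_change` estimates the immediate jump at the milestone and `trend_change` the change in slope afterwards; on this noise-free series the fit recovers the values used to generate it.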

Among the 14 934 retrieved records, 4 646 met the inclusion criteria. LLM-based classification showed high repeatability (agreement = 99.67%, 95% CI 99.47–99.80; Cohen’s κ = 0.99, 95% CI 0.96–1.00). Biomedical, Clinical, and Health sciences were the most common domains. Network analysis revealed a shift from dense, discipline-specific clusters to larger interdisciplinary structures. ITSA revealed significant increases in publication activity following the introduction of penicillin G, HIV emergence, genome sequencing of Treponema pallidum, and the rise of digital dissemination platforms.
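
The repeatability figures rest on two standard quantities: raw percent agreement and Cohen's κ, which corrects agreement for chance. The sketch below computes both from two classification runs over the same records. The abstract does not say how the authors computed them, so the function names and the six-record toy labels are assumptions for illustration only.

```python
from collections import Counter

def percent_agreement(run_a, run_b):
    """Share of records assigned the same label in both runs."""
    matches = sum(a == b for a, b in zip(run_a, run_b))
    return matches / len(run_a)

def cohens_kappa(run_a, run_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(run_a)
    p_o = percent_agreement(run_a, run_b)
    freq_a, freq_b = Counter(run_a), Counter(run_b)
    # Chance agreement: product of the two runs' marginal label frequencies.
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both runs used a single identical label; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Toy labels (hypothetical, not the paper's data): two runs over six records.
run_1 = ["Clinical", "Clinical", "Biomedical", "Health", "Biomedical", "Clinical"]
run_2 = ["Clinical", "Clinical", "Biomedical", "Health", "Clinical", "Clinical"]

agreement = percent_agreement(run_1, run_2)  # 5 of 6 records match
kappa = cohens_kappa(run_1, run_2)           # equals 5/7 for these labels
```

Because κ discounts matches expected by chance, it sits below raw agreement whenever one label dominates, which is why the two figures are reported together.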

Combining LLMs with bibliometric and network methods provides a scalable framework for analyzing large-scale biomedical literature. When applied to neurosyphilis, the approach revealed links between research activity and clinical and technological advances. Beyond this case study, the method could support meta-research and inform evidence-based decision-making for other complex medical conditions.

The online version contains supplementary material available at 10.1186/s12874-025-02750-8.

## Linked entities

- **Chemicals:** penicillin G (PubChem CID 5904)
- **Diseases:** neurosyphilis (MONDO:0004944)
- **Species:** Treponema pallidum (taxon 160)

## Full-text entities

- **Diseases:** neurosyphilis (MESH:D009494)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12849313/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12849313/full.md

## References

3 references — full list in the complete paper: https://tomesphere.com/paper/PMC12849313/full.md

---
Source: https://tomesphere.com/paper/PMC12849313