# Real-World Impact and Educational Effectiveness of an AI-Powered Medical History-Taking System: Retrospective Propensity Score-Matched Cohort Study

**Authors:** Yang Liu, Yiying Zhu, Weishan Zhang, Xian Lu, Liping Wu, Minghui Yue, Oudong Xia, Chujun Shi

PMC · DOI: 10.2196/89367 · 2026-02-24

## TL;DR

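In a propensity score-matched cohort, voluntary use of an AI-powered medical history-taking training system (AMTES) was associated with modestly higher final practical examination scores. Exploratory analyses suggest that the benefit of more intensive practice may depend on students' baseline academic ability.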

## Contribution

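This study provides real-world evidence, from a propensity score-matched retrospective cohort, that voluntary extracurricular use of an AI-based training system (AMTES) is associated with better summative history-taking performance among medical students.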

## Key findings

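- In the matched cohort, AMTES users outperformed nonusers by 3% in final examination scores (average treatment effect on the treated=2.09 points on a 0-70 scale, 95% CI 0.75-3.42; P=.002).
- Among users, high-intensity practice was not associated with significantly higher scores (mean difference=1.6 points, 95% CI −0.5 to 3.6) or excellence probability (risk difference=7.7 percentage points, 95% CI −5.0 to 20.5) compared with low-intensity practice.
- In exploratory analyses, the benefit of more intensive practice appeared to depend on baseline academic ability: the interaction was significant for excellence rate (P=.04), marginal for final score (P=.07), and absent for pass rate (P=.94).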

## Abstract

Medical history-taking is a core clinical skill, yet traditional teaching methods face challenges. We developed an artificial intelligence–powered medical history-taking training and evaluation system (AMTES) and established its technical feasibility as an extracurricular resource. Evidence on whether such tools improve learning outcomes when voluntarily embedded in routine curricula remains limited.

This study aimed to evaluate the real-world educational effectiveness of AMTES as an opt-in extracurricular tool and examine whether learning gains vary by practice patterns and baseline academic ability.

We conducted a retrospective cohort study of the 2024-2025 Diagnostics course cohort (N=478) at Shantou University Medical College, China, using total population sampling. Students were categorized as AMTES users (n=205, 42.9%; ≥1 session) and nonusers (n=273, 57.1%) based on their voluntary extracurricular adoption of the system during the month preceding a high-stakes final practical skills examination. To address selection bias, we performed 1:1 propensity score matching via logistic regression using age, sex, and 3 previous academic scores as covariates. The average treatment effect on the treated for final examination score (range 0-70) was estimated with paired t tests, and robustness to unobserved confounding was assessed via Rosenbaum sensitivity analysis. Among matched users, practice patterns were identified using K-means clustering on features derived from usage logs, with cluster differences compared using Mann-Whitney U tests. Subsequently, we explored aptitude-treatment interaction by testing the interaction between practice intensity and baseline ability using linear and logistic regression models.
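The matching and estimation steps described in this paragraph can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The greedy nearest-neighbour rule, the caliper of 0.05, the single synthetic covariate, and every function name below are assumptions made for illustration, since the abstract does not state the matching algorithm or caliper.

```python
import math
import random
from statistics import mean, stdev


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Logistic regression by batch gradient descent; returns [bias, w1, ...]."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def match_one_to_one(ps, treated, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on the propensity score,
    without replacement (assumed rule; the caliper value is hypothetical)."""
    controls = {i for i, t in enumerate(treated) if t == 0}
    pairs = []
    for i in (i for i, t in enumerate(treated) if t == 1):
        if not controls:
            break
        j = min(controls, key=lambda c: abs(ps[c] - ps[i]))
        if abs(ps[j] - ps[i]) <= caliper:
            pairs.append((i, j))
            controls.remove(j)
    return pairs


def att_paired(pairs, outcome):
    """ATT as the mean within-pair difference, plus the paired t statistic."""
    diffs = [outcome[i] - outcome[j] for i, j in pairs]
    att = mean(diffs)
    return att, att / (stdev(diffs) / math.sqrt(len(diffs)))


# Synthetic data only: one standardized "prior score" drives both uptake and outcome.
random.seed(0)
X, treated, outcome = [], [], []
for _ in range(400):
    ability = random.gauss(0, 1)
    t = 1 if random.random() < sigmoid(0.8 * ability) else 0
    X.append([ability])
    treated.append(t)
    outcome.append(55 + 3 * ability + 2 * t + random.gauss(0, 2))  # built-in effect: 2 points

w = fit_logistic(X, treated)
ps = [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))) for x in X]
pairs = match_one_to_one(ps, treated)
att, t_stat = att_paired(pairs, outcome)
print(f"{len(pairs)} matched pairs, ATT = {att:.2f}, paired t = {t_stat:.2f}")
```

Because the synthetic users have higher prior scores on average, a naive comparison of group means would overstate the effect. Matching on the estimated propensity score compares each user with a nonuser who had a similar probability of adopting the tool. A real analysis would also check covariate balance after matching, for example with standardized mean differences.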

Propensity score matching yielded 157 matched pairs (n=314) with excellent covariate balance (|standardized mean difference|<0.1). In the matched cohort, users outperformed nonusers by 3% (average treatment effect on the treated=2.09 points, 95% CI 0.75-3.42; P=.002). This finding was robust to weak unmeasured confounding (Rosenbaum Γ=1.23). Among matched users (n=157), cluster analysis of usage logs revealed a low-intensity group (74/157, 47.1%) and a high-intensity group (83/157, 52.9%). The 2 groups differed in both practice quantity and quality. However, the added effort did not translate into significantly higher scores (mean difference=1.6 points, 95% CI −0.5 to 3.6) or excellence probability (risk difference=7.7 percentage points, 95% CI −5.0 to 20.5). Exploratory aptitude-treatment interaction analyses suggested ability-dependent effects for excellence rate (β3=1.461; P=.04) and marginally for final score (β3=2.58; P=.07), but not for pass rate (P=.94).
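The β3 values reported here are interaction coefficients. One plausible way to write the two models is sketched below. The variable coding is an assumption, since the abstract does not specify it: $I_i$ is 1 for the high-intensity cluster and 0 otherwise, $A_i$ is a baseline ability measure, $Y_i$ is the final score, and $E_i$ is 1 if student $i$ reached the excellence threshold.

$$
Y_i = \beta_0 + \beta_1 I_i + \beta_2 A_i + \beta_3 (I_i \times A_i) + \varepsilon_i
$$

$$
\operatorname{logit}\,\Pr(E_i = 1) = \beta_0 + \beta_1 I_i + \beta_2 A_i + \beta_3 (I_i \times A_i)
$$

Under this form, $\beta_3$ measures how the association between high-intensity practice and the outcome changes with baseline ability. In the logistic model the coefficient is on the log-odds scale, so if the reported $\beta_3 = 1.461$ comes from such a model, it would correspond to the odds ratio for high-intensity practice being multiplied by $e^{1.461} \approx 4.31$ per unit increase in $A_i$.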

Building upon previous technical validation, this study contributes real-world effectiveness evidence by evaluating AMTES as a voluntary extracurricular supplement within an authentic, high-baseline curriculum. Unlike previous work focusing on technical feasibility or short-term controlled trials, voluntary extracurricular AMTES use was associated with modest yet meaningful improvements in summative history-taking performance. Exploratory analyses indicated that the added value of more intensive engagement may be moderated by baseline academic ability. These findings support the scalability of artificial intelligence–enabled supplementary training and inform precision-oriented instructional design.

## Full-text entities

- **Genes:** SRL (sarcalumenin) [NCBI Gene 6345] {aka SAR}
- **Diseases:** hallucinations (MESH:D006212), cough (MESH:D003371), abdominal pain (MESH:D015746)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12976603/full.md

---
Source: https://tomesphere.com/paper/PMC12976603