# Evaluating the Clinical Decision-Making Accuracy of Artificial Intelligence in Common Geriatric Syndromes Using Evidence-Based Guidelines

**Authors:** Peter Cassar, Francesca Galea, Peter Ferry

PMC · DOI: 10.7759/cureus.101858 · Cureus · 2026-01-19

## TL;DR

This study evaluates how well ChatGPT (GPT-5) supports geriatric care decisions across seven standardized clinical vignettes. It finds that the responses are clear and generally safe but sometimes less accurate and less complete than expert geriatricians expect.

## Contribution

The study is one of the first to assess AI clinical decision-making in geriatrics using expert ratings and standardized vignettes.

## Key findings

- ChatGPT scored highest in clarity and safety but lower in accuracy and completeness.
- Advance care planning had the highest scores, while urinary incontinence had the lowest.
- Reviewers noted key omissions in several responses, such as missing assessments and guideline-recommended tools.

## Abstract

Background

Artificial intelligence (AI) tools such as ChatGPT are increasingly being explored for clinical decision support, yet their role in geriatric medicine remains uncertain due to the complexity of multimorbidity and care planning. This study aimed to evaluate the clinical accuracy, completeness, and guideline alignment of ChatGPT’s responses to common geriatric scenarios using standardized vignettes.

Methodology

Seven standardized vignettes representing common geriatric scenarios, namely, polypharmacy, falls, dementia, delirium, frailty, advance care planning, and urinary incontinence, were submitted to ChatGPT (GPT-5). Responses were evaluated by five independent consultant geriatricians using a standardized rubric across the following five domains: accuracy, completeness, guideline alignment, safety, and clarity (0-2 score per domain). Descriptive statistics summarized performance, and qualitative feedback was thematically analyzed. Inter-rater reliability was assessed using Krippendorff’s alpha.
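The abstract reports inter-rater reliability as Krippendorff's alpha but does not say which distance metric was used or how rating units were defined. The sketch below is a minimal, hypothetical illustration of how alpha can be computed from a rater-by-unit score table. It assumes each vignette-domain pair is one unit and uses the interval distance `(a - b) ** 2`; an ordinal metric may suit a 0-2 rubric equally well. The function name `interval_alpha` and the example scores are illustrative only and are not the authors' code or data.

```python
from itertools import permutations
from typing import Optional, Sequence


def interval_alpha(ratings: Sequence[Sequence[Optional[float]]]) -> float:
    """Krippendorff's alpha with the interval distance (a - b) ** 2.

    Hypothetical helper: ratings[r][u] is rater r's score for unit u,
    and None marks a missing rating.
    """
    n_units = len(ratings[0])
    # Keep only units scored by at least two raters ("pairable" values).
    units = []
    for u in range(n_units):
        values = [row[u] for row in ratings if row[u] is not None]
        if len(values) >= 2:
            units.append(values)
    pooled = [v for values in units for v in values]
    n = len(pooled)
    if n < 2:
        raise ValueError("need at least two pairable ratings")

    # Observed disagreement: ordered within-unit pairs, each unit weighted by 1/(m_u - 1).
    d_o = sum(
        sum((a - b) ** 2 for a, b in permutations(values, 2)) / (len(values) - 1)
        for values in units
    ) / n
    # Expected disagreement: all ordered pairs of pooled values, ignoring unit membership.
    d_e = sum((a - b) ** 2 for a, b in permutations(pooled, 2)) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("alpha is undefined when every rating is identical")
    return 1.0 - d_o / d_e


if __name__ == "__main__":
    # Hypothetical 0-2 scores: 3 raters x 4 vignette-domain units (NOT the study's data).
    example = [
        [2, 2, 1, 0],
        [2, 2, 1, 1],
        [2, 1, 1, 0],
    ]
    print(f"alpha = {interval_alpha(example):.3f}")
```

Values near 1 indicate near-perfect agreement, and values near 0 indicate agreement no better than chance. The reported alpha of 0.969 therefore reflects very consistent scoring across the five geriatricians.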

Results

ChatGPT scored the highest in clarity (66/70) and safety (63/70), with slightly lower performance in accuracy (59/70) and completeness (55/70). Guideline alignment was generally strong (61/70). Advance care planning received the highest domain scores; urinary incontinence scored the lowest. Krippendorff’s alpha showed high inter-rater agreement (0.969). Reviewers identified key omissions, such as missing assessments or guideline-recommended tools, in multiple vignettes.
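Each domain total is out of 70. This follows from the design: 7 vignettes, 5 raters, and a maximum of 2 points per domain give 7 × 5 × 2 = 70, although the abstract does not spell this out. The short sketch below converts the reported totals into percentages for easier comparison. The constant and function names are illustrative assumptions, and the totals are copied from the results above.

```python
# Domain totals reported in the abstract (each out of 70).
DOMAIN_TOTALS = {
    "clarity": 66,
    "safety": 63,
    "guideline alignment": 61,
    "accuracy": 59,
    "completeness": 55,
}

N_VIGNETTES = 7
N_RATERS = 5
MAX_PER_ITEM = 2  # rubric scores each domain 0-2
MAX_TOTAL = N_VIGNETTES * N_RATERS * MAX_PER_ITEM  # 7 * 5 * 2 = 70


def as_percentages(totals: dict, max_total: int = MAX_TOTAL) -> dict:
    """Convert raw domain totals to percentages of the maximum possible score."""
    return {domain: round(100 * score / max_total, 1) for domain, score in totals.items()}


if __name__ == "__main__":
    # Print domains from strongest to weakest.
    ranked = sorted(as_percentages(DOMAIN_TOTALS).items(), key=lambda kv: -kv[1])
    for domain, pct in ranked:
        print(f"{domain}: {pct}%")
```

On this scale, clarity reaches about 94% of the maximum and completeness about 79%. This spread of roughly 16 percentage points separates the strongest domain from the weakest.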

Conclusions

ChatGPT showed potential as a supportive tool in geriatric care, offering clear and generally safe responses aligned with guidelines. However, it lacked clinical depth and missed key elements in complex scenarios. AI tools such as ChatGPT should be used with caution, under expert oversight, and not as standalone decision makers in clinical practice.

## Linked entities

- **Diseases:** dementia (MONDO:0001627), delirium (MONDO:0045057)

## Full-text entities

- **Diseases:** Frailty (MESH:D000073496), Urinary Incontinence (MESH:D014549), osteoarthritis (MESH:D010003), stress incontinence (MESH:D014550), Delirium (MESH:D003693), Dementia (MESH:D003704), confusion (MESH:D003221), type 2 diabetes (MESH:D003924), cataracts (MESH:D002386), LLMs (MESH:D007806), falls (MESH:C537863), urinary tract infection (MESH:D014552), dizziness (MESH:D004244), loss of consciousness (MESH:D014474), insomnia (MESH:D007319), end-stage heart failure (MESH:D007676), cancer (MESH:D009369), lumbosacral radicular pain (MESH:D010146), hypertension (MESH:D006973), Geriatric Syndromes (MESH:D013577), urinary leakage (MESH:D003763), melanoma (MESH:D008545), AI hallucination (MESH:D006212)
- **Chemicals:** amlodipine (MESH:D017311), lisinopril (MESH:D017706), pseudoephedrine (MESH:D054199), metformin (MESH:D008687), zolpidem (MESH:D000077334), furosemide (MESH:D005665), duloxetine (MESH:D000068736), omeprazole (MESH:D009853), paracetamol (MESH:D000082), Donepezil (MESH:D000077265), tamsulosin (MESH:D000077409), ibuprofen (MESH:D007052)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12916024/full.md

## References

8 references — full list in the complete paper: https://tomesphere.com/paper/PMC12916024/full.md

---
Source: https://tomesphere.com/paper/PMC12916024