Evaluating Large Language Models for Mild Cognitive Impairment: A Bilingual Comparison of ChatGPT, Gemini, and Kimi

Yexuan Xiao; Qianhui Pan; Nan Jiang; Haoyuan Liu; Yilin He; Yuhe Zhang; Tingmei Wang

PMC · DOI: 10.1093/geroni/igaf122.2247 · December 31, 2025

Open Access

TL;DR

This study compares how well ChatGPT, Gemini, and Kimi handle questions about mild cognitive impairment in English and Chinese, finding that English responses are more accurate and clear.

Contribution

The study introduces a bilingual evaluation of LLMs for MCI management, highlighting language-specific performance differences and user-specific needs.

Findings

1. LLMs performed best in the Symptoms and Diagnosis domain.

2. Healthcare professionals received more accurate and actionable responses than care partners.

3. English responses were more comprehensible and specific than Chinese ones.

Abstract

Mild Cognitive Impairment (MCI) is a key stage between normal aging and Alzheimer’s Disease (AD), with early intervention crucial for slowing progression. Large Language Models (LLMs) offer promising support by providing accessible, evidence-based information for non-specialist healthcare professionals and care partners. However, concerns about accuracy and limited multilingual evaluations remain. This study explores the potential of LLMs in managing MCI, examines their support for non-specialist healthcare professionals and care partners, and compares English and Chinese responses to MCI-related queries, considering language-specific nuances and effectiveness. We submitted 72 open-ended questions related to MCI management to ChatGPT-4o, Gemini, and Kimi, assessing their responses based on accuracy, comprehensibility, specificity, and actionability using a five-point Likert scale.…

Linked entities

Diseases (1)

Alzheimer’s Disease

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Dementia and Cognitive Impairment Research · Machine Learning in Healthcare