A Cross-Sectional Study Comparing Patient Information Guides for Amyotrophic Lateral Sclerosis, Myasthenia Gravis, and Guillain-Barré Syndrome Produced by ChatGPT-4 and Google Gemini 1.5
Daaniya Tariq, Ramya Madhusudan, Yashaswi Guntupalli, Shivaashish Karumanchi Anantha Venkata Sai, Bharath Vejandla, Mohit LNU

TL;DR
This study compares patient education guides for three neurological conditions created by two AI tools, finding differences in readability and content length.
Contribution
The study introduces a comparison of AI-generated patient guides for ALS, MG, and GBS using readability and reliability metrics.
Findings
ChatGPT produced guides with higher grade level and lower ease scores compared to Google Gemini.
Both AI tools showed similar reliability and content similarity percentages.
A significant difference was found in the mean word counts generated by the two AI tools.
Abstract
Introduction: Patient education for amyotrophic lateral sclerosis (ALS), myasthenia gravis (MG), and Guillain-Barré syndrome (GBS) is essential for effective symptom management, improving quality of life, and enabling informed care decisions. AI tools enhance healthcare and patient education through personalized care and improved diagnostics. Methods: In this study, ChatGPT (OpenAI, San Francisco, CA, USA) and Google Gemini (Mountain View, CA, USA) generated patient education guides for ALS, MG, and GBS. Variables included word count, sentence count, average words and syllables per sentence, grade level, ease score using the Flesch-Kincaid calculator, similarity score using QuillBot, and reliability using a modified DISCERN score. Statistical analysis was done using R version 4.3.2 (2023; R Foundation for Statistical Computing, Vienna, Austria). Results: ChatGPT-generated brochures…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1| Variables | ChatGPT | Google Gemini | P value | ||
| Mean | Standard Deviation | Mean | Standard Deviation | ||
| Words | 608.0 | 42.04 | 403.0 | 32.05 | 0.0033* |
| Sentences | 59.0 | 9.64 | 50.0 | 8.19 | 0.2869 |
| Average Words per Sentence | 10.57 | 2.25 | 8.17 | 0.76 | 0.1982 |
| Average Syllables per Word | 2.0 | 0.00 | 1.90 | 0.17 | 0.4226 |
| Grade Level | 12.13 | 0.86 | 10.0 | 2.10 | 0.2137 |
| Ease Score | 26.90 | 2.29 | 37.80 | 14.83 | 0.3303 |
| Similarity % | 35.20 | 4.31 | 33.57 | 18.10 | 0.8918 |
| Reliability Score | 4.0 | 1.00 | 3.67 | 1.15 | 0.7250 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsArtificial Intelligence in Healthcare and Education · Patient-Provider Communication in Healthcare · Myasthenia Gravis and Thymoma
Introduction
Neuro-muscular disorders such as amyotrophic lateral sclerosis (ALS), myasthenia gravis (MG), and Guillain-Barré syndrome (GBS) are severe conditions characterized by progressive muscle weakness and can lead to significant disability and mortality [1-3]. Effective patient education plays a crucial role in managing these disorders by aiding in the early identification of symptoms, proper medication management, and adherence to treatment plans, thereby reducing complications and improving patient outcomes. By empowering patients with knowledge about their condition, education becomes a pivotal tool in preventing early mortality and enhancing the quality of life [1,2].
In recent years, artificial intelligence (AI) tools like ChatGPT (OpenAI, San Francisco, CA, USA) and Google Gemini (Mountain View, CA, USA) have emerged as promising aids in patient education. These tools provide easily accessible, accurate information about a wide range of diseases, including complex neuromuscular disorders. ChatGPT, a conversational AI developed by OpenAI, is designed to simulate human-like dialogue, and can provide personalized educational content [4]. Google Gemini, another advanced AI tool, offers extensive health information, including up-to-date research findings and patient education materials [5]. While these AI tools enhance patient education, there are concerns regarding the accuracy of information, the lack of a personal connection with healthcare providers, and the potential for over-reliance on these technologies [4,5].
Combining the capabilities of ChatGPT and Google Gemini can significantly enhance patient counseling for neuromuscular disorders like ALS, MG, and GBS. These AI tools can provide patients with tailored, accessible information, complementing the guidance of healthcare professionals. However, careful consideration must be given to ensuring the accuracy of the information provided and integrating these tools into a holistic patient care approach [4-6].
Aims and objectives
This study aims to compare the readability, reliability, and content complexity of patient education materials generated by ChatGPT and Google Gemini for ALS, MG, and GBS.
Materials and methods
This original cross-sectional study was carried out over the course of a week, beginning from the 5th of August 2024 to the 12th of August 2024. Ethics committee approval was deemed exempt in view of no human participants.
Three prevalent neurological conditions (ALS, MG, and GBS) were chosen. Google Gemini (Gemini 1.5 Flash) and ChatGPT (GPT-4 version) were the two AI tools used for generation of brochures for patient education [5,7]. Prompts were given to both AI tools: “Write a patient education guide for Amyotrophic Lateral Sclerosis (ALS)”; “Write a patient education guide for Myasthenia Gravis”; “Write a patient education guide for “Guillain-Barré Syndrome” (Appendices 1-3).
The responses generated by the AI were scored using a variety of statistical metrics and saved in a Microsoft (Redmond, WA, USA) Word document. Flesch-Kincaid Calculator was used to calculate measures like word count, number of sentences, average number of words per sentence, average number of syllables per word, grade level and ease score [8]. Quillbot plagiarism checker was used to measure the similarity percentage and modified DISCERN score was used to calculate the reliability score [9,10].
The modified DISCERN score is a set of five questions derived from the original DISCERN instrument for assessing the reliability of health information. Each question on the modified DISCERN scale was assigned a score of 0 or 1. A total score of 5 denotes great dependability, whereas a 0 indicates low reliability in this rating system.
The data was exported into Microsoft Excel for further analysis. The statistical analysis was done using R (version 4.3.2, R Foundation for Statistical Computing, Vienna, Austria), a programming language extensively used for statistical computing [6]. The responses generated by the AI tools were compared using unpaired T-test for which the P-value of <0.05 was considered significant. The linear correlation of ease score and reliability score was measured using Pearson’s correlation coefficient which has a value ranging from -1 to 1.
Results
Table 1 compares the characteristics of responses generated by ChatGPT and Google Gemini across various metrics. ChatGPT responses are significantly longer (608 words on average) compared to Google Gemini (403 words), with a notable p-value of 0.0033. The responses also differ in average words per sentence and syllables per word, though these differences are not statistically significant. ChatGPT's responses tend to have a higher grade level (12.13) and lower ease score (26.90) compared to Google Gemini. Both models show similarities in response reliability and similarity percentage, with no statistically significant differences across these metrics.
Table 1: Characteristics of responses generated by ChatGPT and Google Gemini*t-test. P-values <0.05 in t-test are considered statistically significant.
There was no significant difference in the sentence count (p=0.2869), average word per sentence (p=0.1982), average syllables per word (p=0.4226), grade level (p=0.2137), ease score (p=0.3303), similarity% (p=0.8918) and reliability score (p=0.7250) between ChatGPT and Google Gemini. However, the word count was significantly more for ChatGPT-generated responses compared to Google Gemini (p=0.0033).
Figure 1 shows the grade level, ease score, similarity percent, and reliability score of the patient education guides made by ChatGPT and Google Gemini for the diseases ALS, MG, and GBS.
Graphical representation of comparison between grade level, ease score, similarity percent, and reliability score for the patient education guides generated by ChatGPT and Google Gemini
Based on Figure 1, the grade level for ChatGPT (ALS: 11.2, MG: 12.9, GBS: 12.3; mean = 11.9) was higher than Google Gemini (ALS: 10.9, MG: 11.5, GBS: 7.6; mean = 10). The ease score for Google Gemini (ALS: 30, MG: 28.5, GBS: 54.9; mean = 37.8) was higher than ChatGPT (ALS: 29.4, MG: 24.9, GBS: 26.4; mean = 26.9). The similarity percent for ChatGPT (ALS: 40.1, MG: 32, GBS: 33.5; mean = 35.2) was higher than Google Gemini (ALS: 45, MG: 43, GBS: 12.7; mean = 33.5). The reliability score was similar but slightly higher for ChatGPT (ALS: 5, MG: 4, GBS: 3; mean = 4) than Google Gemini (ALS: 5, MG: 3, GBS: 3; mean = 3.6). ChatGPT was slightly more complex than Google Gemini, but it was inferred to be more reliable.
Discussion
The responses produced by two AI systems, ChatGPT and Google Gemini, for patient education brochures for ALS, MG, and GBS were compared in a cross-sectional research. The analysis revealed that a statistically significant difference between ChatGPT and Google Gemini was in the number of words generated (p=0.0033). ChatGPT generates longer and slightly more complex text, while Google Gemini’s text may be easier to read. The tools were relatively similar in terms of reliability and the other assessed variables.
AI plays a pivotal role in transforming patient education by delivering customized and accessible information, aiding patients in making informed health decisions [11]. With AI’s rapid advancements and increasing integration in healthcare and educational contexts, the demand for succinct and comprehensible educational materials has grown [12]. In this analysis, ChatGPT generated an average of 608 words per text, and Google Gemini produced 403 words per text, a notable difference. Additionally, the average words per sentence were 10.57 for ChatGPT and 8.17 for Google Gemini. Although the National Institutes of Health (NIH) and the National Academy of Medicine (NAM) recommend that patient educational materials be written at or below a sixth-grade reading level, the Flesch-Kincaid scores indicated that ChatGPT’s content was at a 12th-grade level, while Google Gemini’s was at a 10th-grade level [8,13]. This reveals that both AI tools produced content that exceeds the readability of high school levels, with ChatGPT being particularly complex and Google Gemini requiring at least a 10th-grade reading level. A cross-sectional study by Gibson et al. further corroborates this finding, noting that AI-generated content for prostate cancer education was “difficult to read,” with a Flesch reading ease score of 45.97 and a Flesch-Kincaid grade level of 12.12 [14].
AI tools, including ChatGPT and Google Gemini, are extensively trained on pre-existing literature, and may sometimes generate content resembling previously published research, raising concerns about potential plagiarism. Additionally, these tools lack the ability to fully grasp the context of the literature, resulting in suggestions that are not always relevant, or sometimes biased [15]. When integrating AI into medical practice and research, it is crucial to avoid plagiarism, as it undermines the academic integrity of medical professionals and their credentials. The QuillBot plagiarism tool [9], used to assess the similarity percent, showed that ChatGPT had an average similarity score of 35.2% and Google Gemini had an average similarity score of 33.57%. The results highlight the increasing focus on ChatGPT’s applications in healthcare, underscoring the need for further investigation into its effectiveness and the associated ethical considerations [16].
The modified DISCERN score is a tool used to evaluate the reliability of health web pages based on specific criteria; it was utilized in the analysis to assess the reliability of ChatGPT and Google Gemini [17]. The average DISCERN score for ChatGPT was 4.0, while Google Gemini was 3.67. A cross-sectional study by Golan et al. revealed significant discrepancies between ChatGPT’s quality assessments and those of human evaluators, as well as between ChatGPT and established tools like Readable.com, indicating potential misalignment in evaluating the quality and readability of medical content [18].
Limitations
The study was limited to two AI tools, ChatGPT and Google Gemini, suggesting a need for broader assessment. The focus on a limited number of diseases calls for further research into other diseases. The version of ChatGPT used is outdated, potentially affecting the accuracy of its content. The rapid evolution of medical science raises concerns about the ability of these tools to provide the most up-to-date information.
Conclusions
This study shows that, for patient information brochures on ALS, MG, and GBS, there is no discernible difference in the average ease score, grade score, and reliability score of responses produced by the two AI tools. The two AI tools' "words" differ from one another in a statistically significant way.
Further research should be conducted to investigate more AI technologies in diverse and contemporary disorders. Furthermore, it should be determined whether these technologies can create material in accordance with the latest updated guidelines. Tools should be upgraded to offer up-to-date and reliable information that can also be utilized by the general population.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Diagnosis of multiple sclerosis: progress and challenges Lancet Brownlee WJ Hardy TA Fazekas F Miller DH 1336134638920172788919010.1016/S 0140-6736(16)30959-X · doi ↗ · pubmed ↗
- 2Myasthenia gravis: a review Autoimmune Dis Jayam Trouth A Dabi A Solieman N Kurukumbi M Kalyanam J 87468020122012 https://doi.org/10.1155/2012/874680.2319344310.1155/2012/874680 PMC 3501798 · doi ↗ · pubmed ↗
- 3Epidemiology of amyotrophic lateral sclerosis: a review of literature Rev Neurol (Paris) Couratier P Corcia P Lautrette G Nicol M Preux PM Marin B 37451722016 https://doi.org/10.1016/j.neurol.2015.11.002.2672730710.1016/j.neurol.2015.11.002 · doi ↗ · pubmed ↗
- 4Chat GPT: Optimizing language models for dialogue Auto GPT Official 8 2024 Pogla M 2024 https://autogpt.net/chatgpt-optimizing-language-models-for-dialogue/
- 5Introducing Gemini: our largest and most capable AI model 8 2024 Pichai S. Introducing Gemini: our largest and most capable AI model. Google 2023. https://blog.google/technology/ai/google-gemini-ai/ (accessed August 31 2024).2024). 2023 https://blog.google/technology/ai/google-gemini-ai/
- 6Burden of neurological disorders across the US from 1990-2017: a global burden of disease study JAMA Neurol Feigin VL Vos T Alahdab F 1651767820213313613710.1001/jamaneurol.2020.4152 PMC 7607495 · doi ↗ · pubmed ↗
- 7Introducing Chat GPT 8 2024 2024 https://openai.com/index/chatgpt/
- 8Flesch Kincaid Calculator 8 2024 2024 https://goodcalculators.com/flesch-kincaid-calculator/
