Quality of Conventional versus Artificial Intelligence Oral Surgery Consent Forms: Comparative Analysis
Jan Gaessler, Bernhard Remschmidt, Ann-Kathrin Jopp, Behrouz Arefnia, Adrian Franke, Marcus Rieder

TL;DR
AI-generated consent forms for oral surgery are better than traditional ones, but both need improvement for full comprehension.
Contribution
This study compares AI-generated and conventional consent forms for oral surgery, revealing differences in quality and readability.
Findings
AI-generated consent forms showed higher quality and better readability than conventional forms.
Both AI and conventional forms failed to meet recommended comprehension levels for patients.
Abstract
Artificial intelligence–generated informed consent forms for oral surgery demonstrated higher quality and better readability than conventional web-based forms, though both fell short of recommended comprehension levels.
| Quality | Median (IQR) | P value |
|---|---|---|
| Overall quality | | .007 |
| Conventional (i.e., web-based) ICFs | 27.50 (20.125-37) | |
| Artificial intelligence-generated ICFs | 32.50 (28-36.25) | |
| All combined | 31.00 (23-37) | |
| Differences by procedure | | .004 |
| Apicoectomy | 27.00 (21.75-34.875) | |
| Biopsy | 30.50 (25.75-33) | |
| Oral bone augmentation | 31.50 (25.75-37.5) | |
| Dental cystectomy | 31.25 (23-33.875) | |
| Dental implants | 33.25 (20.625-37.125) | |
| Oral incision and drainage | 31.50 (23.5-39.5) | |
| Dental local anesthesia | 28.50 (21-34.5) | |
| Periodontal surgery | 36.50 (32.5-42) | |
| Tooth extraction | 23.50 (20-32.75) | |
| Wisdom tooth removal | 28.25 (20-36.875) | |
| Differences by large language model | | <.001 |
| ChatGPT | 34.25 (33-37) | |
| Claude | 40.50 (35-43) | |
| Bing Chat | 30.00 (27.25-31.75) | |
| Google Bard | 26.50 (22.75-31.375) | |
Introduction
Informed consent is a foundational element of ethical and legal medical care, ensuring patients understand the nature, risks, and alternatives of proposed treatments [1,2]. In oral surgery, where procedures can be complex and invasive, clear and high-quality informed consent forms (ICFs) are especially critical. However, many ICFs exceed the recommended 6th-grade reading level, limiting patient comprehension [3]. With the recent rise of artificial intelligence (AI), particularly large language models (LLMs), there is growing interest in their potential to improve patient communication [4,5]. This study aimed to assess the quality and readability of conventional, web-based oral surgery ICFs and compare them to those generated by AI-based LLMs.
Methods
Ten common oral surgery procedures were selected (ie, apicoectomy, biopsy, bone augmentation, cystectomy, dental implants, incision and drainage, local anesthesia, periodontal surgery, tooth extraction, and wisdom tooth removal). Using Google Chrome in incognito mode, 300 web-based ICFs (ie, 30 per procedure) were collected (see search strategy in Multimedia Appendix 1). In parallel, four LLMs (ChatGPT 3.5, Claude, Bard, and Bing Chat) were prompted to generate ICFs for the same procedures using standardized requests. For each combination of procedure and LLM, two basic, non-directive prompts were developed to minimize bias and ensure neutrality, resulting in 80 AI-generated ICFs (see Multimedia Appendix 1). Subsequently, two oral and maxillofacial surgeons screened the collected forms using predefined inclusion and exclusion criteria (see Multimedia Appendix 1).
Quality was assessed using a newly developed alteration of the well-established DISCERN instrument [6], namely the Graz Assessment Tool for Written Informed Consent Keypoints (GATWICK; see Multimedia Appendix 1). It was validated through expert review for content relevance and consistency. It includes 11 items scored on a 5-point Likert scale (total score range 11‐55). Two oral and maxillofacial surgery residents independently rated all forms. Readability was evaluated using six established formulas (ie, Automated Readability Index, Coleman-Liau, Flesch-Kincaid, FORCAST, Gunning Fog, and Simple Measure of Gobbledygook), and an average reading grade level was calculated [7]. Statistical analyses included the Mann-Whitney U test, Kruskal-Wallis test, and Kendall tau-b, with significance set at P≤.05.
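To illustrate how an average reading grade level can be derived from several formulas, the following Python sketch implements two of the six named formulas (Flesch-Kincaid grade level and the Automated Readability Index) with their standard published coefficients and averages them. It is not the tooling used in the study: the function names, the regular-expression tokenization, and the vowel-group syllable counter are simplifying assumptions, and dedicated readability software will give somewhat different values.

```python
import re

def count_syllables(word: str) -> int:
    """Rough vowel-group heuristic (an assumption; real tools use dictionaries)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent "e" as non-syllabic, but keep "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def text_stats(text: str):
    """Return (sentences, words, letters, syllables) using naive tokenization."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    letters = sum(len(w) for w in words)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), letters, syllables

def flesch_kincaid_grade(text: str) -> float:
    n_sent, n_words, _, n_syll = text_stats(text)
    return 0.39 * n_words / n_sent + 11.8 * n_syll / n_words - 15.59

def automated_readability_index(text: str) -> float:
    # Letters only are counted here; the original index also counts digits.
    n_sent, n_words, n_letters, _ = text_stats(text)
    return 4.71 * n_letters / n_words + 0.5 * n_words / n_sent - 21.43

def average_grade_level(text: str) -> float:
    """Mean of the implemented formulas (the study averaged six formulas)."""
    grades = [flesch_kincaid_grade(text), automated_readability_index(text)]
    return sum(grades) / len(grades)

if __name__ == "__main__":
    # Hypothetical sentences, not taken from any form in the study.
    simple = "The cat sat. The dog ran."
    clinical = "Postoperative complications include paresthesia, hemorrhage, and infection."
    print(round(average_grade_level(simple), 2))
    print(round(average_grade_level(clinical), 2))
```

Very short texts can yield negative or implausibly high grades, so such formulas are only meaningful on full documents.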
Results
Of 380 screened documents, 213 ICFs met the inclusion criteria: 136 web-based and 77 AI-generated ones. The inter-rater reliability for GATWICK scores was excellent (intraclass correlation coefficient=0.948).
Regarding quality, AI-generated ICFs had significantly higher total GATWICK scores than web-based ones (median 32.5, IQR 28-35.5 vs median 27.5, IQR 20.375-37; P=.007). Items related to treatment alternatives, rationale for the recommended intervention, and discussion of options scored notably higher in AI-generated forms. Web-based ICFs scored better on perioperative behavior instructions.
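The comparison of total scores between two independent groups corresponds to the Mann-Whitney U test named in the Methods. The sketch below shows one way to compute the U statistic with midranks for ties and a two-sided P value from the tie-corrected normal approximation (no continuity correction). The function names and the score lists are hypothetical and serve only as an illustration; they are not the study data, and statistical packages may report slightly different P values, especially for small samples.

```python
import math

def midranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided P from the normal approximation)."""
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = midranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: sum of t^3 - t over groups of t tied observations.
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        raise ValueError("all observations are identical")
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u1, p_two_sided

if __name__ == "__main__":
    # Hypothetical total scores on an 11-55 scale, for illustration only.
    web_scores = [22, 27, 31, 19, 36, 28]
    ai_scores = [33, 35, 29, 40, 32, 34]
    u, p = mann_whitney_u(web_scores, ai_scores)
    print(f"U = {u}, two-sided P = {p:.3f}")
```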
Regarding readability, web-based forms were significantly harder to read (median grade level 12.45, IQR 11.3-13.325) than AI-generated forms (median 10.7, IQR 10.1-12.4; P<.001), although neither met the recommended 6th-grade level. Readability was weakly correlated with overall quality (τ=0.132; P=.005).
The word count was higher for web-based forms (median 794 words, IQR 475.25-1068.75 words) than AI-generated ones (median 338 words, IQR 296-381 words; P<.001). Longer forms showed a weak correlation with higher quality (τ=0.270; P<.001).
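The τ values reported for readability and word count are Kendall tau-b coefficients, which count concordant and discordant pairs and adjust the denominator for ties. A minimal sketch of that calculation follows; the function name and the example lists are hypothetical and are not the study data. The quadratic pair loop is chosen for clarity, not speed.

```python
import math

def kendall_tau_b(x, y):
    """Kendall tau-b: (C - D) / sqrt((C + D + Tx) * (C + D + Ty)).

    C and D are the concordant and discordant pair counts; Tx and Ty are
    the pairs tied only in x and only in y. Pairs tied in both are dropped.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denominator = math.sqrt((untied + ties_x) * (untied + ties_y))
    if denominator == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denominator

if __name__ == "__main__":
    # Hypothetical word counts and quality scores, for illustration only.
    word_counts = [310, 450, 450, 800, 1020]
    quality_scores = [25, 31, 28, 30, 37]
    print(round(kendall_tau_b(word_counts, quality_scores), 3))
```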
Among LLMs, ChatGPT 3.5 and Claude scored significantly higher in terms of quality. ICFs on tooth extraction scored significantly worse than those on periodontal surgery. Overall, AI-generated ICFs performed significantly better than conventional versions, with notable differences across oral surgical procedures and among the LLMs used (Table 1).
Discussion
Principal Findings
This study found that conventional oral surgery ICFs available online are generally of modest quality and exceed recommended reading levels. AI-generated ICFs outperformed web-based ones in both quality and readability, although they too fell short of ideal readability standards.
These findings are consistent with prior research across medical disciplines, which shows that most ICFs are written at a level too advanced for the average patient [8,9]. Notably, AI-generated forms more consistently addressed key informed consent components such as treatment alternatives and rationale, suggesting that LLMs may serve as valuable tools in drafting patient-centered documents. However, AI models may also produce inaccuracies or omit procedure-specific nuances, highlighting the need for expert review [10].
The limitations of this study include its focus on English-language materials and the variability inherent in AI outputs depending on prompt phrasing or model version. While the GATWICK tool demonstrated strong reliability, further validation is needed.
Conclusion
AI-based LLMs offer a promising avenue for improving the quality and accessibility of oral surgery informed consent documents. Future efforts should focus on refining AI outputs and integrating clinician oversight to ensure accuracy, comprehensiveness, and patient comprehension.
Supplementary material
Multimedia Appendix 1 (DOI: 10.2196/59851). Methodology, including the search strategy (the Boolean operator “OR” was used to broaden the web search, as it accounted for differences in designation and spelling, ie, American versus British English), and a detailed description of the Graz Assessment Tool for Written Informed Consent Keypoints (GATWICK).
References
- 1. Agozzino E, Borrelli S, Cancellieri M, Carfora FM, Di Lorenzo T, Attena F. Does written informed consent adequately inform surgical patients? A cross sectional study. BMC Med Ethics. Jan 7, 2019;20(1):1. doi: 10.1186/s12910-018-0340-z. Medline: 30616673. PMC: 6323683.
- 2. General Medical Council. Consent: patients and doctors making decisions together. URL: https://www.gmc-uk.org/-/media/documents/gmc-guidance-for-doctors---consent---english_pdf-48903482.pdf. Accessed 23-04-2024.
- 3. Powers BJ, Trinh JV, Bosworth HB. Can this patient read and understand written health information? JAMA. Jul 7, 2010;304(1):76-84. doi: 10.1001/jama.2010.896. Medline: 20606152.
- 4. Rasteau S, Ernenwein D, Savoldelli C, Bouletreau P. Artificial intelligence for oral and maxillo-facial surgery: A narrative review. J Stomatol Oral Maxillofac Surg. Jun 2022;123(3):276-282. doi: 10.1016/j.jormas.2022.01.010. Medline: 35091121.
- 5. Puladi B, Gsaxner C, Kleesiek J, Hölzle F, Röhrig R, Egger J. The impact and opportunities of large language models like ChatGPT in oral and maxillofacial surgery: a narrative review. Int J Oral Maxillofac Surg. Jan 2024;53(1):78-88. doi: 10.1016/j.ijom.2023.09.005. Medline: 37798200.
- 6. Charnock D, Shepperd S, Needham G, Gann R. DISCERN: an instrument for judging the quality of written consumer health information on treatment choices. J Epidemiol Community Health. Feb 1999;53(2):105-111. doi: 10.1136/jech.53.2.105. Medline: 10396471. PMC: 1756830.
- 7. Ley P, Florio T. The use of readability formulas in health care. Psychol Health Med. Feb 1996;1(1):7-28. doi: 10.1080/13548509608400003.
- 8. Meade MJ, Dreyer CW. Orthodontic treatment consent forms: a readability analysis. J Orthod. Mar 2022;49(1):32-38. doi: 10.1177/14653125211033301. Medline: 34325567.
