# Can ChatGPT-5 educate the public about vasectomy?: a Google Trends–based expert panel assessment

**Authors:** Ali C. Albaz, Oğuzcan Erbatu, Okan Yiğit, Oktay Üçer, Gökhan Temeltaş, Talha Müezzinoğlu

PMC · DOI: 10.3389/fdgth.2026.1726517 · Frontiers in Digital Health · 2026-03-18

## TL;DR

In this study, an eight-member expert panel rates the accuracy, completeness and public suitability of ChatGPT-5's answers to ten frequently asked vasectomy questions derived from Google Trends.

## Contribution

The study introduces an expert panel assessment of ChatGPT-5's suitability for public education on vasectomy, using questions derived from Google Trends data that reflect real-world public interest.

## Key findings

- ChatGPT-5's responses were rated as clear and appropriately framed for public use.
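- Medical accuracy and completeness ratings showed greater dispersion than the other criteria, although no statistically significant differences were observed among expert subgroups (p > 0.05).
- Inter-rater reliability was very low (ICC = −0.01), indicating substantial variability across expert evaluations.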

## Abstract

ChatGPT-5, the latest multimodal large language model (LLM), has gained remarkable public attention for its ability to provide real-time and context-aware health information. However, its effectiveness in addressing sensitive urological topics such as vasectomy has not been systematically evaluated.

This study aimed to evaluate the accuracy, completeness and public suitability of ChatGPT-5's responses to frequently asked questions about vasectomy, derived from Google Trends data reflecting real-world public interest.

A total of eight experts—four urologists, two public health specialists, one obstetrician-gynecologist and one fertility nurse—independently assessed ChatGPT-5's responses to ten high-frequency vasectomy-related questions. Each response was rated using six 5-point Likert-scale criteria: medical accuracy, completeness, clarity, tone, public usefulness and recommendability. Descriptive statistics, Kruskal–Wallis tests and two-way random-effects intraclass correlation coefficients (ICC, 95% CI) were applied for statistical analysis.
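The subgroup comparison named above can be illustrated with a minimal sketch of the Kruskal–Wallis H statistic. The function name `kruskal_wallis_h` and the sample data are hypothetical and not taken from the paper, which does not report its software or how scores were grouped. The sketch returns only H, not a p-value.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction.

    groups: list of samples, e.g. one list of Likert scores per expert subgroup.
    Raises ZeroDivisionError if every pooled value is identical.
    """
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    counts = Counter(pooled)

    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    start = 1
    for value in sorted(counts):
        t = counts[value]
        rank_of[value] = start + (t - 1) / 2
        start += t

    # H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)
    h = 12 / (n * (n + 1)) * sum(
        sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)

    # Ties shrink the rank variance, so H is divided by a correction factor.
    tie_correction = 1 - sum(t**3 - t for t in counts.values()) / (n**3 - n)
    return h / tie_correction


# Made-up, fully separated samples (not study data).
print(round(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), 2))  # 7.2
```

Under the null hypothesis, H is compared against a chi-square distribution with (number of groups − 1) degrees of freedom. In practice, `scipy.stats.kruskal` returns both the statistic and the p-value.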

The mean ratings across evaluation domains ranged from 3.75 to 4.04. Clarity of language and tone appropriateness received the highest scores, whereas medical accuracy and comprehensiveness demonstrated greater dispersion. No statistically significant differences were observed among expert subgroups (p > 0.05). Inter-rater reliability was very low (ICC = −0.01), indicating substantial variability across expert evaluations.
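To show how an ICC near or below zero can arise, here is a minimal sketch of one two-way random-effects form: ICC(2,1), for absolute agreement of a single rater. The abstract does not say which form the authors used, so that choice, the function name `icc2_1` and the items-by-raters layout are assumptions.

```python
def icc2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one per rated item, each holding one score per rater.
    """
    n = len(ratings)      # number of rated items (rows)
    k = len(ratings[0])   # number of raters (columns)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: items, raters, and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-item mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Textbook example from Shrout and Fleiss (1979): 6 items, 4 raters.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc2_1(example), 2))  # 0.29
```

The numerator is `msr - mse`, so the estimate falls to zero or below whenever scores differ no more between items than the residual noise does. This can happen when nearly all ratings cluster in a narrow band of the scale.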

In this exploratory assessment, ChatGPT-5 responses to vasectomy-related public questions were frequently perceived as clear and appropriately framed for informational use. However, variability across expert ratings and the absence of layperson validation underscore the need for cautious interpretation. Large language model outputs may serve as supportive educational resources when accompanied by expert oversight and audience-specific adaptation.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13040449/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13040449/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/PMC13040449/full.md

---
Source: https://tomesphere.com/paper/PMC13040449