# Validity and reliability of ChatGPT's responses on dietary supplements in Japan: A quality assessment and content analysis

**Authors:** Mingxin Liu, Tsuyoshi Okuhara, Ritsuko Shirabe, Yuriko Nishiie, Xinyi Chang, Hiroko Okada, Takahiro Kiuchi

PMC · DOI: 10.1016/j.pecinn.2026.100461 · PEC Innovation · 2026-02-21

## TL;DR

This study found that ChatGPT's responses about dietary supplements are often ambiguous and have a moderate risk of misinformation, suggesting the need for improved accuracy in AI health advice.

## Contribution

This is the first study to assess the validity and reliability of LLM responses on dietary supplements using both quantitative and qualitative methods.

## Key findings

- Only 10% of LLM responses supported dietary supplements as effective treatments for specific diseases.
- LLM accuracy was 57%, lower than in structured nutrition knowledge assessments.
- LLMs often produced ambiguous, lengthy, or expectation-driven responses.

## Abstract

This study evaluated the validity and reliability of large language model (LLM) responses on dietary supplements (DS), a domain marked by scientific controversy and misinformation. The goal was to support informed consumer decisions and guide improvements in LLM performance.

We collected responses from GPT-4 and GPT-4o about the effects of 30 DS on six diseases. Two medical professionals categorized each response as “Effective,” “Uncertain,” or “Not Effective.” They also developed an evidence-based guideline for judging each supplement's effectiveness and compared the LLM-generated responses against it to determine accuracy. Additionally, we conducted a qualitative content analysis to identify response patterns and misleading content.
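The accuracy measure described above can be illustrated with a short sketch. The abstract does not spell out the exact formula, so this assumes accuracy is the share of responses whose rater-assigned category matches the guideline's category. The class and function names (`Rating`, `label_distribution`, `accuracy`, `accuracy_by_disease`) are hypothetical, and the toy ratings are made up (supplements are anonymized as `DS-A`, `DS-B`, `DS-C`); they are not data from the study.

```python
from collections import Counter
from dataclasses import dataclass

# The three categories used by the raters in the study.
LABELS = ("Effective", "Uncertain", "Not Effective")


@dataclass
class Rating:
    supplement: str       # hypothetical supplement identifier
    disease: str
    llm_label: str        # category the raters assigned to the LLM response
    guideline_label: str  # category implied by the evidence-based guideline (assumed)


def label_distribution(ratings):
    """Share of LLM responses falling into each category."""
    n = len(ratings)
    counts = Counter(r.llm_label for r in ratings)
    return {label: (counts[label] / n if n else 0.0) for label in LABELS}


def accuracy(ratings):
    """Assumed definition: fraction of responses whose category matches the guideline."""
    if not ratings:
        return 0.0
    return sum(r.llm_label == r.guideline_label for r in ratings) / len(ratings)


def accuracy_by_disease(ratings):
    """Accuracy computed separately for each disease."""
    groups = {}
    for r in ratings:
        groups.setdefault(r.disease, []).append(r)
    return {disease: accuracy(rs) for disease, rs in groups.items()}


# Toy, made-up ratings for illustration only (not results from the study).
toy = [
    Rating("DS-A", "cancer", "Not Effective", "Not Effective"),
    Rating("DS-B", "cancer", "Uncertain", "Uncertain"),
    Rating("DS-A", "constipation", "Effective", "Not Effective"),
    Rating("DS-C", "constipation", "Uncertain", "Effective"),
]

if __name__ == "__main__":
    print(label_distribution(toy))
    print(accuracy(toy))
    print(accuracy_by_disease(toy))
```

On the toy data, two of four responses match the guideline, so `accuracy(toy)` returns `0.5`, and `accuracy_by_disease(toy)` returns `{'cancer': 1.0, 'constipation': 0.0}`. Grouping by disease is one simple way to produce per-topic comparisons such as the cancer vs. constipation contrast reported in the highlights.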

GPT-4 and GPT-4o affirmed DS effectiveness in only 10% of cases, with 40% rated as “Uncertain” and 50% as “Not Effective.” Accuracy was about 57%, considerably lower than the roughly 80% or higher reported in nutrition-related studies using structured nutrition tasks. Content analysis revealed templated responses, frequent ambiguity, and occasional inclusion of irrelevant or incorrect information.

Our findings suggest that ChatGPT's responses on dietary supplements are generally cautious but often ambiguous, with a moderate risk of misinformation. As generative AI becomes a common source for health advice, these limitations could mislead users. Enhancing LLMs' evidence-based accuracy and ensuring consistent professional guidance are essential.

This is the first study to assess the validity and reliability of LLM-generated responses on dietary supplements using both quantitative and qualitative methods. We also developed a novel evidence-based framework to evaluate supplement effectiveness, providing a new tool for future research and supporting safer AI-assisted health communication.


## Highlights

- This is the first study to assess GPT-4 and GPT-4o on dietary supplements.
- Only 10% of responses supported supplements as treatments for specific diseases.
- GPT responses achieved 57% accuracy, lower than in structured nutrition knowledge assessments.
- LLMs achieved the highest accuracy on cancer-related questions and the lowest on constipation.
- LLMs often produce ambiguous, lengthy, or expectation-driven responses.

## Full-text entities

- **Diseases:** chronic inflammation (MESH:D007249), Cancer (MESH:D009369), Diabetic (MESH:D003920), obesity (MESH:D009765), fat (MESH:D004620), hypertension (MESH:D006973), hallucinations (MESH:D006212), joint pain (MESH:D018771), cardiovascular diseases (MESH:D002318), constipation (MESH:D003248), Appetite suppression (MESH:D001068), type 2 diabetes (MESH:D003924)
- **Chemicals:** omega-3 fatty acids (MESH:D015525), vitamin A (MESH:D014801), EPA (MESH:D015118), free radicals (MESH:D005609), Blood glucose (MESH:D001786), vitamin C (MESH:D001205), soy isoflavones (MESH:D007529), DHA (MESH:D004281), flavonoids (MESH:D005419), calcium (MESH:D002118), anthocyanins (MESH:D000872)
- **Species:** Allium sativum (garlic, species) [taxon 4682], Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12955162/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12955162/full.md

## References

50 references — full list in the complete paper: https://tomesphere.com/paper/PMC12955162/full.md

---
Source: https://tomesphere.com/paper/PMC12955162