# Comparative Analysis of Generative AI Language Models in Orthodontics: Evidence‐Based Insights Into Perplexity, iASK, and ChatGPT 4o Mini

**Authors:** Simarpreet Bhamra, Ramya Vijeta Jathanna

PMC · DOI: 10.1155/tswj/5479774 · The Scientific World Journal · 2026-03-03

## TL;DR

This study compares three generative AI models (Perplexity, iASK, and ChatGPT 4o mini) on the scientific reliability of their answers to orthodontic questions; Perplexity received the highest mean evaluator score (7.2 on a 0–10 scale).

## Contribution

The study provides an evaluator-scored comparison of three generative AI models on 10 clinical orthodontic questions, reporting both interevaluator reliability and between-model differences in scientific reliability.

## Key findings

- Perplexity had the highest mean score on orthodontic questions (7.2 out of 10), ahead of iASK (5.4) and ChatGPT 4o mini (5.2).
- High evaluator consistency was observed (Cronbach's alpha = 0.947).
- Perplexity showed significantly better performance than ChatGPT 4o mini and iASK (p = 0.002).

## Abstract

This study aimed to evaluate and compare the scientific reliability of three large language models (LLMs), Perplexity, iASK, and ChatGPT 4o mini, based on their responses to orthodontic‐related queries.

The three LLMs were prompted with 10 clinical orthodontic questions, and their responses were assessed independently by two evaluators using a structured scoring system (0–10). Statistical analyses, including Pearson and Spearman correlations, Cronbach's alpha, and the Wilcoxon signed‐rank test, were performed to determine interevaluator reliability and model performance differences.
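As an illustration of the interevaluator reliability statistic named above, the following is a minimal sketch of Cronbach's alpha for two evaluators, using the standard formula alpha = k/(k-1) × (1 − Σ item variances / variance of summed scores). The function name `cronbach_alpha` and the scores are hypothetical and are not the study's data; the summary does not state how the authors arranged their ratings for this calculation.

```python
from statistics import variance


def cronbach_alpha(ratings_by_evaluator):
    """Cronbach's alpha, treating each evaluator as one 'item'.

    ratings_by_evaluator: list of equal-length score lists, one per evaluator.
    """
    k = len(ratings_by_evaluator)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two evaluators")
    # Sum of each evaluator's sample variance across the rated responses.
    item_variance_sum = sum(variance(scores) for scores in ratings_by_evaluator)
    # Sample variance of the per-response total (sum over evaluators).
    totals = [sum(scores) for scores in zip(*ratings_by_evaluator)]
    total_variance = variance(totals)
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical 0-10 scores for six responses (illustrative only).
evaluator_1 = [7, 5, 6, 8, 4, 9]
evaluator_2 = [6, 6, 7, 8, 3, 9]
print(round(cronbach_alpha([evaluator_1, evaluator_2]), 3))
```

Values close to 1 indicate that the evaluators rank and score the responses consistently; identical score lists give exactly 1.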

Perplexity achieved the highest mean score (7.2), followed by iASK (5.4) and ChatGPT 4o mini (5.2). High consistency between evaluators was observed (Cronbach's alpha = 0.947). A significant difference was noted between Perplexity and both ChatGPT 4o mini and iASK (p = 0.002). Pearson and Spearman correlations indicated strong agreement between evaluators (r = 0.982, ρ = 1.000).
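To give a sense of scale for a Wilcoxon signed-rank p-value on a sample this small, the sketch below computes the exact two-sided p-value by enumerating every sign assignment. With 10 nonzero paired differences that all favor the same model, the exact two-sided p-value is 2/2^10 ≈ 0.00195, the smallest value the exact test can produce at that sample size. The scores and function names are hypothetical, and the summary does not state whether the authors paired by question or used the exact or the normal-approximation form of the test.

```python
from itertools import product


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_exact_p(x, y):
    """Exact two-sided Wilcoxon signed-rank p-value (zero differences dropped)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    observed = sum(r for r, d in zip(ranks, diffs) if d > 0)
    n_lower = n_upper = total = 0
    # Under the null hypothesis every sign pattern is equally likely.
    for signs in product((0, 1), repeat=len(ranks)):
        w_plus = sum(r for r, s in zip(ranks, signs) if s)
        n_lower += w_plus <= observed
        n_upper += w_plus >= observed
        total += 1
    return min(1.0, 2 * min(n_lower, n_upper) / total)


# Hypothetical per-question scores for two models (illustrative only).
model_a = [8, 7, 9, 6, 8, 7, 9, 8, 6, 7]
model_b = [6, 6, 5, 5, 7, 4, 6, 5, 5, 6]
print(wilcoxon_exact_p(model_a, model_b))
```

Full enumeration visits 2^n sign patterns, so this approach only suits small samples such as the 10 questions used here.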

Perplexity demonstrated superior performance in orthodontic‐related queries compared to ChatGPT 4o mini and iASK. The findings highlight the importance of evaluating AI models for clinical applicability and reliability.

## Full-text entities

- **Diseases:** LLM (MESH:D007806), EBD (MESH:D019292)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12957766/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12957766/full.md

## References

24 references — full list in the complete paper: https://tomesphere.com/paper/PMC12957766/full.md

---
Source: https://tomesphere.com/paper/PMC12957766