# Exploring Generative Pre-Trained Transformer-4-Vision for Nystagmus Classification: Development and Validation of a Pupil-Tracking Process

**Authors:** Masao Noda, Ryota Koshu, Reiko Tsunoda, Hirofumi Ogihara, Tomohiko Kamo, Makoto Ito, Hiroaki Fushiki

PMC · DOI: 10.2196/70070 · JMIR Formative Research · 2025-06-06

## TL;DR

This paper explores using GPT-4V for classifying nystagmus, a type of eye movement disorder, by developing a pupil-tracking process and evaluating its accuracy.

## Contribution

The paper presents a novel use of GPT-4V for nystagmus classification, built on a purpose-developed pupil-tracking process.

## Key findings

- The model achieved an overall accuracy of 37% with pupil-traced images and a maximum of 24.6% with pupil-coordinate inputs.
- Horizontal nystagmus classification reached a maximum accuracy of 69%; accuracy was lower for the vertical and torsional components.
- The study highlights the potential of GPT-4V for vertigo management and suggests improvements like larger datasets.

## Abstract

Conventional nystagmus classification methods often rely on subjective observation by specialists, which is time-consuming and variable among clinicians. Recently, deep learning techniques have been used to automate nystagmus classification using convolutional and recurrent neural networks. These networks can accurately classify nystagmus patterns using video data. However, associated challenges include the need for large datasets when creating models, limited applicability to specific image conditions, and the complexity of using these models.

This study aimed to evaluate a novel approach for nystagmus classification that used the Generative Pre-trained Transformer 4 Vision (GPT-4V) model, a state-of-the-art large language model with powerful image recognition capabilities.

We developed a pupil-tracking process from nystagmus-recording videos and verified the accuracy of the optimized model using GPT-4V classification of the recordings. We tested whether the model could distinguish six categories of nystagmus: right horizontal, left horizontal, upward, downward, right torsional, and left torsional. The traced trajectory was input either as two-dimensional coordinate data or as an image, and multiple in-context learning methods were evaluated.
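As an illustration of the coordinate-input variant with in-context learning, the sketch below serializes a traced pupil trajectory as text and prepends labeled examples. It is a minimal sketch, not the authors' pipeline: the function names `format_trajectory` and `build_prompt`, the prompt wording, and the `x,y` serialization are all assumptions, and no model call is made.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# The six categories named in the abstract.
CATEGORIES: List[str] = [
    "right horizontal",
    "left horizontal",
    "upward",
    "downward",
    "right torsional",
    "left torsional",
]


def format_trajectory(points: Sequence[Point], decimals: int = 1) -> str:
    """Serialize (x, y) pupil centers as one 'x,y' pair per frame (assumed format)."""
    return "; ".join(f"{x:.{decimals}f},{y:.{decimals}f}" for x, y in points)


def build_prompt(
    query: Sequence[Point],
    examples: Sequence[Tuple[Sequence[Point], str]] = (),
) -> str:
    """Assemble a few-shot prompt: labeled examples first, then the query."""
    for _, label in examples:
        if label not in CATEGORIES:
            raise ValueError(f"unknown category: {label}")
    # Hypothetical instruction text; the paper's actual prompts are not shown here.
    lines = [
        "Classify the nystagmus from the pupil trajectory.",
        "Answer with exactly one of: " + ", ".join(CATEGORIES) + ".",
    ]
    for trajectory, label in examples:
        lines.append(f"Trajectory: {format_trajectory(trajectory)}")
        lines.append(f"Answer: {label}")
    lines.append(f"Trajectory: {format_trajectory(query)}")
    lines.append("Answer:")
    return "\n".join(lines)
```

Passing an empty `examples` sequence gives a zero-shot prompt, and adding labeled trajectories gives a few-shot one, so the same helper can cover several in-context learning settings.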

The developed model showed an overall classification accuracy of 37% when using pupil-traced images and a maximum accuracy of 24.6% when pupil coordinates were used as input. Regarding orientation, we achieved a maximum accuracy of 69% for the classification of horizontal nystagmus patterns but a lower accuracy for the vertical and torsional components.
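Overall and per-orientation accuracies of this kind can be computed as in the sketch below. The grouping of the six categories into horizontal, vertical, and torsional orientations follows the abstract, but the function name `accuracy_by_orientation` is hypothetical and the scoring rule is an assumption: a prediction counts as correct only when the full six-way label matches, which the abstract does not confirm.

```python
from collections import defaultdict
from typing import Dict, Sequence

# Map each of the six categories to its orientation group.
ORIENTATION: Dict[str, str] = {
    "right horizontal": "horizontal",
    "left horizontal": "horizontal",
    "upward": "vertical",
    "downward": "vertical",
    "right torsional": "torsional",
    "left torsional": "torsional",
}


def accuracy_by_orientation(
    truth: Sequence[str], predicted: Sequence[str]
) -> Dict[str, float]:
    """Return accuracy overall and per orientation of the true label."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for t, p in zip(truth, predicted):
        # Each sample counts once overall and once in its orientation group.
        for key in ("overall", ORIENTATION[t]):
            total[key] += 1
            correct[key] += int(t == p)  # assumed rule: exact label match
    return {key: correct[key] / total[key] for key in total}
```

Orientation groups with no samples are simply absent from the result, which avoids dividing by zero on small evaluation sets.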

We demonstrated the potential of a generative artificial intelligence model for versatile vertigo management and for improving the accuracy and efficiency of nystagmus classification. We also highlighted areas for further improvement, such as expanding the dataset size and enhancing input modalities, to raise classification performance across all nystagmus types. Although GPT-4V has been validated only for recognizing still images, it can be linked to video classification, and we propose this linkage as a novel method.

## Linked entities

- **Diseases:** nystagmus (MONDO:0005712)

## Full-text entities

- **Diseases:** Nystagmus (MESH:D009759), vertigo (MESH:D014717)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12164947/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12164947/full.md

## References

36 references — full list in the complete paper: https://tomesphere.com/paper/PMC12164947/full.md

---
Source: https://tomesphere.com/paper/PMC12164947