# Optimizing Thyroid Nodule Evaluation: AI Integration Into the Thyroid Imaging Reporting and Data System Through AI-Based Ultrasound Image Analysis

**Authors:** Haseeb Arif, Hasan Farooq, Muhammad Omer Altaf, Muhammad D Asjad, Hadiya Mian, Talat Waseem

PMC · DOI: 10.7759/cureus.102893 · 2026-02-03

## TL;DR

This paper trains a vision-language AI model (LLaVA-Med) to stratify thyroid nodule risk on ultrasound according to ACR-TIRADS, reporting 67% accuracy, 71% sensitivity, and 53% specificity on a binary suspicious vs. non-suspicious task.

## Contribution

A vision-language AI model is developed and evaluated for ACR-TIRADS-based thyroid nodule risk stratification.

## Key findings

- The AI model achieved 67% accuracy, 71% sensitivity, and 53% specificity in classifying thyroid nodules as suspicious (TR3-TR5) or non-suspicious (TR1-TR2).
- Its precision (84.6%) and F1 score (77%) suggest potential for supporting clinical decisions.
- Performance favored sensitivity, which reduces the likelihood of missed malignant nodules, but specificity needs improvement.

## Abstract

Background

Thyroid nodules are among the most common endocrine abnormalities, with ultrasound serving as the first-line tool for risk stratification. The American College of Radiology Thyroid Imaging Reporting and Data System (ACR-TIRADS) standardizes evaluation but is limited by interobserver variability and the time required for detailed interpretation. Artificial intelligence (AI) could address these limitations by automating parts of the diagnostic process and improving diagnostic accuracy.

Objective

To develop and evaluate a vision-language AI model for ACR-TIRADS-based risk stratification of thyroid nodules on ultrasound.

Methodology

This retrospective study reviewed 1,000 thyroid ultrasound images collected between March 2024 and January 2025, of which 139 met the inclusion criteria. Images were annotated according to the five ACR-TIRADS feature categories (composition, echogenicity, shape, margins, echogenic foci). A vision-language AI model (LLaVA-Med, Microsoft Research, Redmond, WA) was trained using a two-stage strategy that included domain-specific pretraining and fine-tuning on a curated dataset. Diagnostic performance was assessed as a binary classification: suspicious (TR3-TR5) vs. non-suspicious (TR1-TR2).
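The binary grouping described above can be sketched in a few lines of Python. This is only an illustration of the TR3-TR5 vs. TR1-TR2 split stated in the abstract; the function name `to_binary_label` and the string format of the labels are assumptions, not part of the study's code.

```python
# Hypothetical helper: collapses an ACR-TIRADS level into the two classes
# used for evaluation in the abstract. Names and label strings are illustrative.
SUSPICIOUS_LEVELS = {"TR3", "TR4", "TR5"}
NON_SUSPICIOUS_LEVELS = {"TR1", "TR2"}


def to_binary_label(tr_level: str) -> str:
    """Map a TI-RADS level such as 'TR4' to 'suspicious' or 'non-suspicious'."""
    level = tr_level.strip().upper()
    if level in SUSPICIOUS_LEVELS:
        return "suspicious"
    if level in NON_SUSPICIOUS_LEVELS:
        return "non-suspicious"
    raise ValueError(f"Unknown TI-RADS level: {tr_level!r}")


if __name__ == "__main__":
    for level in ["TR1", "TR2", "TR3", "TR4", "TR5"]:
        print(level, "->", to_binary_label(level))
```

Note that the threshold sits between TR2 and TR3, so any nodule graded TR3 or above counts as a positive case in the reported metrics.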

Results

The model achieved an accuracy of 67%, sensitivity of 71%, specificity of 53%, and precision of 84.6%. The F1 score, the harmonic mean of precision and recall (sensitivity), summarizes a classifier's predictive performance in a single number; the model achieved an F1 score of 77%. Performance favored sensitivity, reducing the likelihood of missed malignant nodules, though specificity remained moderate.
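As a consistency check, the reported F1 score can be recomputed from the reported precision and sensitivity, using the conventional definition of F1 as the harmonic mean of precision and recall. The sketch below also spells out the standard confusion-matrix definitions of the other metrics. The function names are illustrative, and the counts in the second example are made-up toy values, not data from the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # recall: share of true positives found
        "specificity": tn / (tn + fp),  # share of true negatives found
        "precision": tp / (tp + fp),    # share of positive calls that are correct
    }


if __name__ == "__main__":
    # Values reported in the abstract: precision 84.6%, sensitivity 71%.
    f1 = f1_score(precision=0.846, recall=0.71)
    print(f"F1 = {f1:.3f}")  # prints "F1 = 0.772"

    # Toy counts for illustration only (NOT from the study).
    print(binary_metrics(tp=8, fp=2, tn=6, fn=4))
```

The recomputed value of about 0.772 agrees with the reported F1 score of 77%.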

Conclusion

The vision-language AI model trained on ACR-TIRADS features demonstrated promising performance in thyroid nodule risk stratification. Its relatively high sensitivity and explainable outputs reflect its potential as a supportive screening tool in clinical practice, particularly in settings with limited radiological expertise. Further refinement and multi-institutional validation are warranted.

## Full-text entities

- **Diseases:** cancer (MESH:D009369), Thyroid Nodule (MESH:D016606), thyroid (MESH:D013966), AI (MESH:C538142), endocrine abnormalities (MESH:D004700), TC (MESH:D013964), papillary TCs (MESH:D002291), thyroid lesions (MESH:D013959)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12962039/full.md

---
Source: https://tomesphere.com/paper/PMC12962039