# Differential Diagnosis of Parotid Tumors on Ultrasound: Interobserver Variability and Examiner-Specific Decision Rules—A Machine Learning Approach

**Authors:** Lukas Pillong, Ida Ohnesorg, Lukas Alexander Brust, Jan Palm, Julia Schulze-Berge, Victoria Bozzato, Manfred Voges, Adrian Müller, Malvina Garner, Alessandro Bozzato

PMC · DOI: 10.3390/diagnostics16060880 · 2026-03-16

## TL;DR

This study uses interpretable machine learning surrogates to analyze how six examiners diagnose parotid tumors on ultrasound. It finds substantial interobserver variability, with examiner accuracy ranging from 63.5% to 90.5%.

## Contribution

The main contribution is the novel use of interpretable decision-tree surrogates to model and visualize examiner-specific decision patterns in parotid tumor diagnosis.

## Key findings

- Examiner accuracy in diagnosing parotid tumors ranged from 63.5% to 90.5%.
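- Decision-tree surrogates approximated individual examiners' labeling behavior with high coverage (95.3–98.7%); their accuracy versus histopathology ranged from 57.2% to 80.0% for unpruned and from 65.1% to 76.5% for pruned models.
- Objective ultrasound descriptors showed higher interobserver agreement (size: κ = 0.57–0.97) than subjective ones (echogenicity: κ = 0.11–0.79).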
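
Interobserver agreement of this kind is commonly quantified with Cohen's κ, which the abstract names for pairwise comparisons. The following is a minimal sketch of that statistic for two raters, not the authors' code; the function name `cohen_kappa` and the example ratings are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's
    marginal label frequencies.
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both raters must label the same non-empty item set")

    n = len(ratings_a)
    # Observed agreement: share of items with identical labels.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: product of marginal frequencies, summed over labels.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    labels = set(freq_a) | set(freq_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in labels) / (n * n)

    if p_expected == 1.0:
        # Both raters used one identical label throughout; kappa is
        # undefined, and returning 1.0 is a convention assumed here.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical echogenicity ratings from two examiners (illustrative only).
examiner_1 = ["hypo", "hypo", "iso", "hyper", "hypo", "iso"]
examiner_2 = ["hypo", "iso", "iso", "hyper", "hypo", "hypo"]
kappa = cohen_kappa(examiner_1, examiner_2)
```

With six examiners there are 15 distinct examiner pairs, so a per-descriptor range such as those reported above would correspond to the minimum and maximum over the pairwise values.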

## Abstract

Background/Objectives: Noninvasive differentiation of parotid gland tumors remains challenging despite ultrasound being the primary imaging modality for salivary gland lesions. Given its examiner dependence, improving diagnostic consistency and transparency is crucial. We quantified interobserver variability in parotid ultrasound, modeled examiner-specific decision patterns using machine learning surrogates, and tested whether surrogate complexity relates to examiner performance.

Methods: In this retrospective, single-center study, six examiners independently rated ultrasound images of 149 parotid tumors using predefined descriptors. Performance was summarized using accuracy and the area under the receiver operating characteristic curve (AUC), with 95% confidence intervals (CIs). AUCs were compared using DeLong tests (Holm-adjusted). Interobserver agreement was assessed using pairwise Cohen’s and global Fleiss’ κ. For each examiner, a decision-tree surrogate was trained from structured descriptors and clinical metadata to reproduce examiner labels and visualize decision pathways; performance was estimated by 5-fold cross-validation.

Results: Examiner accuracy ranged from 63.5% to 90.5% and AUC from 0.66 to 0.89 (best 0.89, 95% CI 0.83–0.95); the best performer exceeded the two lowest performers (p < 0.001). Agreement was higher for objective descriptors (size: κ = 0.57–0.97) than for subjective descriptors (echogenicity: κ = 0.11–0.79). Surrogate decision-tree accuracy versus histopathology ranged from 57.2% to 80.0% for unpruned and from 65.1% to 76.5% for pruned models, with high coverage (95.3–98.7%). Tree complexity showed no consistent association with examiner performance.

Conclusions: Parotid ultrasound shows substantial interobserver variability. Interpretable surrogates can approximate individual labeling behavior from structured descriptors and clinical metadata, making examiner-dependent decision patterns explicit.
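
The Methods state that the AUC comparisons were Holm-adjusted. As a rough illustration of what that adjustment does, the sketch below implements the standard Holm step-down procedure on a list of raw p-values. It is not the authors' code; the function name `holm_adjust` and the example p-values are hypothetical, and the DeLong test that would produce the raw p-values is not shown.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order.

    The i-th smallest raw p-value (i = 0, 1, ...) is multiplied by
    (m - i), where m is the number of tests. The results are capped at
    1.0 and forced to be non-decreasing along the sorted order.
    """
    m = len(p_values)
    # Indices of the raw p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value may not fall below
        # the adjusted value of any smaller raw p-value.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values from three pairwise comparisons.
raw_p = [0.01, 0.04, 0.03]
adjusted_p = holm_adjust(raw_p)
```

A comparison is then called significant when its adjusted p-value falls below the chosen level, which controls the family-wise error rate across all comparisons.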

## Full-text entities

- **Diseases:** Parotid Tumors (MESH:D010307), salivary gland lesions (MESH:D012466)

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13025738/full.md

---
Source: https://tomesphere.com/paper/PMC13025738