# Poster Session I - Poster of Distinction - A46 IMPROVING INTER-OBSERVER AGREEMENT IN ENDOSCOPIC FINDINGS WITH STANDARDIZED TERMINOLOGY

**Authors:** Y Song, S Samnani, V J Arroyo, R Gaikar, A Azad, D Armstrong

PMC · DOI: 10.1093/jcag/gwaf042.046 · 2026-02-13

## TL;DR

Standardizing endoscopic terminology improves agreement among doctors, especially for identifying anatomical landmarks during procedures.

## Contribution

This study demonstrates that harmonized terminology reduces inter-observer disagreement in endoscopic reporting, particularly for esophagogastroduodenoscopy.

## Key findings

- Disagreement scores for landmark recognition were lower in esophagogastroduodenoscopy (EGD) than in colonoscopy.
- Standardizing terminology increased Fleiss’ kappa most for landmark recognition in EGD.
- Agreement on anatomic landmarks exceeded diagnostic label agreement in EGD.

## Abstract

A shared, standardized vocabulary for endoscopic diagnoses is essential for accurate reporting, clear communication and machine learning (ML). Heterogeneity in report language can lead to inconsistent labeling of findings, inter-observer disagreement, variation in patient management and suboptimal ML. Advances in digital imaging create opportunities to standardize imaging descriptors across providers and evaluate whether harmonized terminology reduces disagreement.

This study aimed to assess whether standardizing diagnostic terminology decreases inter-observer disagreement for endoscopic findings.

Three gastroenterology (GI) residents independently assigned diagnoses to 150 endoscopic images (esophagogastroduodenoscopy, colonoscopy). An expert faculty gastroenterologist at an academic center, blinded to the resident assessments, reviewed the same images to establish reference diagnoses. Inter-observer disagreement between the residents and the expert was calculated to generate a pre-standardization score. The residents then collaborated with the expert to develop a standardized terminology set, which was used to relabel all 150 images, and inter-observer disagreement was recalculated to generate a post-standardization score. Fleiss’ kappa was then calculated to quantify agreement among the four endoscopists (the three residents and the expert) before and after standardization.
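The two metrics described above can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the abstract does not specify how the disagreement score was computed, so the sketch assumes it is the percentage of images on which a resident's label differs from the expert's. The function names (`disagreement_pct`, `fleiss_kappa`) and the toy labels are hypothetical. Fleiss' kappa follows the standard definition, $\kappa = (\bar{P} - \bar{P}_e) / (1 - \bar{P}_e)$, where $\bar{P}$ is the mean per-image agreement and $\bar{P}_e$ is the agreement expected by chance.

```python
from collections import Counter


def disagreement_pct(rater_labels, expert_labels):
    """Percentage of images where a rater's label differs from the expert's.

    Assumption: this is one plausible reading of the abstract's
    "disagreement score"; the source does not define it precisely.
    """
    if len(rater_labels) != len(expert_labels):
        raise ValueError("label lists must have the same length")
    mismatches = sum(r != e for r, e in zip(rater_labels, expert_labels))
    return 100.0 * mismatches / len(expert_labels)


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of images, each a list of one label per rater."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every image needs the same number (>= 2) of ratings")

    # Count how many raters chose each category, per image.
    counts = [Counter(item) for item in ratings]
    categories = set().union(*counts)

    # Mean observed agreement across images (P-bar).
    p_items = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement (P-bar-e) from the overall category proportions.
    total = n_items * n_raters
    p_e = sum((sum(cnt[cat] for cnt in counts) / total) ** 2 for cat in categories)

    if p_e == 1.0:
        return float("nan")  # only one category was ever used; kappa is undefined
    return (p_bar - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Hypothetical toy labels: 4 images, each rated by 4 endoscopists.
    toy = [
        ["pylorus", "pylorus", "pylorus", "pylorus"],
        ["pylorus", "pylorus", "antrum", "antrum"],
        ["antrum", "antrum", "antrum", "antrum"],
        ["pylorus", "antrum", "antrum", "antrum"],
    ]
    print(f"Fleiss' kappa: {fleiss_kappa(toy):.3f}")

    resident = [row[0] for row in toy]  # first rater treated as a resident
    expert = [row[3] for row in toy]    # last rater treated as the expert
    print(f"Disagreement vs expert: {disagreement_pct(resident, expert):.1f}%")
```

Running the same two functions on the labels before and after standardization would give the pre- and post-standardization values that the study compares.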

Overall, disagreement scores were lower for landmark recognition in esophagogastroduodenoscopy (EGD) than in colonoscopy, whereas disagreement in lesion diagnoses was lower in colonoscopy than in EGD. The largest increases in Fleiss’ kappa were observed after standardizing landmark-recognition terminology for EGDs. Details of disagreement and kappa metrics are provided in Table 1.

Table 1. Pre- and post-standardization disagreement scores (%) vs. expert GI (150 images).

Standardizing terminology in endoscopy reports improves inter-observer agreement and reduces interpretive discrepancies. Agreement on major anatomic landmarks exceeded agreement on diagnostic labels for EGDs in particular, highlighting substantial variability in descriptors for endoscopic findings. Fleiss’ kappa increased most for landmark recognition in EGD, indicating that harmonized nomenclature can markedly enhance documentation consistency in this domain. Larger, multi-center studies with a broader spectrum of lesions are warranted to validate these results and to quantify the impact of report vernacular on clinical decision-making and patient outcomes.


---
Source: https://tomesphere.com/paper/PMC12901709