Comparison of Validity and Reliability of Manual Consensus Grading vs. Automated AI Grading for Diabetic Retinopathy Screening in Oslo, Norway: A Cross-Sectional Pilot Study

Mia Karabeg; Goran Petrovski; Katrine Holen; Ellen Steffensen Sauesund; Dag Sigurd Fosmark; Greg Russell; Maja Gran Erke; Vallo Volke; Vidas Raudonis; Rasa Verkauskiene; Jelizaveta Sokolovska; Morten Carstens Moe; Inga-Britt Kjellevold Haugen; Beata Eva Petrovski

PMC · DOI: 10.3390/jcm14134810 · July 7, 2025

Open Access

TL;DR

This study compares AI and manual consensus grading for detecting diabetic retinopathy and finds AI grading effective for screening, though human oversight is still needed to ensure accuracy.

Contribution

The study evaluates AI grading's diagnostic reliability and validity in DR screening against manual consensus grading in a clinical setting.

Findings

01

AI grading showed high sensitivity (94.0%) and acceptable specificity (72.6%) for detecting diabetic retinopathy.

02

Moderate agreement (measured by Kappa statistics) was found between AI and manual grading methods.

03

Only one eye was identified with diabetic macular edema by both AI and manual methods.
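
The validity and reliability measures named in these findings (sensitivity, specificity, and Kappa agreement) can all be derived from a 2x2 table that cross-tabulates AI grades against the manual consensus reference. The following is a minimal sketch, assuming a binary "DR present / DR absent" outcome and Cohen's kappa as the agreement statistic; the function name `screening_metrics` and the counts in the example are hypothetical illustrations, not data or code from the study.

```python
def screening_metrics(tp, fp, fn, tn):
    """Compute validity and agreement measures from a 2x2 table.

    tp: AI positive, reference positive
    fp: AI positive, reference negative
    fn: AI negative, reference positive
    tn: AI negative, reference negative
    """
    total = tp + fp + fn + tn

    sensitivity = tp / (tp + fn)   # share of reference-positive cases the AI flags
    specificity = tn / (tn + fp)   # share of reference-negative cases the AI clears
    accuracy = (tp + tn) / total   # overall share of matching grades

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = accuracy
    p_expected = (
        (tp + fp) * (tp + fn)      # both graders say "positive" by chance
        + (fn + tn) * (fp + tn)    # both graders say "negative" by chance
    ) / total**2
    kappa = (p_observed - p_expected) / (1 - p_expected)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "kappa": kappa,
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
example = screening_metrics(tp=40, fp=20, fn=10, tn=80)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

A screening tool typically favors high sensitivity, since a missed case is costlier than a false referral; lower specificity then shows up as extra false positives that human graders must review.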

Abstract

Background: Diabetic retinopathy (DR) is a leading cause of visual impairment worldwide. Manual grading of fundus images is the gold standard in DR screening, although it is time-consuming. Artificial intelligence (AI)-based algorithms offer a faster alternative, though concerns remain about their diagnostic reliability. Methods: A cross-sectional pilot study among patients (≥18 years) with diabetes was established for DR and diabetic macular edema (DME) screening at the Oslo University Hospital (OUH), Department of Ophthalmology, and the Norwegian Association of the Blind and Partially Sighted (NABP). The aim of the study was to evaluate the validity (accuracy, sensitivity, specificity) and reliability (inter-rater agreement) of automated AI-based compared to manual consensus (MC) grading of DR and DME, performed by a multidisciplinary team of healthcare professionals. Grading of DR…

Linked entities

Species (1)

Homo sapiens (human)

Diseases (6)

Diabetic retinopathy · Diabetic macular edema · diabetes · DME · visual impairment · DR

Taxonomy

Topics: Retinal Imaging and Analysis · Retinal Diseases and Treatments · Acute Ischemic Stroke Management