# Cross‐Institutional Five‐Class Kellgren–Lawrence Grading of Knee Osteoarthritis via Multitask Deep Learning

**Authors:** Tariq Alkhatatbeh, Ahmad Alkhatatbeh, Yan Liao, Zhilin Zhang, Hang Fang, Weidong Chen, Rongkai Zhang

PMC · DOI: 10.1111/nyas.70254 · Annals of the New York Academy of Sciences · 2026-03-14

## TL;DR

This paper introduces KL-FuseNet, a deep learning model that improves knee osteoarthritis grading across different hospitals by combining global and local image features.

## Contribution

The novel KL-FuseNet architecture fuses global and local features to address domain shift and improve cross-institutional generalization for KL grading.

## Key findings

- KL-FuseNet achieved high internal agreement with a QWK of 0.881 and accuracy of 70.3%.
- On the external cohort, selective fine-tuning raised accuracy from 66.1% (zero-shot) to 80.0% and brought QWK to 0.950.
- After adaptation, the model reached an AUC of 0.984 for clinically significant osteoarthritis (KL≥2) on the external cohort.
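
The agreement figures above are reported as quadratic weighted kappa (QWK), which penalizes a prediction by the squared distance between predicted and true grade. The following is a minimal pure-Python sketch of the standard QWK formula for the five KL grades (0 to 4). The function name `quadratic_weighted_kappa` is an illustrative choice, not taken from the paper, and this is not the authors' evaluation code.

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes=5):
    """Quadratic weighted Cohen's kappa for integer grades in [0, n_classes)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)

    # Observed confusion matrix: rows are true grades, columns are predictions.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(n_classes))
                 for j in range(n_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Quadratic disagreement weight, 0 on the diagonal, 1 at max distance.
            w = (i - j) ** 2 / (n_classes - 1) ** 2
            weighted_observed += w * observed[i][j]
            # Expected count under independence of the two marginals.
            weighted_expected += w * true_hist[i] * pred_hist[j] / n

    if weighted_expected == 0.0:
        # Every label and prediction fall in one class: kappa is undefined.
        return float("nan")
    return 1.0 - weighted_observed / weighted_expected
```

Because of the quadratic weights, confusing KL 0 with KL 1 costs far less than confusing KL 0 with KL 4, which is why QWK can be high even when exact-match accuracy is moderate.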

## Abstract

Deep learning models for Kellgren–Lawrence (KL) grading often report optimistic performance due to data leakage and fail to generalize across institutions because of domain shift. To address this reproducibility crisis, we introduce KL‐FuseNet, a multitask architecture fusing global (ConvNeXt‐Base) and local (ResNet‐50) features to predict ordinal grades, label distributions, and binary severity (KL≥2). Using strict patient‐wise stratified splits on an internal Osteoarthritis Initiative (OAI) dataset (n = 8260) and an independent Chinese cohort (n = 2295), we compared zero‐shot transfer against selective fine‐tuning. KL‐FuseNet achieved robust internal agreement (quadratic weighted Cohen's kappa [QWK]: 0.881; accuracy: 70.3%). While external zero‐shot deployment revealed a domain gap, with accuracy dropping to 66.1%, our selective fine‐tuning protocol significantly bridged this divide, boosting external accuracy to 80.0% and QWK to 0.950, with an AUC of 0.984 for clinically significant osteoarthritis (KL≥2). These results demonstrate that while KL‐FuseNet achieves state‐of‐the‐art performance under rigorous evaluation, domain‐aware adaptation is essential for clinical utility. This study establishes a reproducible pathway for deploying automated grading models across heterogeneous medical centers.
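
The leakage concern in the abstract comes down to keeping every image from one patient in a single partition. The sketch below illustrates one way a patient-wise stratified split might be done. The record layout `(patient_id, image_id, kl_grade)`, the split fractions, and the choice to stratify patients by their highest KL grade are all assumptions made for illustration; the abstract does not specify them.

```python
import random
from collections import defaultdict


def patient_wise_split(records, val_frac=0.1, test_frac=0.2, seed=0):
    """Split (patient_id, image_id, kl_grade) records without patient overlap.

    The fractions and the stratification key are hypothetical defaults.
    """
    # Group all images belonging to the same patient.
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec[0]].append(rec)

    # Stratify patients (not images) by their highest KL grade (assumption).
    strata = defaultdict(list)
    for pid, recs in by_patient.items():
        strata[max(r[2] for r in recs)].append(pid)

    rng = random.Random(seed)
    assignment = {}
    for grade in sorted(strata):
        pids = sorted(strata[grade])  # sort first so the shuffle is reproducible
        rng.shuffle(pids)
        n_test = round(len(pids) * test_frac)
        n_val = round(len(pids) * val_frac)
        for k, pid in enumerate(pids):
            if k < n_test:
                assignment[pid] = "test"
            elif k < n_test + n_val:
                assignment[pid] = "val"
            else:
                assignment[pid] = "train"

    # Every image follows its patient into exactly one partition.
    splits = {"train": [], "val": [], "test": []}
    for pid, recs in by_patient.items():
        splits[assignment[pid]].extend(recs)
    return splits
```

Splitting at the image level instead would allow the left and right knee of one person to land in different partitions, which is the kind of leakage the abstract says inflates reported performance.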

KL‐FuseNet performs automated Kellgren–Lawrence grading from single‐view knee radiographs by fusing one global stream with two shared‐weight bilateral patch streams. Trained under leakage‐safe patient‐wise splitting, the model uses ordinal and label‐distribution supervision with validation‐based early stopping and test‐time augmentation. It shows strong internal agreement and robust cross‐institution transfer after fine‐tuning, supporting reproducible, clinically oriented knee osteoarthritis (OA) severity assessment.
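
To make the three supervision signals concrete (ordinal grade, label distribution, and binary severity), the sketch below builds plausible training targets from a single KL grade. The Gaussian smoothing, the `sigma` value, and the cumulative-threshold encoding of the ordinal target are assumptions chosen for illustration; the summary does not state how the authors construct these targets.

```python
import math


def soft_label(grade, n_classes=5, sigma=0.7):
    """Label-distribution target: a discretized Gaussian centered on the grade.

    sigma is a hypothetical smoothing width, not a value from the paper.
    """
    weights = [math.exp(-((k - grade) ** 2) / (2 * sigma ** 2))
               for k in range(n_classes)]
    total = sum(weights)
    return [w / total for w in weights]


def ordinal_targets(grade, n_classes=5):
    """Cumulative binary targets [grade > 0, grade > 1, ..., grade > n_classes - 2]."""
    return [1 if grade > k else 0 for k in range(n_classes - 1)]


def severity_target(grade):
    """Binary severity label: 1 for clinically significant OA (KL >= 2)."""
    return int(grade >= 2)
```

For a KL 2 knee, `ordinal_targets(2)` gives `[1, 1, 0, 0]` and `severity_target(2)` gives `1`, while `soft_label(2)` places most of its mass on grade 2 and spreads the remainder over the neighboring grades 1 and 3, reflecting that adjacent KL grades are easy to confuse.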

## Full-text entities

- **Diseases:** Knee Osteoarthritis (MESH:D020370), osteoarthritis (MESH:D010003)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12988769/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12988769/full.md

## References

31 references — full list in the complete paper: https://tomesphere.com/paper/PMC12988769/full.md

---
Source: https://tomesphere.com/paper/PMC12988769