# Performance of Artificial Intelligence Models in Radiographic Image Analysis for Predicting Hip and Knee Prosthesis Failure: A Systematic Review

**Authors:** Riccardo Stuani, Marco Di Maio, Vincenzo Di Matteo, Katia Chiappetta, Guido Grappiolo, Mattia Loppini

PMC · DOI: 10.3390/bioengineering13010122 · Bioengineering · 2026-01-21

## TL;DR

This review assesses AI models for detecting hip and knee prosthesis failure on radiographs, finding high accuracy in internal testing but reduced performance on external validation.

## Contribution

The study systematically evaluates AI's current capabilities and limitations in identifying mechanical failure in joint prostheses via radiographic analysis.

## Key findings

- In internal testing across the ten included studies, AI models reached accuracies of 83.9% to 97.5% and AUC values of 0.86 to 0.99 for detecting prosthesis failure.
- Performance dropped during external validation, highlighting challenges in generalizability.
- Emerging trends include combining clinical variables and using sequential imaging to improve AI effectiveness.
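
The two metrics quoted above, and the gap between internal and external validation, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the labels and scores are made-up toy values (not data from the review or any included study), the 0.5 decision threshold is arbitrary, and the helper names `accuracy` and `auc` are hypothetical.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of cases classified correctly at a fixed score threshold."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    correct = sum(p == t for p, t in zip(preds, y_true))
    return correct / len(y_true)


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case
    (e.g. a failed implant) scores higher than a randomly chosen
    negative case; ties count as half.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data only: 1 = failure, 0 = no failure; scores are model outputs.
internal_labels = [1, 1, 1, 1, 0, 0, 0, 0]
internal_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

# A hypothetical external cohort where the same model separates classes less well.
external_labels = [1, 1, 1, 1, 0, 0, 0, 0]
external_scores = [0.9, 0.6, 0.4, 0.3, 0.7, 0.5, 0.2, 0.1]

for name, labels, scores in [
    ("internal", internal_labels, internal_scores),
    ("external", external_labels, external_scores),
]:
    print(f"{name}: accuracy={accuracy(labels, scores):.3f}, AUC={auc(labels, scores):.3f}")
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is one reason reviews of diagnostic models often report both.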

## Abstract

Background and objectives: The increasing volume of total hip and knee arthroplasty has created a significant postoperative surveillance burden. While plain radiographs are the standard imaging modality, the detection of aseptic loosening is subjective. This review evaluates the state of the art regarding AI in radiographic analysis for identifying aseptic loosening and mechanical failure in primary hip and knee prostheses.

Methods: A systematic search of PubMed, Scopus, Web of Science, and Cochrane was conducted up to November 2025, following PRISMA guidelines. Peer-reviewed studies describing AI tools applied to radiographs for detecting aseptic loosening or implant failure were included. Studies focusing on infection or acute complications were excluded.

Results: Ten studies published between 2020 and 2025 met the inclusion criteria. In internal testing, AI models demonstrated high diagnostic capability, with accuracies ranging from 83.9% to 97.5% and AUC values between 0.86 and 0.99. A performance drop was observed during external validation. Emerging trends include the integration of clinical variables and the use of sequential imaging.

Conclusions: AI models show robust potential to match or outperform standard radiographic interpretation for detecting failure. Clinical deployment is limited by variable performance on external datasets. Future research must prioritize robust multi-institutional validation, explainability, and integration of longitudinal data.

## Full-text entities

- **Diseases:** infection (MESH:D007239), Hip and Knee Prosthesis Failure (MESH:D011475)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12838350/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12838350/full.md

## References

22 references — full list in the complete paper: https://tomesphere.com/paper/PMC12838350/full.md

---
Source: https://tomesphere.com/paper/PMC12838350