# AI-Assisted Diagnostic Evaluation of IHC in Forensic Pathology: A Comparative Study with Human Scoring

**Authors:** Francesco Sessa, Mara Ragusa, Massimiliano Esposito, Mario Chisari, Cristoforo Pomara, Monica Salerno

PMC · DOI: 10.3390/diagnostics16010006 · Diagnostics · 2025-12-19

## TL;DR

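This study tests whether a generative AI model (ChatGPT-4V) can score IHC slides in forensic pathology. On 75 unseen images it reached 81.3% overall accuracy against expert scoring, with substantial agreement (unweighted κ = 0.767), while remaining weaker on borderline cases.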

## Contribution

The study demonstrates a novel use of a generative AI model (ChatGPT-4V) for the diagnostic evaluation of IHC slides in forensic pathology.

## Key findings

- AI achieved 81.3% overall accuracy in classifying IHC images into five categories.
- Binary classification metrics showed high sensitivity (98.3%) and specificity (93.3%).
- AI performed best in the extreme IHC categories (− and +++, 93.3% accuracy) and worst in the intermediate classes (+ and ++), where error rates reached up to 33%.
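
The binary metrics above can be checked with a short calculation. The sketch below is illustrative only: the summary does not give the binary confusion matrix, so the counts are an assumption, inferred from a test set of 75 images split evenly into 15 per category with the negative category (−) as the binary negative class. The function name `binary_metrics` is hypothetical.

```python
import math


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float, float]:
    """Return (sensitivity, specificity, MCC) from binary confusion counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# ASSUMED counts (not reported in this summary): 60 positive and 15 negative
# test images, with one false negative and one false positive.
sens, spec, mcc = binary_metrics(tp=59, fn=1, tn=14, fp=1)
print(f"sensitivity={sens:.1%} specificity={spec:.1%} MCC={mcc:.2f}")
```

Under these assumed counts the result rounds to the reported 98.3% sensitivity, 93.3% specificity, and MCC of 0.92. That agreement is consistent with the assumption but does not confirm the actual split used in the paper.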

## Abstract

**Background/Objectives:** Immunohistochemistry (IHC) is a critical diagnostic tool in forensic pathology, enabling molecular-level assessment of wound vitality, post-mortem interval, and cause of death. However, IHC interpretation is subject to variability due to its reliance on human expertise. This study investigates whether artificial intelligence (AI), specifically a generative model, can assist in the diagnostic evaluation of IHC slides and replicate expert-level scoring, thereby improving consistency and reproducibility.

**Methods:** A total of 225 high-resolution IHC images were classified into five immunoreactivity categories. The AI model (ChatGPT-4V) was trained on 150 labeled images and tested blindly on 75 unseen slides. Performance was assessed using confusion matrices, per-class precision/recall/F1, overall accuracy, Cohen’s κ (unweighted and weighted), and binary metrics (sensitivity, specificity, MCC).

**Results:** Overall accuracy was 81.3% (95% CI: 71.1–88.5%), with substantial agreement (κ = 0.767 unweighted; 0.805 linear-weighted; 0.848 quadratic-weighted). Binary classification achieved a sensitivity of 98.3%, specificity of 93.3%, and MCC of 0.92. Accuracy was highest in extreme categories (− and +++, 93.3%), while intermediate classes (+ and ++) showed reduced performance (error rates up to 33%). Evaluation was rapid and consistent but lacked interpretative reasoning and struggled with borderline cases.

**Conclusions:** AI-assisted diagnostic evaluation of IHC slides demonstrates promising accuracy and consistency, particularly in well-defined staining patterns. While not a replacement for human expertise, AI can serve as a valuable adjunct in forensic pathology, supporting rapid and standardized assessments. Ethical and legal considerations must guide its implementation in medico-legal contexts.
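
As a worked check on the reported accuracy interval, the following sketch computes a Wilson score interval for a binomial proportion. Two things here are assumptions rather than statements from the abstract: that 81.3% corresponds to 61 correct classifications out of the 75 test images, and that a Wilson interval is an appropriate method (the abstract does not say which interval the authors used). The function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z defaults to 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# ASSUMED: 61 of 75 test images classified correctly (61/75 = 81.3%).
low, high = wilson_interval(successes=61, n=75)
print(f"accuracy={61 / 75:.1%}, 95% CI: {low:.1%} to {high:.1%}")
```

With these inputs the interval rounds to 71.1% and 88.5%, matching the bounds in the abstract. The interval is wide because the test set is small, which is worth keeping in mind when reading the per-class figures, each of which presumably rests on far fewer images.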

## Full-text entities

- **Diseases:** death (MESH:D003643)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12785249/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12785249/full.md

## References

38 references — full list in the complete paper: https://tomesphere.com/paper/PMC12785249/full.md

---
Source: https://tomesphere.com/paper/PMC12785249