# Structure over color: Diagnostic information in H&E images resides primarily in grayscale

**Authors:** Leslie Dalton, Ian O. Ellis, Emad A. Rakha

PMC · DOI: 10.1016/j.jpi.2026.100646 · Journal of Pathology Informatics · 2026-02-05

## TL;DR

This study shows that grayscale images of H&E-stained tissue contain most of the diagnostic information needed for breast cancer grading, with color playing a minor role.

## Contribution

The study demonstrates that color normalization may be less critical than previously thought for AI-based diagnostic models using H&E images.

## Key findings

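- Structural information in H&E images resides primarily in the grayscale (luminance) channel.
- Grayscale, color-blind-friendly (CBF), and extreme color compression (XCC) images retained high image fidelity (MS-SSIM > 0.95) and supported strong model performance (AUC ≥ 0.85).
- Extreme compression of the chrominance channels (ratio 1:1000) did not adversely affect image quality or the performance of the diagnostic DL models.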
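
To give a feel for what compressing color by roughly three orders of magnitude means, the following is a minimal sketch that reduces a single chrominance plane by block averaging and then restores it to its original size. This summary does not state which compression method the authors used, so block averaging is a stand-in chosen for illustration, and the function name `block_compress_plane` and the factor of 32 (about 1:1024 in sample count) are assumptions.

```python
import numpy as np


def block_compress_plane(plane, factor):
    """Average non-overlapping factor x factor blocks of a 2-D plane.

    Returns (coarse, restored): the reduced plane and a version repeated
    back to the original size. Illustrative stand-in only, not the
    compression scheme used in the paper.
    """
    h, w = plane.shape
    # Pad by edge replication so both dimensions are multiples of `factor`.
    pad_h = (-h) % factor
    pad_w = (-w) % factor
    padded = np.pad(plane.astype(np.float64), ((0, pad_h), (0, pad_w)), mode="edge")
    ph, pw = padded.shape
    # Group pixels into blocks and average each block.
    coarse = padded.reshape(ph // factor, factor, pw // factor, factor).mean(axis=(1, 3))
    # Nearest-neighbor upsampling, cropped back to the original size.
    restored = np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)[:h, :w]
    return coarse, restored


if __name__ == "__main__":
    chroma = np.random.default_rng(0).uniform(0, 255, size=(512, 512))
    coarse, restored = block_compress_plane(chroma, factor=32)
    print("samples kept: 1 in", chroma.size // coarse.size)
```

In this sketch the luminance plane would be left untouched, so every structural edge is preserved while each 32 × 32 block shares a single color value.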

## Abstract

Color is a defining feature of hematoxylin and eosin (H&E)-stained histological sections; however, the extent to which diagnostic interpretation depends on color rather than grayscale-defined structure remains uncertain. We utilized systematic digital color manipulations and deep learning (DL) to interrogate the diagnostic contribution of color in H&E-stained slide-based breast cancer grading. H&E images were transformed into the YCbCr color space, enabling independent manipulation of luminance (Y) and chrominance (Cb, Cr). Four image variants were generated: grayscale, color-only, color-blind-friendly (CBF), and extreme color compression (XCC). CBF images were produced by transferring red-green information (Cr) into the blue-yellow (Cb) channel, whereas color-only images were created by fixing luminance at a constant value. XCC involved differential compression of luminance and chrominance, with chrominance compressed at an extreme ratio (1:1000). Image fidelity was assessed using multi-scale structural similarity (MS-SSIM). DL models (ConvNeXt) for breast cancer grading were independently trained and tested using each image variant. Quantitative assessment confirmed that structural information is primarily localized within the luminance channel. Grayscale, CBF, and XCC images demonstrated minimal loss of image fidelity (MS-SSIM > 0.95), whereas color-only images showed markedly reduced fidelity (MS-SSIM ≈ 0.15). DL predictions were highly concordant across original, grayscale, CBF, and XCC images (Spearman ρ > 0.9 for all comparisons), with all achieving area under the curve values ≥0.85. Although performance was reduced for color-only images, it remained higher than anticipated. Notably, extreme compression of color channels did not adversely affect image quality or model performance. These findings provide evidence that diagnostic information in H&E images resides primarily in structural features encoded by grayscale. 
The results suggest that diagnostic AI models can operate effectively without color information, that the emphasis on color normalization may be overstated, and that color data can be subjected to extreme compression with limited impact on diagnostic integrity.
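
The channel manipulations described in the abstract can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the authors' pipeline: it uses the full-range BT.601 (JPEG-style) YCbCr conversion, fixes the color-only luminance at 128, and implements the CBF transfer by adding the Cr deviation to Cb and neutralizing Cr. The abstract specifies none of these details, and all function names are hypothetical.

```python
import numpy as np


def rgb_to_ycbcr(rgb):
    """Convert uint8 RGB (H, W, 3) to float YCbCr (full-range BT.601; assumed)."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)


def ycbcr_to_rgb(ycc):
    """Convert float YCbCr back to uint8 RGB, clipping to the valid range."""
    y, cb, cr = ycc[..., 0], ycc[..., 1] - 128.0, ycc[..., 2] - 128.0
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    rgb = np.stack([r, g, b], axis=-1)
    return np.clip(np.rint(rgb), 0, 255).astype(np.uint8)


def grayscale_variant(rgb):
    """Keep luminance (Y); set both chrominance channels to neutral (128)."""
    ycc = rgb_to_ycbcr(rgb)
    ycc[..., 1:] = 128.0
    return ycbcr_to_rgb(ycc)


def color_only_variant(rgb, luma=128.0):
    """Keep chrominance; fix luminance at a constant (128 is an assumption)."""
    ycc = rgb_to_ycbcr(rgb)
    ycc[..., 0] = luma
    return ycbcr_to_rgb(ycc)


def cbf_variant(rgb):
    """Move red-green (Cr) information into blue-yellow (Cb).

    Assumed scheme: add the Cr deviation from neutral to Cb, then set Cr
    to neutral. The paper's exact transfer rule may differ.
    """
    ycc = rgb_to_ycbcr(rgb)
    ycc[..., 1] = np.clip(ycc[..., 1] + (ycc[..., 2] - 128.0), 0.0, 255.0)
    ycc[..., 2] = 128.0
    return ycbcr_to_rgb(ycc)


if __name__ == "__main__":
    tile = np.random.default_rng(0).integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
    for make in (grayscale_variant, color_only_variant, cbf_variant):
        print(make.__name__, make(tile).shape)
```

Because Y is separated from Cb and Cr, each variant changes one aspect of the image while leaving the other as close to intact as the RGB range allows, which is what lets the study attribute fidelity and model performance to luminance or chrominance separately.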

## Linked entities

- **Diseases:** breast cancer (MONDO:0004989)

## Full-text entities

- **Diseases:** breast cancer (MESH:D001943)
- **Chemicals:** Cb (MESH:C063451), Cr (MESH:D002857), H&E (-)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12991843/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12991843/full.md

## References

46 references — full list in the complete paper: https://tomesphere.com/paper/PMC12991843/full.md

---
Source: https://tomesphere.com/paper/PMC12991843