# ColoPola: A polarimetric imaging dataset for colorectal cancer detection

**Authors:** Thi-Thu-Hien Pham, Quoc-Hoang-Quyen Vo, Thao-Vi Nguyen, The-Hiep Nguyen, Quoc-Hung Phan, Thanh-Hai Le

PMC · DOI: 10.1093/gigascience/giaf120 · GigaScience · 2025-10-16

## TL;DR

This paper introduces ColoPola, a new dataset of polarimetric images for colorectal cancer detection, and evaluates its practical utility with five classification models (CNN, CNN_2, EfficientFormerV2, DenseNet, and EfficientNetV2).

## Contribution

The paper introduces ColoPola, which is, to the best of the authors' knowledge, the first standardized polarimetric imaging dataset for colorectal cancer.

## Key findings

- ColoPola contains 20,592 polarimetric images from 572 sample slices (288 healthy, 284 malignant).
- EfficientNetV2 achieved the best performance of the 5 models, with an F1 score of 0.965 on the testing set and all performance metrics exceeding 0.95 on both the validation and testing sets.
- The dataset shows significant potential as a diagnostic tool for colorectal cancer in clinical practice.
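The image counts in the first finding follow from the per-slice figure given in the abstract (36 polarimetric images per sample slice). A minimal arithmetic check, using only numbers stated in this summary; the variable names are illustrative:

```python
# Figures taken from this summary: 36 images per slice,
# 288 healthy slices and 284 malignant slices.
IMAGES_PER_SLICE = 36
SLICES = {"healthy": 288, "malignant": 284}

# Images per class = slices in that class x images per slice.
images_per_class = {label: n * IMAGES_PER_SLICE for label, n in SLICES.items()}
total_slices = sum(SLICES.values())
total_images = sum(images_per_class.values())

print(total_slices)      # 572
print(images_per_class)  # {'healthy': 10368, 'malignant': 10224}
print(total_images)      # 20592
```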

## Abstract

In recent years, polarimetric imaging has been developed for various biological applications, including tissue morphological characterization and cancer stage detection. However, to facilitate classification models based on the characteristics of polarization states, it is essential to develop a consistent and standardized dataset of polarimetric images.

This study presents a dataset of colorectal cancer polarimetric images designated as ColoPola, which is intended to facilitate research efforts in the field. The dataset consists of 572 sample slices (288 healthy and 284 malignant). For each slice, 36 polarimetric images corresponding to different polarization states are provided. Thus, ColoPola contains 20,592 polarimetric images, of which 10,368 correspond to healthy samples and 10,224 to malignant samples. To the best of the authors’ knowledge, the dataset is the first of its kind for colorectal cancer images. The practical utility of the dataset is evaluated using 5 models: 3 models constructed from scratch (CNN, CNN_2, and EfficientFormerV2) and 2 pretrained models (DenseNet and EfficientNetV2). For each model, the input has a size of 224 × 224 × 36: the width and height of the images, and 36 channels that hold the red channel values of the 36 polarimetric images of a slice.
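The 224 × 224 × 36 input described above can be pictured as a stack of one red channel per polarization state. The following is a minimal sketch with NumPy, not the authors' preprocessing code: the function name `stack_red_channels`, the RGB channel order, and the assumption that images are already resized and ordered by polarization state are all illustrative, and the data are synthetic.

```python
import numpy as np

N_STATES = 36         # polarization states per slice (from the abstract)
HEIGHT = WIDTH = 224  # spatial input size (from the abstract)

def stack_red_channels(rgb_images):
    """Stack the red channel of 36 RGB images into one (224, 224, 36) array.

    Assumes `rgb_images` is a sequence of 36 arrays of shape (224, 224, 3),
    already resized and ordered by polarization state (hypothetical setup).
    """
    if len(rgb_images) != N_STATES:
        raise ValueError(f"expected {N_STATES} images, got {len(rgb_images)}")
    # Channel index 0 is red when the arrays are in RGB order.
    red = [np.asarray(img)[..., 0] for img in rgb_images]
    return np.stack(red, axis=-1)

# Synthetic stand-in for one sample slice; not real ColoPola data.
rng = np.random.default_rng(0)
fake_slice = [
    rng.integers(0, 256, size=(HEIGHT, WIDTH, 3), dtype=np.uint8)
    for _ in range(N_STATES)
]
x = stack_red_channels(fake_slice)
print(x.shape)  # (224, 224, 36)
```

Stacking along the last axis keeps each polarization state addressable as `x[..., k]`, which matches a channels-last input layout; a channels-first framework would need a transpose.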

The results show that the CNN, CNN_2, EfficientFormerV2, DenseNet, and EfficientNetV2 models obtain F1 scores of 0.870, 0.862, 0.908, 0.903, and 0.965, respectively, on the testing set. Among the 5 models, EfficientNetV2 achieves the best performance, with all the performance metrics exceeding 0.95 for both the validation set and the testing set. Overall, the results suggest that ColoPola has significant potential as a polarimetric optical imaging-based diagnostic tool for colorectal cancer in clinical practice.
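For readers comparing the models, the sketch below tabulates the test-set F1 scores reported above and recalls the standard F1 definition (the harmonic mean of precision and recall). The reported scores come from the abstract; the confusion counts passed to `f1_score` are hypothetical and serve only to exercise the formula.

```python
def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Test-set F1 scores as reported in the abstract.
reported_f1 = {
    "CNN": 0.870,
    "CNN_2": 0.862,
    "EfficientFormerV2": 0.908,
    "DenseNet": 0.903,
    "EfficientNetV2": 0.965,
}

# Model with the highest reported test-set F1.
best_model = max(reported_f1, key=reported_f1.get)
print(best_model)  # EfficientNetV2

# Hypothetical confusion counts, only to illustrate the formula.
print(f1_score(tp=8, fp=2, fn=2))  # 0.8
```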

## Linked entities

- **Diseases:** colorectal cancer (MONDO:0005575)

## Full-text entities

- **Diseases:** cancer (MESH:D009369), colorectal cancer (MESH:D015179)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12530094/full.md

## Figures

20 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12530094/full.md

## References

53 references — full list in the complete paper: https://tomesphere.com/paper/PMC12530094/full.md

---
Source: https://tomesphere.com/paper/PMC12530094