Automatic Labels are as Effective as Manual Labels in Biomedical Images Classification with Deep Learning

Niccolò Marini; Stefano Marchesin; Lluis Borras Ferris; Simon Püttmann; Marek Wodzinski; Riccardo Fratti; Damian Podareanu; Alessandro Caputo; Svetla Boytcheva; Simona Vatrano; Filippo Fraggetta; Iris Nagtegaal; Gianmaria Silvello; Manfredo Atzori; Henning Müller

arXiv:2406.14351·eess.IV·June 21, 2024

TL;DR

This study demonstrates that automatic labeling methods can produce labels with noise levels low enough (2-5%) to be as effective as manual labels for training deep learning models on biomedical image classification tasks, across multiple architectures and datasets.

Contribution

The paper provides evidence that automatic labels with minimal noise can replace manual labels in training deep learning models for biomedical image classification.

Findings

01. Automatic labels with 2-5% noise yield comparable performance to manual labels.

02. Models trained with automatic labels perform well across CNN and ViT architectures.

03. Automatic labeling is feasible for diverse biomedical datasets including binary, multiclass, and multilabel tasks.
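To make the 2-5% noise figure concrete, the sketch below computes a label noise rate on toy data. It assumes that noise is measured as the fraction of automatic labels that disagree with the manual ones, which this page does not confirm as the paper's exact definition. The function names `label_noise_rate` and `multilabel_noise_rate` and all values are hypothetical and are not taken from the paper or its repository.

```python
from typing import Sequence


def label_noise_rate(manual: Sequence[int], automatic: Sequence[int]) -> float:
    """Fraction of samples whose automatic label differs from the manual label.

    Assumption: noise is defined as simple disagreement with the manual label.
    """
    if len(manual) != len(automatic):
        raise ValueError("manual and automatic must have the same length")
    if len(manual) == 0:
        raise ValueError("at least one label is required")
    mismatches = sum(1 for m, a in zip(manual, automatic) if m != a)
    return mismatches / len(manual)


def multilabel_noise_rate(
    manual: Sequence[Sequence[int]], automatic: Sequence[Sequence[int]]
) -> float:
    """Fraction of (sample, class) entries that disagree in multilabel data.

    Assumes both inputs have the same shape (same rows, same classes per row).
    """
    flat_manual = [v for row in manual for v in row]
    flat_automatic = [v for row in automatic for v in row]
    return label_noise_rate(flat_manual, flat_automatic)


# Toy binary example: 100 samples, 3 automatic labels flipped by hand.
manual_labels = [0, 1] * 50
automatic_labels = list(manual_labels)
for i in (0, 10, 20):
    automatic_labels[i] = 1 - automatic_labels[i]

binary_noise = label_noise_rate(manual_labels, automatic_labels)

# Toy multilabel example: 2 samples x 3 classes, one entry disagrees.
multilabel_noise = multilabel_noise_rate(
    [[1, 0, 0], [0, 1, 1]],
    [[1, 0, 1], [0, 1, 1]],
)
print(f"binary noise: {binary_noise:.2%}, multilabel noise: {multilabel_noise:.2%}")
```

Under this definition, the binary toy set falls inside the 2-5% range quoted above, while the multilabel toy set is far noisier because one of only six entries disagrees.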

Abstract

The increasing availability of biomedical data is helping to design more robust deep learning (DL) algorithms to analyze biomedical samples. Currently, one of the main limitations in training DL algorithms to perform a specific task is the need for medical experts to label data. Automatic methods to label data exist; however, automatic labels can be noisy, and it is not completely clear when they can be adopted to train DL models. This paper aims to investigate under which circumstances automatic labels can be adopted to train a DL model on the classification of Whole Slide Images (WSI). The analysis involves multiple architectures, such as Convolutional Neural Networks (CNN) and Vision Transformer (ViT), and over 10000 WSIs, collected from three use cases: celiac disease, lung cancer and colon cancer, which respectively include binary, multiclass and multilabel data. The…

Code & Models

Repositories

ilmaro8/wsi_analysis (PyTorch, official)

Taxonomy

Topics: Digital Imaging for Blood Diseases · AI in cancer detection · Image and Object Detection Techniques

Methods: Linear Layer · Multi-Head Attention · Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Dropout · Adam