Data Diet: Can Trimming PET/CT Datasets Enhance Lesion Segmentation?

Alexander Jaus; Simon Reiß; Jens Kleesiek; Rainer Stiefelhagen

arXiv:2409.13548 · eess.IV · November 25, 2024

TL;DR

This paper explores trimming a PET/CT training set by removing the easiest samples, as measured by model loss, and reports lower false negative volume and a higher Dice score than the baseline on the preliminary test set.

Contribution

It introduces a novel data trimming method that removes easy samples based on model loss to enhance lesion segmentation in PET/CT datasets.

Findings

01 Reduced false negative volume in lesion segmentation

02 Improved Dice score on the preliminary test set

03 Effective dataset trimming strategy
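For readers unfamiliar with the Dice score named in the findings, the following is a minimal sketch of the standard definition, 2|A∩B| / (|A| + |B|), on flattened binary masks. The function name `dice_score` and the choice to return 1.0 when both masks are empty are assumptions made for illustration; the challenge's own evaluation code may handle that case differently. The false negative volume metric is not covered here.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice overlap between two flattened binary masks (0 = background, 1 = lesion)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of voxels")
    # Voxels marked as lesion in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total lesion voxels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


# One of two predicted voxels overlaps one of two ground-truth voxels.
print(dice_score([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```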

Abstract

In this work, we describe our approach to compete in the autoPET3 datacentric track. While conventional wisdom suggests that larger datasets lead to better model performance, recent studies indicate that excluding certain training samples can enhance model accuracy. We find that in the autoPETIII dataset, a model trained on the entire dataset exhibits undesirable characteristics, producing a large number of false positives, particularly for PSMA-PETs. We counteract this by removing the easiest samples from the training dataset, as measured by the model loss, before retraining from scratch. Using the proposed approach, we manage to drive down the false negative volume and improve upon the baseline model in both false negative volume and Dice score on the preliminary test set. Code and pre-trained models are available at github.com/alexanderjaus/autopet3_datadiet.
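The selection step described in the abstract (rank training samples by model loss, drop the easiest, retrain from scratch on the rest) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `trim_easiest`, the `drop_fraction` parameter, and the sample IDs and loss values are all hypothetical, and the abstract does not state how many samples were removed. The per-sample losses are assumed to come from a model already trained on the full dataset.

```python
from typing import Dict, List


def trim_easiest(per_sample_loss: Dict[str, float], drop_fraction: float) -> List[str]:
    """Return the sample IDs to keep after dropping the lowest-loss (easiest) samples.

    drop_fraction is a hypothetical knob; the source does not give a value.
    """
    if not 0.0 <= drop_fraction < 1.0:
        raise ValueError("drop_fraction must be in [0, 1)")
    # Rank from easiest (lowest loss) to hardest (highest loss).
    ranked = sorted(per_sample_loss, key=per_sample_loss.get)
    # Number of easiest samples to discard, rounded down.
    n_drop = int(len(ranked) * drop_fraction)
    return ranked[n_drop:]


# Made-up losses from a model trained on the full dataset.
losses = {"case_a": 0.05, "case_b": 0.80, "case_c": 0.10, "case_d": 0.45}
kept = trim_easiest(losses, drop_fraction=0.5)
print(kept)  # ['case_d', 'case_b']
```

A fresh model would then be trained from scratch on the kept IDs only, rather than fine-tuning the model that produced the losses.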

Code & Models

Repositories

alexanderjaus/autopet3_datadiet
PyTorch · Official

Taxonomy

Topics: Radiomics and Machine Learning in Medical Imaging