# Semantic segmentation dataset authoring with simplified labels

**Authors:** Leo Uramoto, Yuichiro Hayashi, Masahiro Oda, Takayuki Kitasaka, Kensaku Mori

PMC · DOI: 10.1007/s11548-024-03314-9 · International Journal of Computer Assisted Radiology and Surgery · 2025-04-05

## TL;DR

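This paper introduces simplified labels, semantically weak labels that lower the medical expertise needed to annotate laparoscopic images, so that non-medical annotators can take part in dataset authoring. The same labels also give a simple formulation for multi-dataset training.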

## Contribution

The contribution is simplified labels, a semantically weak label format that reduces reliance on medical experts during dataset authoring and enables multi-dataset training, even across datasets with no overlapping classes.

## Key findings

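- Including non-medical annotators in the authoring process is beneficial, but one medically trained annotator is worth multiple non-medical annotators.
- Simplified labels allow multi-dataset training even when the datasets have no overlapping classes: converting the labels of a secondary, incompatible dataset into simplified labels and training jointly on both datasets improves performance.
- The maximal Dice score increase is 9.3% for 1 medically trained annotator and 6.9% for 3 non-medical annotators.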

## Abstract

Semantic segmentation of laparoscopic images is a key problem in surgical scene understanding. Creating ground truth labels for semantic segmentation tasks is time-consuming, and in the medical field the need for medically trained annotators adds further complications, leading to reliance on a small pool of experts. Previous research has focused on reducing the time to author datasets by using spatially weak labels, pseudolabels, and synthetic data. In this paper, we address the difficulties caused by the need for medically trained annotators, aiming to enable non-medical annotators to participate in medical annotation tasks and so ease the creation of large datasets.

We propose simplified labels, labels that are semantically weak. Our labels allow non-medical annotators to participate in medical dataset authoring, by lowering the need for medical expertise. We simulate authoring processes with mixtures of medical and non-medical annotators and measure the impact adding non-medical annotators has on accuracy. We also show that simplified labels offer a simple formulation for multi-dataset training.

We show that simplified labels are a viable approach to dataset authoring. Including non-medical annotators in the authoring process is beneficial, but medically trained annotators are worth multiple non-medical annotators, with maximal Dice score increases of 9.3% for 1 medically trained annotator and 6.9% for 3 non-medical annotators. We also show that the labels offer a simple formulation for multi-dataset training, even with no overlapping classes. We find that converting the labels of a secondary incompatible dataset into simplified labels and jointly training on both datasets improves performance.
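The Dice score used to report these gains measures the overlap between a predicted mask and a reference mask for one class. The following is a minimal sketch, assuming flat lists of integer class ids stand in for mask arrays. The function name `dice_score`, the sample masks, and the convention of returning 1.0 when a class is absent from both masks are illustrative assumptions, not details taken from the paper.

```python
def dice_score(pred, target, label):
    """Dice coefficient for one class over two equally sized flat label lists."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    # Count pixels assigned to `label` in each mask, and in both at once.
    pred_hits = sum(1 for p in pred if p == label)
    target_hits = sum(1 for t in target if t == label)
    overlap = sum(1 for p, t in zip(pred, target) if p == label and t == label)

    denom = pred_hits + target_hits
    # Assumed convention: a class absent from both masks counts as a perfect match.
    return 1.0 if denom == 0 else 2.0 * overlap / denom


# Hypothetical 6-pixel masks with class ids 0, 1 and 2.
pred = [0, 1, 1, 2, 2, 2]
target = [0, 1, 2, 2, 2, 2]

per_class = {label: dice_score(pred, target, label) for label in (0, 1, 2)}
mean_dice = sum(per_class.values()) / len(per_class)
```

With these sample masks, class 2 has 3 overlapping pixels out of 3 predicted and 4 reference pixels, so its Dice score is 2·3/(3+4) = 6/7.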

Simplified labels offer a framework that can be applied both to dataset authoring and to multi-dataset training. Using the proposed method, non-medical annotators can participate in semantic segmentation dataset authoring. Labels of incompatible datasets can be converted into simplified labels, enabling multi-dataset training.
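The conversion step can be pictured as a per-dataset lookup from fine-grained classes to a shared, coarser label set. The sketch below is illustrative only: the class names, the groupings `organ` and `instrument`, and the helper `to_simplified` are hypothetical, since this summary does not list the paper's actual simplified label set.

```python
# Hypothetical fine-grained classes for two datasets that share no class names.
A_TO_SIMPLE = {"liver": "organ", "gallbladder": "organ", "grasper": "instrument"}
B_TO_SIMPLE = {"stomach": "organ", "colon": "organ", "scissors": "instrument"}


def to_simplified(mask, mapping):
    """Replace every fine-grained label in a 2D mask with its simplified label."""
    return [[mapping[label] for label in row] for row in mask]


# Toy masks, one per dataset (rows of per-pixel class names).
mask_a = [["liver", "liver", "grasper"],
          ["gallbladder", "liver", "grasper"]]
mask_b = [["stomach", "scissors"],
          ["colon", "colon"]]

# After conversion both datasets use the same label space,
# so their samples can be pooled into one training set.
joint = [to_simplified(mask_a, A_TO_SIMPLE), to_simplified(mask_b, B_TO_SIMPLE)]
shared_labels = {label for mask in joint for row in mask for label in row}
```

Keeping one mapping per dataset means the original fine-grained masks stay untouched, and a new dataset only needs its own lookup table to join the pool.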

## Full-text entities

- **Diseases:** pain (MESH:D010146), COVID-19 infection lesions (MESH:D000086382)
- **Chemicals:** FCN (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12055892/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12055892/full.md

## References

6 references — full list in the complete paper: https://tomesphere.com/paper/PMC12055892/full.md

---
Source: https://tomesphere.com/paper/PMC12055892