# Data Selection for training Semantic Segmentation CNNs with cross-dataset weak supervision

**Authors:** Panagiotis Meletis, Rob Romijnders, Gijs Dubbelman

arXiv: 1907.07023 · 2019-07-17

## TL;DR

This paper introduces two data selection methods for training semantic segmentation CNNs with strong (per-pixel) and weak (per-bounding-box) supervision. Selecting the most relevant weakly labeled images yields performance gains while using up to 100 times fewer weakly labeled images for Open Images and up to 20 times fewer for Cityscapes.

## Contribution

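It presents two data selection techniques for weakly labeled data: one finds visually similar images without labels by modeling image representations with a Gaussian Mixture Model (GMM), and the other finds images with high object diversity using only bounding-box labels.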
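To make the similarity-based idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a diagonal-covariance GMM has already been fitted on image representations from the target dataset, and that candidate images are ranked by their log-likelihood under that mixture; the ranking criterion, the 2-D embeddings, the mixture parameters, and all function names are illustrative assumptions that this summary does not confirm.

```python
import math


def diag_gaussian_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of x under the mixture, using log-sum-exp for stability."""
    terms = [
        math.log(w) + diag_gaussian_logpdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def select_most_similar(candidates, weights, means, variances, k):
    """Return the ids of the k candidates with the highest log-likelihood.

    Assumption: higher likelihood under the target-domain GMM is used as a
    proxy for visual similarity; the paper's exact criterion may differ.
    """
    ranked = sorted(
        candidates.items(),
        key=lambda item: gmm_log_likelihood(item[1], weights, means, variances),
        reverse=True,
    )
    return [image_id for image_id, _ in ranked[:k]]


# Hypothetical mixture parameters and candidate embeddings (illustration only).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [5.0, 5.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
candidates = {
    "img_a": [0.1, -0.2],    # close to the first component
    "img_b": [5.2, 4.9],     # close to the second component
    "img_c": [20.0, -20.0],  # far from both components
}
selected = select_most_similar(candidates, weights, means, variances, k=2)
```

In practice the representations would come from a CNN and the mixture would be fitted with a library rather than specified by hand; the sketch only shows the ranking step.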

## Key findings

- Performance gains while reducing the number of weakly labeled images by up to 100 times for Open Images and up to 20 times for Cityscapes.
- Both methods are developed in the context of automated driving and evaluated on the Cityscapes and Open Images datasets.
- As a byproduct, the GMM modeling offers insights into characterizing the data-generating distribution.

## Abstract

Training convolutional networks for semantic segmentation with strong (per-pixel) and weak (per-bounding-box) supervision requires a large amount of weakly labeled data. We propose two methods for selecting the most relevant data with weak supervision. The first method is designed for finding visually similar images without the need of labels and is based on modeling image representations with a Gaussian Mixture Model (GMM). As a byproduct of GMM modeling, we present useful insights on characterizing the data generating distribution. The second method aims at finding images with high object diversity and requires only the bounding box labels. Both methods are developed in the context of automated driving and experimentation is conducted on Cityscapes and Open Images datasets. We demonstrate performance gains by reducing the amount of employed weakly labeled images up to 100 times for Open Images and up to 20 times for Cityscapes.
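The second method in the abstract, selecting images with high object diversity from bounding-box labels alone, could be sketched roughly as follows. The abstract does not define the diversity measure, so this sketch assumes the score is the number of distinct object classes in an image, with ties broken by the total number of boxes; the annotation layout and function names are hypothetical.

```python
def diversity_score(boxes):
    """Count the distinct object classes among an image's bounding boxes.

    Assumption: class variety stands in for "object diversity"; the paper's
    actual measure is not given in this summary.
    """
    return len({label for label, _ in boxes})


def select_most_diverse(annotations, k):
    """Return the ids of the k images with the highest diversity score.

    Ties are broken by the number of boxes, so images with more labeled
    objects are preferred among equally diverse ones.
    """
    ranked = sorted(
        annotations,
        key=lambda image_id: (
            diversity_score(annotations[image_id]),
            len(annotations[image_id]),
        ),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical annotations: image id -> list of (class label, (x1, y1, x2, y2)).
annotations = {
    "img_1": [("car", (10, 20, 110, 90)), ("car", (150, 30, 240, 95))],
    "img_2": [
        ("car", (5, 40, 90, 100)),
        ("person", (120, 35, 140, 95)),
        ("traffic light", (200, 5, 210, 30)),
    ],
    "img_3": [("person", (60, 20, 80, 90))],
}
top = select_most_diverse(annotations, k=2)
```

Only the class labels of the boxes are used here; the coordinates are carried along to mirror what a bounding-box annotation typically contains.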

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.07023/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1907.07023/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/1907.07023/full.md

---
Source: https://tomesphere.com/paper/1907.07023