# Impact of Ground Truth Annotation Quality on Performance of Semantic Image Segmentation of Traffic Conditions

**Authors:** Vlad Taran, Yuri Gordienko, Alexandr Rokovyi, Oleg Alienin, Sergii Stirenko

arXiv: 1901.00001 · 2019-04-04

## TL;DR

This study shows that using coarse ground truth annotations can maintain or even improve semantic image segmentation accuracy for certain classes in urban scene datasets, potentially simplifying dataset preparation.

## Contribution

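The paper compares PSPNet segmentation accuracy on Cityscapes when a model trained on fine GT annotations is evaluated against fine versus coarse GT annotations. It argues that coarse annotations can reduce annotation effort without loss of accuracy for the most important classes.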

## Key findings

- For the most important classes, mean segmentation accuracy is higher with coarse GT annotations than with fine ones.
- Standard deviation of accuracy is lower with coarse annotations, indicating more consistent performance.
- Using coarse annotations could simplify and speed up dataset preparation and model fine-tuning.

## Abstract

Preparing high-quality datasets for urban scene understanding is a labor-intensive task, especially for datasets designed for autonomous driving applications. Using coarse ground truth (GT) annotations of these datasets without detriment to the accuracy of semantic image segmentation (measured by the mean intersection over union, mIoU) could simplify and speed up dataset preparation and model fine-tuning before practical application. Here, the results of a comparative analysis of semantic segmentation accuracy obtained with the PSPNet deep learning architecture are presented for fine and coarse annotated images from the Cityscapes dataset. Two scenarios were investigated: scenario 1, with fine GT images for training and prediction, and scenario 2, with fine GT images for training and coarse GT images for prediction. The results demonstrate that, for the most important classes, the mean accuracy values of semantic image segmentation are higher for coarse GT annotations than for fine GT ones, while the standard deviation values are lower. This means that for some applications some unimportant classes can be excluded, and the model can be tuned further for some classes and specific regions on the coarse GT dataset, even without loss of accuracy. Moreover, this opens up the prospect of using deep neural networks to prepare such coarse GT datasets.
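
To make the metric concrete, the following is a minimal sketch of how per-class IoU, mIoU, and the per-class mean and standard deviation across images might be computed with NumPy. It is not the authors' evaluation code. The function names, the toy arrays, and the convention that label 255 marks unlabeled pixels to be skipped are assumptions made for illustration; the toy values are not data from the paper.

```python
import numpy as np

# Assumption: pixels carrying this label are unlabeled and excluded from scoring.
IGNORE_LABEL = 255


def per_class_iou(pred, gt, num_classes, ignore_label=IGNORE_LABEL):
    """Return IoU for each class; NaN where a class is absent from both maps."""
    valid = gt != ignore_label
    pred = pred[valid].astype(np.int64)
    gt = gt[valid].astype(np.int64)
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(gt * num_classes + pred, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)
    intersection = np.diag(conf).astype(float)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    iou = np.full(num_classes, np.nan)
    present = union > 0
    iou[present] = intersection[present] / union[present]
    return iou


def summarize(preds, gts, num_classes):
    """Per-class mean and standard deviation of IoU across a set of images."""
    ious = np.stack([per_class_iou(p, g, num_classes) for p, g in zip(preds, gts)])
    return np.nanmean(ious, axis=0), np.nanstd(ious, axis=0)


# Hypothetical 2x3 label maps with two classes (0 and 1).
pred = np.array([[0, 0, 1], [1, 1, 1]])
fine_gt = np.array([[0, 1, 1], [1, 1, 1]])
coarse_gt = np.array([[0, 255, 1], [255, 1, 1]])  # two pixels left unlabeled

iou_fine = per_class_iou(pred, fine_gt, num_classes=2)
iou_coarse = per_class_iou(pred, coarse_gt, num_classes=2)
print("fine   IoU:", iou_fine, "mIoU:", np.nanmean(iou_fine))
print("coarse IoU:", iou_coarse, "mIoU:", np.nanmean(iou_coarse))
```

In this toy case the same prediction is scored against two annotations of the same image. Pixels left unlabeled in the coarse map are simply skipped, so the two scores can differ even though the prediction is unchanged. Passing lists of predictions and annotations to `summarize` gives the per-class mean and standard deviation of the kind the abstract compares.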

---
Source: https://tomesphere.com/paper/1901.00001