# Investigating Correlations of Inter-coder Agreement and Machine Annotation Performance for Historical Video Data

**Authors:** Kader Pustu-Iren, Markus Mühling, Nikolaus Korfhage, Joanna Bars, Sabrina Bernhöft, Angelika Hörth, Bernd Freisleben, and Ralph Ewerth

arXiv: 1907.10450 · 2019-07-31

## TL;DR

This study examines how inter-coder agreement affects machine annotation performance on historical GDR video data, highlighting the importance of annotation consistency for improving semantic search and recognition accuracy.

## Contribution

It provides an analysis of inter-coder agreement among experts and non-experts and explores its correlation with machine learning performance in historical video annotation.
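This summary does not say which agreement coefficient the study uses. As an assumption for illustration, Fleiss' kappa is one standard way to quantify agreement when several coders each label every item (for example, whether a visual concept is present in a keyframe). The sketch below takes a per-item table of category counts; the function name `fleiss_kappa` and the input layout are hypothetical choices, not the paper's code.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for items that are each rated by the same number of coders.

    counts[i][j] is the number of coders who put item i into category j,
    e.g. categories = [concept present, concept absent].
    """
    if not counts:
        raise ValueError("need at least one rated item")
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number (>= 2) of coders")
    n_cats = len(counts[0])

    # p_j: share of all assignments that went to category j.
    p = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_cats)]

    # P_i: fraction of coder pairs that agree on item i.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items      # mean observed agreement
    p_chance = sum(pj * pj for pj in p)     # agreement expected by chance
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when all labels fall into one category")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Toy example (made-up labels): 4 keyframes, 3 coders, [present, absent].
print(fleiss_kappa([[3, 0], [0, 3], [2, 1], [3, 0]]))  # 0.625
```

Computing the coefficient separately for the expert and the non-expert group, per concept, would give the kind of homogeneity comparison the study describes.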

## Key findings

- Higher inter-coder agreement correlates with better machine recognition performance.
- Expert annotations lead to more accurate person recognition than non-expert annotations.
- The number of training images and the inter-coder agreement can be used to predict the average precision of concept classification.
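One way to check a finding like "higher agreement correlates with better recognition performance" is to pair each concept's agreement score with the average precision (AP) its classifier reaches and compute a correlation coefficient. The sketch below uses Pearson's r as an assumption; the paper may use a different coefficient (e.g. Spearman's rank correlation), and `pearson_r` is a hypothetical helper name.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between paired samples, e.g. per-concept agreement vs. AP."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Unnormalized covariance and standard deviations (the 1/n factors cancel).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Toy example with made-up numbers (not results from the paper):
agreement = [1, 2, 3, 4]
ap = [1, 3, 2, 4]
print(pearson_r(agreement, ap))  # approximately 0.8
```

On Python 3.10 and later, `statistics.correlation` computes the same Pearson coefficient from the standard library.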

## Abstract

Video indexing approaches such as visual concept classification and person recognition are essential to enable fine-grained semantic search in large-scale video archives such as the historical video collection of the former German Democratic Republic (GDR) maintained by the German Broadcasting Archive (DRA). Typically, a lexicon of visual concepts has to be defined for semantic search. However, the definition of visual concepts can be more or less subjective due to individually differing judgments of annotators, which may have an impact on annotation quality and subsequently on the training of supervised machine learning methods. In this paper, we analyze the inter-coder agreement for historical TV data of the former GDR for visual concept classification and person recognition. The inter-coder agreement is evaluated for a group of expert as well as non-expert annotators in order to determine differences in annotation homogeneity. Furthermore, correlations between visual recognition performance and inter-annotator agreement are measured. In this context, information about image quantity and agreement is used to predict average precision for concept classification. Finally, the expert and non-expert annotations acquired in the study are used to evaluate their influence on person recognition.
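The abstract states that image quantity and agreement are used to predict average precision, but this summary does not give the model form. As an assumption, the sketch below fits an ordinary least-squares linear model `AP ≈ b0 + b1 * n_images + b2 * agreement` by solving the normal equations with Gaussian elimination. The function names and the raw (untransformed) features are hypothetical; the paper may use a different regressor or transformed inputs such as log image counts.

```python
from typing import List, Sequence, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: collinear predictors or too few samples")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ap_predictor(
    n_images: Sequence[float], agreement: Sequence[float], ap: Sequence[float]
) -> Tuple[float, float, float]:
    """Least-squares fit of ap ≈ b0 + b1 * n_images + b2 * agreement (hypothetical model)."""
    if not (len(n_images) == len(agreement) == len(ap)) or len(ap) < 3:
        raise ValueError("need at least 3 concepts with matching feature lengths")
    rows = [[1.0, float(n), float(g)] for n, g in zip(n_images, agreement)]
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ap)) for i in range(3)]
    b0, b1, b2 = _solve(xtx, xty)
    return b0, b1, b2


def predict_ap(coeffs: Tuple[float, float, float], n_images: float, agreement: float) -> float:
    b0, b1, b2 = coeffs
    return b0 + b1 * n_images + b2 * agreement
```

With per-concept image counts and agreement scores as inputs, `predict_ap` then gives an estimate of the AP a new concept might reach before any classifier is trained for it.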

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.10450/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/1907.10450/full.md

## References

12 references — full list in the complete paper: https://tomesphere.com/paper/1907.10450/full.md

---
Source: https://tomesphere.com/paper/1907.10450