# Comparing Agreement Indices to Assess Inter-Observer Reliability in the Case of Dichotomous and Trichotomous Animal-Based Welfare Indicators with Three Raters

**Authors:** Benedetta Torsiello, Mauro Giammarino, Piero Quatto, Monica Battini, Silvana Mattiello, Luca Battaglini, Manuela Renna

PMC · DOI: 10.3390/ani16040546 · Animals : an Open Access Journal from MDPI · 2026-02-10

## TL;DR

This paper compares different statistical methods to measure agreement among three raters assessing animal welfare indicators, finding some less-known indices more reliable than commonly used ones.

## Contribution

The study identifies specific agreement indices suitable for evaluating inter-observer reliability in animal welfare assessments with three raters.

## Key findings

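- Commonly used Kappa-based indices (and Krippendorff’s α) can show paradox behaviour with three raters, returning low values despite high observed agreement [e.g., P0(BCS-AP2) = 80%; Fleiss’ K = 0.22].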
- Gwet’s γ(AC1), BP coefficient, and Quatto’s S provide reliable results for dichotomous indicators.
- Gwet’s γ(AC2) and weighted forms of BP and S are best for trichotomous indicators.
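The paradox behaviour named in these findings can be illustrated with a minimal sketch. It assumes the standard multi-rater definitions of Fleiss' K, Gwet's AC1 and the Brennan-Prediger coefficient, which this summary does not spell out, and it uses hypothetical toy ratings rather than the paper's data. Quatto's S is left out because its formula is not given here. The function and variable names are illustrative only.

```python
from collections import Counter


def agreement_indices(ratings, categories):
    """Chance-corrected agreement for several raters on nominal scores.

    ratings: list of tuples, one tuple per animal, one score per rater.
    categories: list of all possible scores (q categories).
    """
    n = len(ratings)      # number of animals
    r = len(ratings[0])   # number of raters (three here)
    q = len(categories)

    # counts[i][k] = how many raters gave animal i the score k
    counts = [Counter(animal) for animal in ratings]

    # Observed agreement: share of agreeing rater pairs, averaged over animals
    p_obs = sum(
        sum(c[k] * (c[k] - 1) for k in categories) / (r * (r - 1))
        for c in counts
    ) / n

    # Overall proportion of scores falling in each category
    pi = [sum(c[k] for c in counts) / (n * r) for k in categories]

    # The indices differ only in how they model chance agreement
    chance = {
        "fleiss_kappa": sum(p * p for p in pi),
        "gwet_ac1": sum(p * (1 - p) for p in pi) / (q - 1),
        "brennan_prediger": 1 / q,
    }
    indices = {name: (p_obs - pe) / (1 - pe) for name, pe in chance.items()}
    return p_obs, indices


# Hypothetical data, NOT from the paper: 20 animals, three raters,
# 1 = indicator present, 0 = absent. The trait is rare, and each
# disagreement is a 2-vs-1 split.
toy = [(0, 0, 0)] * 17 + [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
p_obs, indices = agreement_indices(toy, [0, 1])

print(f"observed agreement: {p_obs:.2f}")
for name, value in indices.items():
    print(f"{name}: {value:.2f}")
```

With this toy data the observed agreement is 0.90, yet Fleiss' K comes out slightly negative (about -0.05) because the skewed prevalence inflates its chance term, while AC1 is about 0.89 and BP is 0.80.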

## Abstract

Evaluating inter-observer reliability is of utmost importance when introducing individual animal-based welfare indicators into animal welfare protocols. The present study focuses on the inter-observer reliability of dichotomous and trichotomous individual animal-based welfare indicators (scored on a two- or three-level system). Reliability is assessed by calculating the concordance among three raters using statistical indices proposed in the current literature, known as agreement indices. The performance of the most popular agreement indices is compared to determine which are most suitable for assessing inter-observer reliability. The most widely used agreement indices (e.g., those belonging to the Kappa statistic) are shown to be inappropriate for evaluating inter-observer reliability with three raters. By contrast, some less well-known agreement indices, such as Gwet’s γ(AC1), Gwet’s γ(AC2), Quatto’s S, Quatto’s weighted S, Brennan and Prediger’s BP coefficient and Brennan and Prediger’s weighted BP, gave more reliable agreement results.

This study evaluates inter-observer reliability (IOR) among three raters for dichotomous and trichotomous individual animal-based welfare indicators. The performance of the most documented agreement indices proposed in the literature was compared, using udder asymmetry (UA) as a dichotomous indicator and body condition score (BCS) as a trichotomous indicator, both taken from the AWIN Goat protocol. Data were collected on nine dairy goat farms using three alpine pastures (AP1 to AP3). Krippendorff’s α, the agreement indices belonging to the Kappa statistic, and their weighted forms were in some cases affected by paradox behaviour, i.e., a low index value despite high observed agreement (P0). This phenomenon was observed for both UA and BCS [e.g., P0(BCS-AP2) = 80%; Fleiss’ K = 0.22]. For UA, Gwet’s γ(AC1), followed by the BP coefficient and Quatto’s S, gave the best agreement results [e.g., P0(UA-AP1) = 86%; γ(AC1) = 0.84]. For BCS, the best agreement results were obtained with Gwet’s γ(AC2), followed by the weighted forms of BP and S. When the evaluation is performed by three raters, γ(AC1), BP and S are suggested for evaluating IOR with both dichotomous and trichotomous indicators, while the related weighted forms are suitable for trichotomous indicators only.
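For the trichotomous case, the weighted indices can be sketched in the same spirit. The example below assumes linear agreement weights and the usual multi-rater weighted forms of Fleiss' K, Gwet's AC2 and the Brennan-Prediger coefficient. The weighting scheme actually used in the paper is not stated in this summary, the 0/1/2 level coding is hypothetical, and the ratings are toy data, not the BCS scores from the study. Weighted S is omitted because its formula is not given here.

```python
def weighted_agreement(ratings, q):
    """Weighted agreement for several raters on ordinal scores 0..q-1."""
    n = len(ratings)      # number of animals
    r = len(ratings[0])   # number of raters (three here)

    # Linear weights (an assumption): full credit for identical scores,
    # partial credit for adjacent levels, none for opposite extremes
    w = [[1 - abs(k - l) / (q - 1) for l in range(q)] for k in range(q)]
    total_w = sum(map(sum, w))

    # counts[i][k] = how many raters gave animal i the level k
    counts = [[animal.count(k) for k in range(q)] for animal in ratings]

    # Weighted observed agreement, averaged over animals
    p_obs = 0.0
    for c in counts:
        for k in range(q):
            # raters at level k plus partial credit for nearby levels
            r_star = sum(w[k][l] * c[l] for l in range(q))
            p_obs += c[k] * (r_star - 1) / (r * (r - 1))
    p_obs /= n

    # Overall proportion of scores falling in each level
    pi = [sum(c[k] for c in counts) / (n * r) for k in range(q)]

    # Chance-agreement term of each index
    chance = {
        "weighted_fleiss_kappa": sum(
            w[k][l] * pi[k] * pi[l] for k in range(q) for l in range(q)
        ),
        "gwet_ac2": total_w / (q * (q - 1)) * sum(p * (1 - p) for p in pi),
        "weighted_bp": total_w / q**2,
    }
    indices = {name: (p_obs - pe) / (1 - pe) for name, pe in chance.items()}
    return p_obs, indices


# Hypothetical data, NOT from the paper: 20 animals, three raters,
# levels 0/1/2 with almost every score at the middle level.
toy_bcs = [(1, 1, 1)] * 16 + [(1, 1, 2)] * 2 + [(0, 1, 1)] * 2
p_obs_w, indices_w = weighted_agreement(toy_bcs, q=3)

print(f"weighted observed agreement: {p_obs_w:.2f}")
for name, value in indices_w.items():
    print(f"{name}: {value:.2f}")
```

On this toy data the weighted observed agreement is about 0.93, but the weighted Fleiss' K is slightly negative (about -0.03), whereas AC2 is about 0.93 and weighted BP is 0.85. This is the same pattern of low Kappa despite high agreement that the abstract reports for BCS.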

## Linked entities

- **Species:** Capra hircus (taxon 9925)

## Full-text entities

- **Diseases:** Body Condition (MESH:D057215), injury to (MESH:D014947), respiratory diseases (MESH:D012140), lameness (MESH:D007794), keel fracture (MESH:D050723)
- **Chemicals:** UA (-)
- **Species:** Gallus gallus (bantam, species) [taxon 9031], Homo sapiens (human, species) [taxon 9606], Capra hircus (domestic goat, species) [taxon 9925], Equus caballus (domestic horse, species) [taxon 9796], Bos taurus (bovine, species) [taxon 9913]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12937283/full.md

## References

53 references — full list in the complete paper: https://tomesphere.com/paper/PMC12937283/full.md

---
Source: https://tomesphere.com/paper/PMC12937283