# Patterns of observer error in scoring macromorphoscopic traits for population affinity

**Authors:** Leandi Liebenberg, Kyra E. Stull, Ericka N. L'Abbé

PMC · DOI: 10.1111/1556-4029.70063 · Journal of Forensic Sciences · 2025-05-07

## TL;DR

This study examines how observer experience and familiarity with the method affect the repeatability of macromorphoscopic trait scores on crania, which are used to estimate population affinity.

## Contribution

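The study identifies the specific traits and observer factors that limit repeatability in macromorphoscopic trait scoring, and shows that the choice of weighting for Cohen's kappa can change how reliable the method appears.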

## Key findings

- Intra-observer agreement ranged from moderate to perfect; inferior nasal margin, nasal bone shape, and nasal overgrowth showed the lowest agreement.
- Inter-observer repeatability varied from poor to substantial, improving slightly after discussion and rescoring.
- Familiarity with specific traits and procedures, rather than general experience, improved consistency.

## Abstract

Revising methodologies is essential to understand the limitations and biases inherent in certain methods, which is crucial for obtaining reliable results. Due to the subjective nature of non-metric methods, variation in trait scoring and its impact on accurately classifying biological parameters remain a concern that requires further investigation. This study aimed to examine the effects of observer experience, familiarity with the method, and different statistical approaches on the repeatability of macromorphoscopic (MMS) traits in the cranium for population affinity. Seventeen traits were scored on a sample of 10 crania by five observers with varying experience levels. Intra-observer agreement ranged from moderate to perfect, with three traits (inferior nasal margin, nasal bone shape, and nasal overgrowth) demonstrating the lowest agreement. Overall, inter-observer repeatability ranged from poor to substantial agreement. After a group discussion on the scoring procedure and subsequent rescoring of the crania, a slight improvement in agreement was observed, with kappa values shifting towards moderate and substantial levels. Each observer exhibited variation in the repeatability of different traits. While general experience did not consistently translate into proficiency with the method, familiarity with the specific traits and scoring procedures contributed to more consistent results. Therefore, method-specific training is crucial before applying the MMS traits in practice. Additionally, the choice of statistical approach, such as applying different weights to Cohen's kappa based on data type, can influence the perceived reliability of a method. Practitioners should select weights and tests that are most appropriate for the data type of each trait being analyzed.
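The abstract's point about kappa weighting can be illustrated with a small sketch. The code below is not the authors' analysis: the function name `cohen_kappa`, the 1 to 5 ordinal scale, and the two score lists are all hypothetical, invented only to show how unweighted, linear, and quadratic weights can yield different agreement values from the same pair of scoring sessions.

```python
import math


def cohen_kappa(rater_a, rater_b, categories, weights="none"):
    """Cohen's kappa for two sets of scores on the same specimens.

    weights: "none" (nominal data), or "linear" / "quadratic" (ordinal data,
    where `categories` must be listed in rank order).
    """
    if weights not in ("none", "linear", "quadratic"):
        raise ValueError("weights must be 'none', 'linear', or 'quadratic'")
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must score the same non-empty sample")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")

    index = {category: i for i, category in enumerate(categories)}
    n = len(rater_a)

    # Joint proportions: observed[i][j] is the share of specimens scored
    # category i by rater A and category j by rater B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def penalty(i, j):
        # Disagreement weight: all-or-nothing for nominal data, scaled by
        # the distance between ranks for ordinal data.
        if weights == "none":
            return 0.0 if i == j else 1.0
        distance = abs(i - j) / (k - 1)
        return distance if weights == "linear" else distance ** 2

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed_penalty = sum(penalty(i, j) * observed[i][j] for i, j in cells)
    expected_penalty = sum(penalty(i, j) * row[i] * col[j] for i, j in cells)

    if expected_penalty == 0.0:
        return math.nan  # kappa is undefined when no disagreement is expected
    return 1.0 - observed_penalty / expected_penalty


# Hypothetical scores for one ordinal trait on 10 crania (invented data).
scale = [1, 2, 3, 4, 5]
scores_a = [1, 2, 3, 3, 4, 2, 5, 1, 3, 4]
scores_b = [1, 3, 3, 4, 4, 2, 4, 1, 2, 4]

for scheme in ("none", "linear", "quadratic"):
    print(scheme, round(cohen_kappa(scores_a, scores_b, scale, scheme), 3))
```

In this invented example every disagreement is a single step apart on the scale, so the weighted variants treat them as near misses and report higher agreement than the unweighted statistic. For a nominal trait, where categories have no rank order, only the unweighted form would be meaningful.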

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12223330/full.md

## References

31 references — full list in the complete paper: https://tomesphere.com/paper/PMC12223330/full.md

---
Source: https://tomesphere.com/paper/PMC12223330