# Group discussions improve reliability and validity of rated categories based on qualitative data from systematic review

**Authors:** Jutta Beher, Eric Treml, Brendan Wintle, Frank Koch

PMC · DOI: 10.1371/journal.pone.0326166 · PLOS One · 2025-06-18

## TL;DR

Group discussions help improve the accuracy and consistency of qualitative data analysis in systematic reviews of conservation literature.

## Contribution

A pragmatic approach to reliability testing using group discussions to reduce misclassification in qualitative data coding.

## Key findings

- Mistakes, such as overlooking information in the text, were the most common source of disagreement among raters, followed by differences in interpretation and ambiguity around categories.
- Group discussions resolved most differences in ratings, improving reliability and accuracy.
- The approach provides insights into error rates and is recommended for improving current review methods.
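The findings frame reliability in terms of error rates and sources of disagreement. The sketch below illustrates one way such figures could be tallied. It is not the authors' procedure. It assumes that each rater's initial category is recorded per item and that the category agreed after group discussion serves as the reference. A rater's error rate is then the share of items where the two differ. The function names, the data layout, and the cause labels are all hypothetical.

```python
from collections import Counter


def rater_error_rates(initial, consensus):
    """Share of items where each rater's initial category differs from the
    category agreed after group discussion (assumed here to be the reference).

    initial:   dict rater -> dict item -> category (hypothetical layout)
    consensus: dict item -> agreed category
    """
    rates = {}
    for rater, ratings in initial.items():
        # Only score items that have an agreed category
        scored = [item for item in ratings if item in consensus]
        wrong = sum(1 for item in scored if ratings[item] != consensus[item])
        rates[rater] = wrong / len(scored) if scored else float("nan")
    return rates


def tally_causes(causes):
    """Count the recorded reason for each disagreement, e.g. 'mistake',
    'interpretation', 'ambiguity' (hypothetical labels)."""
    return Counter(causes)


if __name__ == "__main__":
    # Hypothetical toy data: two raters, three items
    initial = {
        "A": {"p1": "yes", "p2": "no", "p3": "unclear"},
        "B": {"p1": "yes", "p2": "yes", "p3": "unclear"},
    }
    consensus = {"p1": "yes", "p2": "no", "p3": "no"}
    print(rater_error_rates(initial, consensus))
    print(tally_causes(["mistake", "interpretation", "mistake"]).most_common())
```

Treating the post-discussion category as the reference is the key assumption here. Where discussion leaves a disagreement unresolved, that item would need to be excluded from `consensus` or handled separately.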

## Abstract

The number of literature reviews in the fields of ecology and conservation has increased dramatically in recent years. Scientists conduct systematic literature reviews with the aim of drawing conclusions based on the content of a representative sample of publications. This requires subjective judgments on qualitative content, including interpretations and deductions. However, subjective judgments can differ substantially even between highly trained experts who are faced with the same evidence. Because classification of content into codes by one individual rater is prone to subjectivity and error, general guidelines recommend checking the produced data for consistency and reliability. Metrics of agreement between multiple raters exist to assess the rate of agreement (consistency). These metrics do not account for mistakes or allow for their correction, whereas group discussions about codes derived from the classification of qualitative data have been shown to improve reliability and accuracy. Here, we describe a pragmatic approach to reliability testing that gives insights into the error rate of multiple raters. Five independent raters rated and discussed categories for 23 variables within 21 peer-reviewed publications on conservation management plans. Mistakes, including overlooking information in the text, were the most common source of disagreement, followed by differences in interpretation and ambiguity around categories. Discussions could resolve most differences in ratings. We recommend our approach as a significant improvement on current review and synthesis approaches that lack assessment of misclassification.
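The abstract refers to agreement metrics between multiple raters without naming one. The sketch below shows one widely used multi-rater statistic, Fleiss' kappa, as an illustration. It is not necessarily the metric used in the paper. The input layout, with one list of category labels per item and one label per rater, is an assumption.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a fixed number of raters per item.

    ratings: list of items; each item is a list holding one category
             label per rater (hypothetical input layout).
    """
    n_items = len(ratings)
    if n_items == 0:
        raise ValueError("no items")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("need at least 2 raters and the same number per item")

    counts = [Counter(item) for item in ratings]
    categories = {c for cnt in counts for c in cnt}

    # p_j: share of all assignments that fall in category j
    total = n_items * n_raters
    p = [sum(cnt[c] for cnt in counts) / total for c in categories]

    # P_i: share of agreeing rater pairs on item i
    P_i = [
        (sum(v * v for v in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    P_bar = sum(P_i) / n_items       # observed agreement
    P_e = sum(pj * pj for pj in p)   # agreement expected by chance

    if P_e == 1.0:
        # Every rating used a single category: kappa is undefined
        return float("nan")
    return (P_bar - P_e) / (1 - P_e)
```

In a design like the one described, an item could be one variable within one publication, with five labels per item. Like any chance-corrected agreement statistic, kappa measures how consistently raters concur. It does not show whether the shared category is correct, which is the limitation the abstract raises.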

## Full-text entities

- **Diseases:** Hallucinations (MESH:D006212)
- **Species:** Homo sapiens (human, species) [taxon 9606], Sus scrofa (pig, species) [taxon 9823]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12176165/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12176165/full.md

## References

54 references — full list in the complete paper: https://tomesphere.com/paper/PMC12176165/full.md

---
Source: https://tomesphere.com/paper/PMC12176165