# Imperfect gold standard gene sets yield inaccurate evaluation of causal gene identification methods

**Authors:** Lijia Wang, Xiaoquan Wen, Jean Morrison

PMC · DOI: 10.1038/s42003-024-06482-1 · Communications Biology · 2024-07-17

## TL;DR

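This perspective argues that evaluating causal gene discovery methods against incomplete gold standard (GS) gene sets, while treating every gene outside the GS as a known negative, produces misleading performance estimates. It recommends probabilistic statistical approaches instead.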

## Contribution

Shows how incompletely labeled and biased gold standard gene sets distort the evaluation of causal gene discovery methods, and advocates for probabilistic modeling techniques in their place.

## Key findings

- Treating non-GS genes as negatives leads to biased sensitivity, specificity, and AUC estimates.
- Labeling biases in gold standard sets distort method comparisons.
- Probabilistic modeling is recommended over GS-based evaluation for more accurate results.
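To make the first finding concrete, the Python sketch below scores one hypothetical method twice. The first evaluation uses the full set of truly causal genes. The second uses an incomplete GS in which every unlisted gene counts as a negative. All counts are invented for illustration and are not data from the paper: 1,000 genes, 100 truly causal, 40 of them in the GS. The GS is also assumed to contain no false positives. The `sens_spec` helper is a hypothetical name.

```python
from __future__ import annotations


def sens_spec(called: list[bool], positive: list[bool]) -> tuple[float, float]:
    """Sensitivity and specificity of `called` against a positive labeling.

    Every gene not labeled positive is treated as a known negative, which is
    the assumption GS-based evaluation makes.
    """
    pairs = list(zip(called, positive))
    tp = sum(1 for c, p in pairs if c and p)
    fn = sum(1 for c, p in pairs if not c and p)
    fp = sum(1 for c, p in pairs if c and not p)
    tn = sum(1 for c, p in pairs if not c and not p)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical gene layout (indices are illustrative):
#   0-39    truly causal and in the GS
#   40-99   truly causal but missing from the GS
#   100-999 not causal
N_GENES = 1000
truly_causal = [i < 100 for i in range(N_GENES)]
in_gold_standard = [i < 40 for i in range(N_GENES)]

# A hypothetical method flags 30 GS genes, 30 causal non-GS genes and 20 non-causal genes.
called = [i < 30 or 40 <= i < 70 or 100 <= i < 120 for i in range(N_GENES)]

true_sens, true_spec = sens_spec(called, truly_causal)
gs_sens, gs_spec = sens_spec(called, in_gold_standard)

print(f"against true labels: sensitivity={true_sens:.3f}, specificity={true_spec:.3f}")
print(f"against the GS:      sensitivity={gs_sens:.3f}, specificity={gs_spec:.3f}")
```

Under these assumed counts, the GS-based evaluation overstates sensitivity (0.750 vs. 0.600) because the method's hits happen to be enriched among GS genes. It also understates specificity (0.948 vs. 0.978) because 30 correctly called causal genes outside the GS are scored as false positives. The sign of the sensitivity error depends on how a method's hits overlap the GS. AUC is computed from the same positive and negative labels, so it inherits the same mislabeling.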

## Abstract

Causal gene discovery methods are often evaluated using reference sets of causal genes, which are treated as gold standards (GS) for the purposes of evaluation. However, evaluation methods typically treat genes not in the GS positive set as known negatives rather than unknowns. This leads to inaccurate estimates of sensitivity, specificity, and AUC. Labeling biases in GS gene sets can also lead to inaccurate ordering of alternative causal gene discovery methods. We argue that the evaluation of causal gene discovery methods should rely on statistical techniques like those used for variant discovery rather than on comparison with GS gene sets.
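The ordering problem can be illustrated the same way. The sketch below assumes a hypothetical GS that covers only a "well-studied" subset of the causal genes. It compares two invented methods that make ten calls each. The gene indices, the set sizes, and the `recall` and `precision` helpers are illustrative assumptions, not taken from the paper.

```python
from __future__ import annotations


def recall(calls: set[int], positives: set[int]) -> float:
    """Fraction of `positives` recovered by `calls`."""
    return len(calls & positives) / len(positives)


def precision(calls: set[int], positives: set[int]) -> float:
    """Fraction of `calls` in `positives`; every other call counts as a false positive."""
    return len(calls & positives) / len(calls)


# Hypothetical setup: genes 0-19 are truly causal, but the GS lists only genes 0-7,
# standing in for a well-studied subset that the GS is biased toward.
truly_causal = set(range(20))
gold_standard = set(range(8))

# Method A recovers the well-studied causal genes plus two non-causal genes.
method_a = set(range(8)) | {50, 51}
# Method B makes the same number of calls, all causal, but mostly outside the GS.
method_b = set(range(6, 16))

for name, calls in [("A", method_a), ("B", method_b)]:
    print(
        f"method {name}: "
        f"true recall={recall(calls, truly_causal):.2f}, "
        f"GS recall={recall(calls, gold_standard):.2f}, "
        f"true precision={precision(calls, truly_causal):.2f}, "
        f"GS precision={precision(calls, gold_standard):.2f}"
    )
```

Method B recovers more truly causal genes (recall 0.50 vs. 0.40) and makes no false calls. Even so, the GS-based metrics rank method A first (GS recall 1.00 vs. 0.25). A's hits coincide with the genes the GS happens to include.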

This perspective highlights the limitations of empirically evaluating causal gene discovery methods in the absence of completely labeled reference gene sets. It shows that estimates of sensitivity, specificity, and AUC may be critically biased and advocates for increased reliance on probabilistic modeling.
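The sketch below is one generic illustration of model-based evaluation, not the authors' specific procedure. It borrows a Bayesian false discovery rate (FDR) calculation of the kind used in variant discovery. It assumes each gene has a calibrated posterior probability p of being causal. The gene names, the probabilities, the `alpha` threshold, and both helper functions are hypothetical. The expected number of false discoveries in a selected set is the sum of 1 - p over its genes. The expected fraction of all causal genes it captures is the selected share of the total posterior mass. Neither estimate requires a GS.

```python
from __future__ import annotations


def select_by_bayesian_fdr(probs: dict[str, float], alpha: float) -> list[str]:
    """Return the largest top-ranked gene set whose estimated Bayesian FDR is <= alpha.

    `probs` maps gene -> posterior probability of being causal, assumed calibrated.
    """
    # Rank genes from most to least probably causal.
    ranked = sorted(probs, key=lambda g: probs[g], reverse=True)
    selected: list[str] = []
    expected_false = 0.0
    for gene in ranked:
        # Each selected gene contributes (1 - p) expected false discoveries.
        expected_false += 1.0 - probs[gene]
        # The running mean of (1 - p) never decreases along this ranking,
        # so the first violation marks the end of the admissible set.
        if expected_false / (len(selected) + 1) > alpha:
            break
        selected.append(gene)
    return selected


def estimated_fdr_and_sensitivity(
    probs: dict[str, float], selected: list[str]
) -> tuple[float, float]:
    """Model-based FDR and sensitivity of `selected`, computed without GS labels."""
    if not selected:
        return 0.0, 0.0
    expected_true = sum(probs[g] for g in selected)
    fdr = 1.0 - expected_true / len(selected)
    sensitivity = expected_true / sum(probs.values())
    return fdr, sensitivity


# Hypothetical posterior probabilities for six genes.
probs = {"GENE_A": 0.99, "GENE_B": 0.95, "GENE_C": 0.90,
         "GENE_D": 0.60, "GENE_E": 0.30, "GENE_F": 0.05}

selected = select_by_bayesian_fdr(probs, alpha=0.1)
fdr, sensitivity = estimated_fdr_and_sensitivity(probs, selected)
print(selected, round(fdr, 3), round(sensitivity, 3))
```

With alpha = 0.1 the rule keeps the three top-ranked genes. This gives an estimated FDR of about 0.053 and an estimated sensitivity of roughly 0.75. These estimates are only as reliable as the calibration of the posterior probabilities.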

## Full-text entities

- **Genes:** PGR (progesterone receptor) [NCBI Gene 5241] {aka NR3C3, PR}
- **Diseases:** PCG (MESH:C537680)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11255313/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11255313/full.md

## References

16 references — full list in the complete paper: https://tomesphere.com/paper/PMC11255313/full.md

---
Source: https://tomesphere.com/paper/PMC11255313