# Instability of estimation results based on caliper matching with propensity scores

**Authors:** Kazushi Maruo, Yusuke Yamaguchi, Ryota Ishii, Masahiko Gosho, Md. Belal Hossain

PMC · DOI: 10.1371/journal.pone.0325317 · PLOS One · 2025-06-06

## TL;DR

This paper shows that using random order caliper matching in observational studies can lead to unstable and non-reproducible results, and suggests better alternatives.

## Contribution

The paper identifies conditions under which instability occurs and recommends specific alternative matching order methods.

## Key findings

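- Instability tends to be more severe with small sample sizes, large true odds ratios, a large proportion of subjects in the treatment group, and high c-statistics for the propensity score model.
- Lowest-to-highest score order matching, or the median of multiple random-order matching results, is a more stable alternative to a single random-order match.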
- Pre-specifying the matching order method is recommended to improve reproducibility.
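
To make the role of matching order concrete, here is a minimal sketch of greedy 1:1 nearest-neighbor caliper matching without replacement, assuming the propensity scores have already been estimated. The function name `caliper_match`, its parameters, and the tie-breaking rule are illustrative assumptions, not the authors' implementation.

```python
import random


def caliper_match(treated, control, caliper, order="random", seed=None):
    """Greedy 1:1 nearest-neighbor caliper matching without replacement.

    treated, control: lists of propensity scores (hypothetical inputs).
    order: "random" or "lowest_to_highest" processing order of treated units.
    Returns a list of (treated_index, control_index) pairs.
    """
    idx = list(range(len(treated)))
    if order == "random":
        # The processing order depends on the seed, so the matched set can too.
        random.Random(seed).shuffle(idx)
    elif order == "lowest_to_highest":
        # Deterministic order: treated units sorted by ascending score.
        idx.sort(key=lambda i: treated[i])
    else:
        raise ValueError(f"unknown order: {order}")

    available = set(range(len(control)))
    pairs = []
    for i in idx:
        if not available:
            break
        # Nearest unused control; ties are broken by the lower control index.
        j = min(available, key=lambda k: (abs(control[k] - treated[i]), k))
        if abs(control[j] - treated[i]) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs
```

With toy treated scores `[0.30, 0.34]`, control scores `[0.32, 0.37]`, and a caliper of 0.05, the lowest-to-highest order matches both treated units. Processing the unit at 0.34 first instead consumes the control at 0.32 and leaves the unit at 0.30 unmatched, so a random order can land on either matched set.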

## Abstract

Caliper matching is often used to adjust for confounding biases in observational studies. This method with random order matching allows for the cherry-picking of the analysis results to suit the analyst’s convenience. Random order matching can also result in large fluctuations in the analysis results due to small additions and/or changes in data. These “instability problems” might compromise the reproducibility of the study results. Some studies have discussed instability issues, but the conditions are limited, and there is no knowledge of which alternative order method should be used instead of the random order method. We evaluate the instability problem by calculating the extent to which the results can vary within a single study dataset and provide guidelines for choosing the best alternative matching order method based on simulations and a case study. From simulation studies, instability might be serious when the sample size was small, the true odds ratio was large, the proportion for the treatment group was large, and the c-statistic for the propensity score model was large. We recommend not using random order matching and instead using lowest to highest score order matching or the median of multiple random order matching results. We also recommend pre-specifying the matching order method.
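
As a rough illustration of how variation within a single dataset might be quantified, the sketch below summarizes effect estimates obtained from repeated random-order matching runs. The helper `summarize_random_order_runs` and the input numbers are hypothetical placeholders; the paper's own instability measure may be defined differently.

```python
from statistics import median


def summarize_random_order_runs(estimates):
    """Summarize effect estimates from repeated random-order matching runs.

    estimates: one estimate (e.g. an odds ratio) per random matching order.
    Returns the spread across runs and the median estimate.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    ordered = sorted(estimates)
    return {
        "min": ordered[0],
        "max": ordered[-1],
        # A wide range means the result depends heavily on the matching order.
        "range": ordered[-1] - ordered[0],
        # The median across runs is far less sensitive to any single seed.
        "median": median(ordered),
    }


# Made-up numbers for illustration only, not results from the paper.
summary = summarize_random_order_runs([1.8, 2.4, 2.1, 1.9, 2.6])
```

Reporting the median rather than a single run removes the analyst's freedom to pick a convenient seed, which is the cherry-picking concern raised in the abstract.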

## Full-text entities

- **Diseases:** Breast Cancer (MESH:D001943), death (MESH:D003643), node (MESH:D012804), tumor (MESH:D009369)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12143538/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12143538/full.md

## References

22 references — full list in the complete paper: https://tomesphere.com/paper/PMC12143538/full.md

---
Source: https://tomesphere.com/paper/PMC12143538