# A design-based framework for optimal stratification using super-population models with application on real data set of breast cancer

**Authors:** Faizan Danish

PMC · DOI: 10.1371/journal.pone.0323619 · PLOS One · 2025-05-22

## TL;DR

This paper introduces a new method for optimal stratification in sampling to improve the accuracy of population mean estimates, using breast cancer data as a real-world example.

## Contribution

A novel design-based framework for determining optimal strata boundaries and sample sizes using super-population models is proposed and validated.

## Key findings

- The proposed method matches or outperforms existing approaches in the precision of population parameter estimates.
- The methodology is applied to breast cancer data, estimating the mean perimeter from mean radius and mean texture.
- Simulation studies confirm the method's higher relative efficiency and versatility across different distributions.

## Abstract

This study investigates the determination of stratification points for two study variables within the framework of simple random sampling, with a focus on estimating the population mean using a closely related auxiliary variable. Employing a superpopulation model, the research aims to minimize overall variance by deriving simplified equations that enhance the precision of parameter estimates. Instead of categorizing variables, the study emphasizes continuous variables to establish optimal strata boundaries (OSB), which are essential for creating homogeneous groups within each stratum. This stratification leads to more efficient sample sizes (SS) and improved accuracy in parameter estimation. However, determining the OSB and the optimal SS poses challenges in scenarios with a fixed total sample size, such as survey designs constrained by limited budgets. To address this, the study proposes a robust methodology for calculating OSB and SS, leveraging knowledge of the survey’s per-unit stratum measurement costs or its probability density function. An empirical application of the method is demonstrated using breast cancer data, where the mean perimeter is estimated based on mean radius and mean texture. Additionally, hypothetical examples using Cauchy and standard power distributions are provided to illustrate the versatility of the proposed approach. The newly developed method has been integrated into the updated stratifyR package and implemented in LINGO software, facilitating its practical application. Comparative analysis reveals that this approach consistently outperforms or matches existing methods in enhancing the precision of population parameter estimation. Furthermore, simulation studies confirm its higher relative efficiency, making it a valuable contribution to the field of stratified sampling.
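To make the roles of strata boundaries and sample sizes concrete, here is a minimal Python sketch of the classical Neyman allocation under a fixed total sample size. It is not the paper's method and does not use the stratifyR or LINGO implementations: the boundaries are taken as given rather than optimized, the finite population correction is ignored, and the function names `neyman_allocation` and `stratified_variance` are hypothetical, chosen only for illustration.

```python
import bisect
import statistics


def neyman_allocation(x, boundaries, n_total):
    """Return per-stratum (weights, std_devs, sample_sizes) under Neyman allocation.

    x          : auxiliary-variable values for the whole population
    boundaries : sorted interior cut points; k cut points give k + 1 strata
    n_total    : fixed total sample size
    """
    strata = [[] for _ in range(len(boundaries) + 1)]
    for value in x:
        # bisect_right places a value equal to a cut point in the upper stratum
        strata[bisect.bisect_right(boundaries, value)].append(value)
    if any(len(s) == 0 for s in strata):
        raise ValueError("every stratum must contain at least one unit")

    population_size = len(x)
    weights = [len(s) / population_size for s in strata]
    sds = [statistics.pstdev(s) for s in strata]
    total = sum(w * s for w, s in zip(weights, sds))
    if total == 0:
        raise ValueError("all strata have zero variance; allocation is undefined")
    # Neyman allocation: n_h is proportional to W_h * S_h (sizes are not rounded)
    sizes = [n_total * w * s / total for w, s in zip(weights, sds)]
    return weights, sds, sizes


def stratified_variance(weights, sds, n_total):
    """Variance of the stratified mean under Neyman allocation, ignoring the fpc."""
    return sum(w * s for w, s in zip(weights, sds)) ** 2 / n_total


# Toy population with two visibly different groups, split at an assumed cut point.
x = [1, 2, 3, 4, 10, 20, 30, 40]
weights, sds, sizes = neyman_allocation(x, boundaries=[5], n_total=4)
var_stratified = stratified_variance(weights, sds, n_total=4)
var_srs = statistics.pvariance(x) / 4  # simple random sampling, same n, no fpc
print(sizes, var_stratified, var_srs)
```

In this sketch, moving the cut point changes the within-stratum standard deviations and therefore both the allocation and the variance. Searching for the boundaries that minimize that variance is the optimization problem the abstract describes.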

## Linked entities

- **Diseases:** breast cancer (MONDO:0004989)

## Full-text entities

- **Diseases:** breast cancer (MESH:D001943)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12097798/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12097798/full.md

## References

45 references — full list in the complete paper: https://tomesphere.com/paper/PMC12097798/full.md

---
Source: https://tomesphere.com/paper/PMC12097798