# Automated self-service cohort selection for large-scale population sciences and observational research: The California Teachers Study researcher platform

**Authors:** James V. Lacey, Emma S. Spielfogel, Jennifer L. Benbow, Kristen E. Savage, Kai Lin, Cheryl A. M. Anderson, Jessica Clague-DeHart, Christine N. Duffy, Maria Elena Martinez, Hannah Lui Park, Caroline A. Thompson, Sophia S. Wang, Sandeep Chandra

PMC · DOI: 10.1371/journal.pone.0296611 · PLOS One · 2025-05-12

## TL;DR

The paper describes the California Teachers Study (CTS) Researcher Platform, a self-service web application that automates cohort selection and delivers analysis-ready data, code, and documentation, making the process faster and more scalable than manual methods.

## Contribution

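The contribution is a web-based researcher platform, backed by a column-oriented database, that captures researchers' design choices and automatically generates custom datasets, analytic code, data dictionaries, and documentation for population sciences research.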

## Key findings

- The platform enables researchers to independently generate custom datasets, code, and documentation.
- Compared with manual methods, the platform is faster, more scalable, more user-friendly, and more collaborative.
- The framework is flexible and can be adapted for other population studies.

## Abstract

Cohort selection is ubiquitous and essential, but manual and ad hoc approaches are time-consuming, labor-intensive, and difficult to scale. We sought to automate the task of cohort selection by building self-service tools that enable researchers to independently generate datasets for population sciences research.

The California Teachers Study (CTS) is a prospective observational study of 133,477 women who have been followed continuously since 1995. The CTS includes extensive survey-based and real-world data from cancer, hospitalization, and mortality linkages. We curated data from our data warehouse into a column-oriented database and developed a researcher-facing web application that guides researchers through the project lifecycle; captures researchers’ inputs; and automatically generates custom and analysis-ready data, code, dictionaries, and documentation.

Researchers can register, access data, and propose projects on the CTS Researcher Platform via our CTS website. The Platform supports cohort and cross-sectional study designs for cancer, mortality, and any other ICD-based phenotypes or endpoints. User-friendly prompts and menus capture analytic design, inclusion/exclusion criteria, endpoint definitions, censoring rules, and covariate selection. Our platform empowers researchers everywhere to query, choose, review, and automatically and quickly receive custom data, analytic scripts, and documentation for their research projects. Research teams can review, revise, and update their choices anytime.

We replaced inefficient traditional cohort-selection processes with an integrated self-service approach that simplifies and improves cohort selection for all stakeholders. Compared with manual methods, our solution is faster and more scalable, user-friendly, and collaborative. Other studies could re-configure our individual database, project-tracking, website, and data-delivery components for their own specific needs, or they could utilize other widely available solutions (e.g., alternative database or project-tracking tools) to enable similarly automated cohort-selection in their own settings. Our comprehensive and flexible framework could be adopted to improve cohort selection in other population sciences and observational research settings.
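The inputs the abstract lists (inclusion/exclusion criteria, an endpoint definition, and censoring rules) can be illustrated with a minimal Python sketch. This is not the CTS Platform's code: the `Participant` and `CohortSpec` types, their field names, and the example records are all hypothetical, chosen only to show how captured choices might turn into an analysis-ready table.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Participant:
    # All fields are hypothetical stand-ins, not CTS variable names.
    pid: int
    baseline: date              # start of follow-up
    age_at_baseline: int
    prior_cancer: bool          # example exclusion flag
    diagnosis: Optional[date]   # first endpoint event, if any
    death: Optional[date]


@dataclass
class CohortSpec:
    # The researcher's captured choices (illustrative only).
    min_age: int
    exclude_prior_cancer: bool
    study_end: date             # administrative censoring date


def select_cohort(people, spec):
    """Apply inclusion/exclusion rules, then compute follow-up per person."""
    rows = []
    for p in people:
        if p.age_at_baseline < spec.min_age:
            continue  # inclusion criterion not met
        if spec.exclude_prior_cancer and p.prior_cancer:
            continue  # exclusion criterion met
        # Follow-up ends at the earliest of event, death, or study end.
        exits = [spec.study_end]
        if p.diagnosis is not None:
            exits.append(p.diagnosis)
        if p.death is not None:
            exits.append(p.death)
        exit_date = min(exits)
        if exit_date <= p.baseline:
            continue  # no follow-up time to contribute
        rows.append({
            "pid": p.pid,
            "event": p.diagnosis is not None and p.diagnosis == exit_date,
            "days": (exit_date - p.baseline).days,
        })
    return rows


# Made-up example records, for illustration only.
start = date(1995, 1, 1)
people = [
    Participant(1, start, 50, False, date(2000, 1, 1), None),
    Participant(2, start, 60, True, None, None),
    Participant(3, start, 45, False, None, date(1996, 1, 1)),
    Participant(4, start, 25, False, None, None),
]
spec = CohortSpec(min_age=30, exclude_prior_cancer=True,
                  study_end=date(2020, 12, 31))
rows = select_cohort(people, spec)
```

In this sketch, participant 2 is dropped by the exclusion rule and participant 4 by the age criterion, while participant 3 is censored at death rather than counted as an event. Keeping the choices in a single `CohortSpec` object mirrors the idea that a team can revise its inputs and regenerate the dataset.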

## Linked entities

- **Diseases:** cancer (MONDO:0004992)

## Full-text entities

- **Diseases:** cancer (MESH:D009369)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12068635/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12068635/full.md

## References

39 references — full list in the complete paper: https://tomesphere.com/paper/PMC12068635/full.md

---
Source: https://tomesphere.com/paper/PMC12068635