# Census tract-level socioeconomic variables and breast cancer characteristics and outcomes in California and New York State

**Authors:** Margaret Gates Kuliszewski, Baozhen Qiao, Mandi Yu, Maria J. Schymura, Tabassum Insaf

PMC · DOI: 10.1007/s10552-026-02152-1 · 2026-03-19

## TL;DR

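Residence in more disadvantaged census tracts is associated with more advanced stage, higher grade, and poorer survival in breast cancer, and results from synthetic census tract data approximated those from actual data for California.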

## Contribution

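The study shows that synthetic census tracts, which are intended to allow release of small-area cancer data without compromising patient confidentiality, produced results that approximated those from actual tract data for California.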

## Key findings

- Except for income inequality, greater disadvantage on each socioeconomic variable was associated with more advanced stage, higher grade, higher-risk subtypes, and poorer survival in both states.
- Synthetic data results for California were consistent with actual data in direction and significance.
- Synthetic data tended to overestimate associations with stage and underestimate associations with grade, subtype, and survival.

## Abstract

Synthetic census tracts can allow for release of small area cancer data without compromising patient confidentiality. We used synthetic and actual census tract data for California and actual data for New York State (NYS) to examine associations of small area socioeconomic factors with breast cancer prognosis and outcomes and to evaluate results obtained from synthetic versus actual data.

We retrieved data on invasive, first primary breast cancers diagnosed between 2006 and 2017 in females aged ≥ 18 years in California (n = 237,156) or NYS (n = 149,789). We categorized census tract-level exposures into quintiles and used multivariable-adjusted multilevel logistic and Cox proportional hazards regression analyses to examine associations with stage, grade, subtype, and overall and cancer-specific survival. We conducted separate analyses for California and NYS and compared results between the two states and between synthetic and actual data for California.
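The quintile categorization described above can be illustrated with a minimal sketch. This is not the authors' code: the function names, the tract identifiers, and the poverty values are hypothetical, and the abstract does not state which quantile definition or tie-handling rule the study used. The sketch assumes inclusive quantile cut points and places values equal to a cut point in the lower quintile.

```python
from bisect import bisect_left
from statistics import quantiles


def quintile_cutpoints(values):
    """Return the four cut points that split `values` into five groups.

    Assumption: the "inclusive" quantile method; the study's exact
    definition is not given in the abstract.
    """
    return quantiles(values, n=5, method="inclusive")


def assign_quintile(value, cutpoints):
    """Map a value to quintile 1 (lowest) through 5 (highest).

    A value equal to a cut point falls in the lower quintile.
    """
    return bisect_left(cutpoints, value) + 1


# Hypothetical tract-level exposure (e.g., percent below poverty),
# invented purely for illustration.
tract_poverty = {f"tract_{i:02d}": float(i) for i in range(1, 11)}

cutpoints = quintile_cutpoints(list(tract_poverty.values()))
tract_quintile = {
    tract: assign_quintile(value, cutpoints)
    for tract, value in tract_poverty.items()
}
```

In an analysis of this kind, the resulting quintile would typically enter the regression model as a categorical exposure, with one quintile as the reference group.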

Except for income inequality, greater disadvantage for each socioeconomic variable was statistically significantly associated with more advanced stage, higher grade, higher-risk subtypes, and poorer survival in both states. Synthetic and actual results for California were consistent in direction and statistical significance, but the synthetic data tended to overestimate associations with stage and underestimate associations with grade, subtype, and survival.
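One way to make the synthetic-versus-actual comparison concrete is to compare paired effect estimates on the log scale. The sketch below is an assumption about how such a check could be done, not the procedure reported in the paper: the function name is hypothetical, it takes odds or hazard ratios as inputs, and it does not assess statistical significance.

```python
import math


def compare_estimates(actual_ratio, synthetic_ratio):
    """Compare a synthetic-data ratio estimate with its actual-data counterpart.

    Both inputs are odds ratios or hazard ratios (must be > 0).
    Returns whether the two estimates lie on the same side of the null
    value (1.0) and whether the synthetic estimate is farther from the
    null ("overestimate") or closer to it ("underestimate").
    """
    log_actual = math.log(actual_ratio)
    log_synthetic = math.log(synthetic_ratio)

    # Same direction only if both log-ratios share a nonzero sign.
    same_direction = log_actual * log_synthetic > 0

    if abs(log_synthetic) > abs(log_actual):
        bias = "overestimate"
    elif abs(log_synthetic) < abs(log_actual):
        bias = "underestimate"
    else:
        bias = "equal"

    return {"same_direction": same_direction, "bias": bias}


# Hypothetical ratios, invented for illustration; not values from the study.
example = compare_estimates(actual_ratio=1.20, synthetic_ratio=1.30)
```

Working on the log scale treats ratios above and below 1.0 symmetrically, so a synthetic estimate of 0.70 against an actual estimate of 0.80 is also classed as an overestimate of the association.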

Our results indicate that residence in more disadvantaged census tracts is associated with poorer breast cancer prognosis and outcomes. Associations were similar across two large, diverse states, and synthetic results approximated actual results for California. Additional work is needed to improve early diagnosis, care, and outcomes for individuals with breast cancer in disadvantaged areas.

The online version contains supplementary material available at 10.1007/s10552-026-02152-1.

## Linked entities

- **Diseases:** breast cancer (MONDO:0004989)

## Full-text entities

- **Genes:** ERBB2 (erb-b2 receptor tyrosine kinase 2) [NCBI Gene 2064] {aka CD340, HER-2, HER-2/neu, HER2, MLN 19, MLN-19}, PGR (progesterone receptor) [NCBI Gene 5241] {aka NR3C3, PR}, ESR1 (estrogen receptor 1) [NCBI Gene 2099] {aka ER, ESR, ESRA, ESTRR, Era, NR3A1}, NR4A1 (nuclear receptor subfamily 4 group A member 1) [NCBI Gene 3164] {aka GFRP1, HMR, N10, NAK-1, NGFIB, NP10}
- **Diseases:** death (MESH:D003643), negative (MESH:D064726), NYS (MESH:D007562), triple (MESH:C536008), Cancer (MESH:D009369), aggressiveness (MESH:D010554), breast cancer (MESH:D001943)
- **Species:** Homo sapiens (human, species) [taxon 9606]

---
Source: https://tomesphere.com/paper/PMC13002732