# On Cluster Structures of Finnish Cancer Incidence Data

**Authors:** Tommi Huhtinen, Milla Laurikkala, Sirpa Heinävaara, Teemu J. Murtola, Pauliina Ilmonen

PMC · DOI: 10.1177/10732748261419587 · Cancer Control: Journal of the Moffitt Cancer Center · 2026-03-07

## TL;DR

This study analyzes Finnish cancer incidence data from 1963 to 2023 to identify patterns and clusters, revealing how cancer trends differ by age, sex, and screening practices.

## Contribution

The study introduces a novel proximity measure for clustering cancer incidence trends, revealing distinct patterns influenced by screening and lifestyle factors.

## Key findings

- Cancers with national screening programs (e.g., breast, cervical) formed distinct clusters.
- Melanoma and lung cancer often separated into their own clusters, possibly due to lifestyle factors.
- The proposed proximity measure effectively identified cluster structures in incidence trends.

## Abstract

The global burden of cancer is increasing. Part of this development is attributable to the estimated growth and aging of the population. In particular, aging is one of the main risk factors for cancer. However, there are many other risk factors beyond aging, including certain lifestyle and environmental factors. In addition, changes in diagnostic thresholds, increasing coverage of screening, and other similar factors affect cancer incidence rates. Therefore, even after excluding the effect of population aging, cancer incidence rates have not remained constant over time. To study these changes, this study identifies and analyzes cluster structures in the Finnish cancer incidence data from 1963 to 2023.
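Excluding the effect of population aging is commonly done with direct age standardization: each age-specific rate is weighted by that age group's share of a fixed standard population, so a change in the population's age structure alone does not move the rate. The summary does not say which standard population or age grouping the study uses, so the sketch below is only an illustration of the general idea; the function names, the two age groups, and the 0.7/0.3 weights are hypothetical.

```python
def crude_rate(cases, person_years, per=100_000):
    """Cases per `per` person-years, ignoring the age structure."""
    return per * sum(cases) / sum(person_years)


def age_standardized_rate(cases, person_years, std_weights, per=100_000):
    """Direct age standardization: weight each age-specific rate by the
    share of that age group in a fixed (here hypothetical) standard population."""
    if not len(cases) == len(person_years) == len(std_weights):
        raise ValueError("inputs must have one entry per age group")
    weighted = sum(w * c / py for c, py, w in zip(cases, person_years, std_weights))
    return per * weighted / sum(std_weights)


# Hypothetical example with a younger and an older age group.
# Age-specific rates are identical in both periods (10 and 400 per 100,000);
# only the population has aged between "before" and "after".
weights = [0.7, 0.3]
before = ([10, 200], [100_000, 50_000])
after = ([5, 400], [50_000, 100_000])

for label, (cases, py) in (("before", before), ("after", after)):
    print(label, round(crude_rate(cases, py), 1),
          round(age_standardized_rate(cases, py, weights), 1))
# before 140.0 127.0
# after 270.0 127.0
```

The crude rate nearly doubles purely because the population aged, while the age-standardized rate stays at 127.0; a change in the standardized rate therefore points to something other than aging.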

To uncover the cluster structures, a proximity measure based on the shape of the incidence curves is used. For unstandardized data, the proximity measure is shown to be invariant under simple location shifts, and for standardized data also under simple scaling, which makes it suitable for assessing similarities and dissimilarities of trends over time. Agglomerative hierarchical clustering with the average linkage method is used as the group-building algorithm.
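The summary does not give the formula of the proposed proximity measure, so the following is only a sketch of the two stated ingredients under explicit assumptions. `shape_distance` is a hypothetical stand-in, not the authors' measure: it compares the year-to-year changes of two curves, so adding a constant to either curve (a simple location shift) leaves it unchanged, and applying it to curves z-scored with `standardize` also removes simple positive scaling. `average_linkage` is a plain implementation of agglomerative hierarchical clustering with average linkage, where the distance between two clusters is the mean of all pairwise distances between their members. All function names and the toy data are illustrative.

```python
from itertools import combinations
from statistics import mean, pstdev


def standardize(curve):
    """Z-score a curve; removes location shifts and positive rescaling."""
    mu, sd = mean(curve), pstdev(curve)
    if sd == 0:
        raise ValueError("a constant curve cannot be standardized")
    return [(v - mu) / sd for v in curve]


def shape_distance(x, y):
    """HYPOTHETICAL shape-based dissimilarity (not the paper's measure):
    mean absolute difference between the year-to-year changes of x and y.
    Adding a constant to a curve leaves its changes, and so this value, unchanged."""
    dx = [b - a for a, b in zip(x, x[1:])]
    dy = [b - a for a, b in zip(y, y[1:])]
    return mean(abs(a - b) for a, b in zip(dx, dy))


def average_linkage(curves, n_clusters):
    """Agglomerative hierarchical clustering with average linkage.

    Starts from singleton clusters and repeatedly merges the pair whose
    mean cross-cluster distance is smallest, until n_clusters remain.
    """
    names = list(curves)
    dist = {frozenset(p): shape_distance(curves[p[0]], curves[p[1]])
            for p in combinations(names, 2)}

    def linkage(a, b):
        # Average linkage: mean distance over all member pairs across a and b.
        return mean(dist[frozenset((i, j))] for i in a for j in b)

    clusters = [[n] for n in names]
    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Toy curves (made up, not Finnish registry data): B is A shifted by +10,
# D is C shifted by +3, so each pair has an identical shape.
toy = {
    "A": [1, 2, 3, 4, 5],
    "B": [11, 12, 13, 14, 15],
    "C": [5, 4, 3, 2, 1],
    "D": [8, 7, 6, 5, 4],
}
print(sorted(sorted(c) for c in average_linkage(toy, 2)))  # [['A', 'B'], ['C', 'D']]
```

Because the stand-in distance only looks at changes between consecutive years, the shifted pairs end up at distance zero and are merged first, which is the kind of trend-based grouping the location-shift invariance is meant to support. In practice a library routine such as SciPy's hierarchical clustering with `method="average"` would replace the hand-written loop.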

The cluster structures were identified for 12 different subgroups, determined by age and sex. In many cases, cancers for which there exists a national screening program, including breast and cervical cancer, or an individualized testing tool, including prostate cancer, formed clusters of their own. Melanoma of the skin and lung & tracheal cancer are two other cancer types that often separated into their own clusters, possibly due to certain lifestyle factors.

The study demonstrates the potential of the proposed proximity measure in this context. In addition, the analysis of the cluster structures provides some insight into Finnish cancer epidemiology.

**Introduction.** The global burden of cancer is increasing. The main reasons are that the population is growing and a larger share of people are old. Still, even after excluding these effects, cancer incidence rates have changed over time. To study these changes, this study looks for hidden structures in the Finnish cancer incidence data from 1963 to 2023.

**Methods.** A statistical method called clustering is applied to find the hidden structures of the Finnish cancer incidence data.

**Results.** The cluster structures were identified for 12 different subgroups, determined by age and sex. In many cases, cancers for which there exists a national screening program, including breast and cervical cancer, or an individualized testing tool, including prostate cancer, formed clusters of their own. Melanoma of the skin and lung & tracheal cancer are two other cancer types that often separated into their own clusters, possibly due to certain lifestyle factors.

## Linked entities

- **Diseases:** breast cancer (MONDO:0004989), cervical cancer (MONDO:0002974), prostate cancer (MONDO:0005159), melanoma of the skin (MONDO:0005012)

## Full-text entities

- **Genes:** KLK3 (kallikrein related peptidase 3) [NCBI Gene 354] {aka APS, KLK2A1, PSA, hK3}
- **Diseases:** oesophagus, pancreatic, liver, colon, rectal, postmenopausal breast, endometrium, and kidney cancer (MESH:C537262), prostate and testicular cancer (MESH:D011471), glioma (MESH:D005910), colon, breast, and endometrial cancers (MESH:C537243), Melanoma of the skin (MESH:D008545), carcinoma in situ of the breast (MESH:D000071960), thyroid gland cancer (MESH:D013964), deaths (MESH:D003643), carcinogenic (MESH:D011230), Cervical cancer (MESH:D002583), Colon and corpus uteri cancer (MESH:D015179), pancreatic cancer (MESH:D010190), colon and lung & tracheal cancer (MESH:D008175), Cancer (MESH:D009369), lung & tracheal cancer (MESH:D008476), corpus uteri (MESH:D002578), rectal & rectosigmoid cancer (MESH:D012004), breast and cervical cancer (MESH:D001943), obesity (MESH:D009765), testicular cancer (MESH:D013736), Hodgkin lymphoma (MESH:D006689), borderline tumor of the ovary (MESH:D010051), kidney cancer (MESH:D007680), basal cell carcinomas of the skin (MESH:D002280), nervous system (MESH:D009422), skin squamous cell carcinoma (MESH:D002294), nerve sheath tumor (MESH:D018317), bladder & urinary tract cancer (MESH:D001749)
- **Chemicals:** xenoestrogens (-), alcohol (MESH:D000438)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12967383/full.md

## Figures

20 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12967383/full.md

## References

35 references — full list in the complete paper: https://tomesphere.com/paper/PMC12967383/full.md

---
Source: https://tomesphere.com/paper/PMC12967383