# Methodological approach to optimize a step-by-step deterministic linkage of SNDS data with a clinical database (FREGAT) of gastric/gastroesophageal junction adenocarcinoma in France: Pitfalls and learnings

**Authors:** Magali Laborey, Audrey Lajoinie, Jonatan Freilich, Emmanuelle Samalin, Olivier Bouché, Guillaume Piessen, Matthias Stoelzel, Andrew Chilelli

PMC · DOI: 10.1371/journal.pone.0333667 · 2025-11-07

## TL;DR

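This paper describes a step-by-step deterministic method for linking a French clinical database of gastric and gastroesophageal junction (G/GEJ) adenocarcinoma (FREGAT) with the French national health data system (SNDS), with the aim of building a richer real-world dataset for research.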

## Contribution

A step-by-step, indirect, deterministic record linkage algorithm was developed to connect the FREGAT and SNDS databases, providing a real-world dataset for G/GEJ adenocarcinoma epidemiology.

## Key findings

- 1385 of 1617 FREGAT patients (85.7%) were successfully linked to the SNDS database.
- 1159 (83.7%) of the 1385 linked patients were matched in the first, stratified part of the linkage process.

## Abstract

Survival rates in the European population with gastric and gastroesophageal junction (G/GEJ) adenocarcinoma remain low. Epidemiologic research is warranted to understand the population size, unmet need, and current treatment patterns of G/GEJ adenocarcinoma. The objective of this research was to develop an algorithm to link patients across the FRench EsoGAstric Tumours (FREGAT) and Système National des Données de Santé (SNDS) databases and thereby build a real-world dataset for G/GEJ adenocarcinoma.

A step-by-step, indirect, deterministic record linkage algorithm was developed to match patient records from the FREGAT and SNDS databases. Corresponding variables in each data source were matched at the individual level. Each step in the linkage process used its own scoring criterion; the process continued until a unique pair of patient records had identical values across the two databases, at which point the records were considered linked. Because of the large number of potential matches, linkage was performed in two parts: first, by matching within the stratified population using individual corresponding variables, and second, by matching without any stratification. Descriptive and inferential statistics were used to assess the validity of the linkage process. This study was approved by the National Expertise Committee (Ethical and Scientific Committee for Research, Studies and Evaluations in the Field of Health; 5758940) and the French Personal Data Protection Agency (CNIL; 92 1441/DR 2022 088).
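The abstract describes the algorithm only at a high level. The following is a minimal sketch, under stated assumptions, of how a step-by-step deterministic linkage with a stratified first part and an unstratified second part could be implemented. The field names (`sex`, `birth_year`, `dx_month`, `dept`), the "unique on both sides" matching rule, and the helper functions are hypothetical illustrations; they are not the authors' actual linkage variables or scoring criteria.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Record = Dict[str, object]


def _key(record: Record, fields: Sequence[str]) -> Optional[Tuple[object, ...]]:
    """Build the comparison key; return None if any field is missing."""
    values = tuple(record.get(f) for f in fields)
    return None if any(v is None for v in values) else values


def link_step(
    left: Dict[str, Record],
    right: Dict[str, Record],
    fields: Sequence[str],
) -> List[Tuple[str, str]]:
    """Return (left_id, right_id) pairs whose key is unique in BOTH sources."""
    left_keys = {rid: _key(rec, fields) for rid, rec in left.items()}
    right_keys = {rid: _key(rec, fields) for rid, rec in right.items()}
    left_counts = Counter(k for k in left_keys.values() if k is not None)
    right_counts = Counter(k for k in right_keys.values() if k is not None)
    # Index right-hand records by key, keeping only keys that occur exactly once.
    unique_right = {
        k: rid for rid, k in right_keys.items()
        if k is not None and right_counts[k] == 1
    }
    return [
        (lid, unique_right[k])
        for lid, k in left_keys.items()
        if k is not None and left_counts[k] == 1 and k in unique_right
    ]


def deterministic_linkage(
    left: Dict[str, Record],
    right: Dict[str, Record],
    steps: Sequence[Sequence[str]],
    stratum_field: str,
) -> Dict[str, Tuple[str, int]]:
    """Two-part, step-by-step linkage; returns {left_id: (right_id, part)}.

    Hypothetical design: part 1 adds the stratum field to every step's key,
    part 2 reruns the same steps without it on the still-unlinked records.
    """
    remaining_left, remaining_right = dict(left), dict(right)
    links: Dict[str, Tuple[str, int]] = {}
    parts = {
        1: [(stratum_field, *fields) for fields in steps],  # part 1: within strata
        2: [tuple(fields) for fields in steps],  # part 2: no stratification
    }
    for part, part_steps in parts.items():
        for fields in part_steps:
            for lid, rid in link_step(remaining_left, remaining_right, fields):
                links[lid] = (rid, part)
                # Linked records leave the pool, so later steps only see the rest.
                del remaining_left[lid]
                del remaining_right[rid]
    return links


if __name__ == "__main__":
    fregat = {
        "F1": {"sex": "F", "birth_year": 1950, "dx_month": "2015-03", "dept": "59"},
        "F2": {"sex": "M", "birth_year": 1948, "dx_month": "2016-07", "dept": "59"},
        "F3": {"sex": "M", "birth_year": 1948, "dx_month": "2016-07", "dept": "75"},
    }
    snds = {
        "S1": {"sex": "F", "birth_year": 1950, "dx_month": "2015-03", "dept": "59"},
        "S2": {"sex": "M", "birth_year": 1948, "dx_month": "2016-07", "dept": "59"},
        "S3": {"sex": "M", "birth_year": 1948, "dx_month": "2016-07", "dept": "62"},
    }
    steps = [("sex", "birth_year", "dx_month")]
    print(deterministic_linkage(fregat, snds, steps, "dept"))
    # {'F1': ('S1', 1), 'F2': ('S2', 1), 'F3': ('S3', 2)}
```

In this toy example, F1 and F2 are linked in part 1, while F3 is linked only in part 2, where the hypothetical `dept` values disagree and uniqueness is judged among the remaining unlinked records. Requiring a key to be unique in both sources keeps every link one-to-one; ambiguous or incomplete keys are left for a later step or remain unlinked.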

Of 1617 patients included in the FREGAT database, 1385 (85.7%) were successfully linked to the SNDS database. A majority of the linked patients (1159 [83.7%] of 1385) were matched in the first part of the linkage process.
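The reported percentages can be reproduced from the counts above. The short check below also derives the number of patients matched in the second part and the number left unlinked; these two derived figures are not stated explicitly in the abstract and assume that every linked patient not matched in part 1 was matched in part 2.

```python
# Counts reported in the abstract.
fregat_patients = 1617
linked = 1385
linked_part1 = 1159

# Derived quantities (assumption: linked patients not matched in part 1
# were matched in part 2).
linked_part2 = linked - linked_part1
unlinked = fregat_patients - linked

linkage_rate = 100 * linked / fregat_patients
part1_share = 100 * linked_part1 / linked
part2_share = 100 * linked_part2 / linked
unlinked_share = 100 * unlinked / fregat_patients

print(f"Linkage rate: {linkage_rate:.1f}%")  # 85.7%
print(f"Matched in part 1: {linked_part1} ({part1_share:.1f}%)")  # 1159 (83.7%)
print(f"Matched in part 2: {linked_part2} ({part2_share:.1f}%)")  # 226 (16.3%)
print(f"Not linked: {unlinked} ({unlinked_share:.1f}%)")  # 232 (14.3%)
```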

We established an algorithm that links the FREGAT and SNDS databases and may be applied to capture additional data on G/GEJ adenocarcinoma in France.

## Full-text entities

- **Diseases:** gastric and gastroesophageal junction (G/GEJ) adenocarcinoma (MESH:D013274), G/GEJ adenocarcinoma (MESH:D000230)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12594410/full.md

---
Source: https://tomesphere.com/paper/PMC12594410