Identifying the potential of sample overlap in evidence synthesis of observational studies
Zhentian Zhang, Tim Friede, Tim Mathes

TL;DR
This paper presents a set-theoretic method to identify and quantify sample overlap in observational studies, improving the reliability of evidence synthesis without needing individual participant data.
Contribution
The authors introduce a novel, practical set-based approach to detect sample overlap in evidence synthesis, addressing a key challenge in observational research integration.
Findings
Effective identification of sample overlap demonstrated on real-world data
Method provides overlap-free largest sample set for evidence synthesis
Highlights importance of addressing sample overlap in secondary data use
Abstract
Sample overlap is a common issue in evidence synthesis in medical research, particularly when integrating findings from observational studies that use existing databases such as registries. Because unique identifiers for each observation are generally inaccessible, sample overlap has been difficult to address, and it can bias evidence synthesis outcomes and undermine their credibility. We developed a method to construct indicators for the degree of sample overlap in evidence synthesis of studies based on existing data. Our method, rooted in set theory and based on coding the ranges of several well-selected sample characteristics, offers a practical solution by drawing inference from sample characteristics rather than from individual participant data. Useful information, such as the overlap-free sample set with the largest sample…
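The range-coding idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the study names, sample sizes, characteristics (recruitment year and age), and the rule that two samples can overlap only if their ranges intersect on every coded characteristic are all assumptions made for illustration. The sketch also assumes all studies draw on the same underlying database, and it finds the overlap-free set with the largest total sample size by brute force, which is only practical for a small number of studies.

```python
from itertools import combinations

# Hypothetical studies (illustrative values, not from the paper).
# Each characteristic is coded as a closed (low, high) range.
studies = {
    "A": {"n": 1200, "year": (2005, 2010), "age": (18, 65)},
    "B": {"n": 900, "year": (2009, 2014), "age": (40, 80)},
    "C": {"n": 700, "year": (2011, 2015), "age": (18, 39)},
    "D": {"n": 500, "year": (2000, 2004), "age": (18, 65)},
}
CHARACTERISTICS = ("year", "age")


def ranges_intersect(r1, r2):
    """True if two closed ranges share at least one value."""
    return r1[0] <= r2[1] and r2[0] <= r1[1]


def may_overlap(s1, s2):
    """Assumed rule: overlap is possible only if every characteristic intersects."""
    return all(ranges_intersect(s1[c], s2[c]) for c in CHARACTERISTICS)


def overlapping_pairs(studies):
    """List all study pairs whose samples could share participants."""
    return [
        (a, b)
        for a, b in combinations(sorted(studies), 2)
        if may_overlap(studies[a], studies[b])
    ]


def largest_overlap_free_set(studies):
    """Brute-force search for the pairwise overlap-free subset with the largest total n."""
    names = sorted(studies)
    best, best_n = (), 0
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            if any(may_overlap(studies[a], studies[b])
                   for a, b in combinations(subset, 2)):
                continue  # at least one pair could share participants
            total = sum(studies[s]["n"] for s in subset)
            if total > best_n:
                best, best_n = subset, total
    return best, best_n


pairs = overlapping_pairs(studies)
best_set, best_total = largest_overlap_free_set(studies)
print(pairs)                 # [('A', 'B')]
print(best_set, best_total)  # ('A', 'C', 'D') 2400
```

In this toy example only studies A and B intersect on both characteristics, so the search keeps the larger of the two (A) together with C and D. A disjoint range on any single characteristic is enough to rule out overlap, which is why no individual participant data is needed for the check.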
Taxonomy
Topics: Meta-analysis and systematic reviews · Health Policy Implementation Science · Biomedical Text Mining and Ontologies
