Importance of sample selection and generalisability in scientific methodology
Egemen ÜNAL, Şefik YURDAKUL, Okan MADEN

Dear Editor,
A well-planned methodology in scientific research is essential both for obtaining accurate results and for adequately representing the target population. In studies aiming for generalisability, a representative and sufficiently large sample is a fundamental prerequisite [1].
In cross-sectional studies, selecting a sample that accurately represents the population allows the findings to be generalised. If a nonprobabilistic sampling method is used, the sample may not accurately represent the population, making generalisation inappropriate; therefore, such studies should refrain from claiming that their findings are generalisable [2]. Studies assessing health conditions must use proper sampling to reflect the community’s health status accurately. Inadequate samples may distort the true prevalence and lead to incorrect decisions and interventions.
When calculating the sample size of population-based prevalence studies, referencing results from previous studies with similar characteristics or using the expected prevalence from a pilot study can guide appropriate sample size determination [3].
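As an illustration of how the expected prevalence drives the required sample size, the following minimal sketch uses the standard normal-approximation formula for estimating a single proportion, n = z² · p · (1 − p) / d². It assumes simple random sampling from a large population. The function name, the 5% margin of error, and the 10% expected prevalence are illustrative assumptions, not values taken from the cited sources.

```python
import math
from statistics import NormalDist


def prevalence_sample_size(expected_prevalence, margin_of_error, confidence=0.95):
    """Minimum n to estimate a proportion under simple random sampling.

    Uses n = z^2 * p * (1 - p) / d^2 and rounds up to the next whole person.
    Assumes a large population (no finite population correction).
    """
    if not 0 < expected_prevalence < 1:
        raise ValueError("expected_prevalence must be between 0 and 1")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z ** 2 * expected_prevalence * (1 - expected_prevalence) / margin_of_error ** 2
    return math.ceil(n)


# Hypothetical inputs: a pilot study suggesting 10% prevalence,
# compared with the conservative 50% assumption, both at +/-5% precision.
n_pilot = prevalence_sample_size(0.10, 0.05)
n_conservative = prevalence_sample_size(0.50, 0.05)
print(n_pilot, n_conservative)
```

Under these assumptions, a prevalence estimate taken from a pilot study or a comparable earlier study can call for a noticeably different sample size than the conservative 50% default.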
A common method involves using 384 as a minimum sample size, based on an assumed 50% frequency, regardless of population size. This approach results in identical sample sizes for vastly different populations (such as 10,000 and 10 million), which can lead to inaccurate selection, especially in large populations. To improve reliability in such studies, multi-stage and probability sampling should be employed, along with accurate estimation of intraclass correlation and design effect [4–6]. A well-known national example is the Turkey Household Health Survey 2023, which employed a stratified, multi-stage, and probability-based sampling method to generate representative data on the population’s health status and risk factors [7].
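The role of the intraclass correlation and design effect can be sketched as follows. This is a minimal example that assumes equal-sized clusters and the widely used approximation DEFF = 1 + (m − 1) × ICC, where m is the average number of participants per cluster. The function names, the cluster size of 20, and the ICC of 0.05 are hypothetical and chosen only for illustration.

```python
import math


def design_effect(avg_cluster_size, icc):
    """Approximate design effect for cluster sampling with equal cluster sizes."""
    if avg_cluster_size < 1:
        raise ValueError("avg_cluster_size must be at least 1")
    return 1 + (avg_cluster_size - 1) * icc


def cluster_adjusted_sample_size(srs_sample_size, avg_cluster_size, icc):
    """Inflate a simple-random-sample size by the design effect, rounding up."""
    return math.ceil(srs_sample_size * design_effect(avg_cluster_size, icc))


# Hypothetical scenario: the conventional 384, 20 people per cluster, ICC = 0.05.
deff = design_effect(20, 0.05)
n_cluster = cluster_adjusted_sample_size(384, 20, 0.05)
print(round(deff, 2), n_cluster)
```

With these illustrative inputs the required sample is nearly double the conventional figure, which suggests why a fixed minimum can fall short once participants are selected in clusters rather than individually.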
The method of participant selection is as important as the sample size in research. Inclusion criteria must be defined precisely and described clearly and understandably [8]. For example, in a study aiming to represent children under 5 years old living in Ankara, selecting participants from only one district or failing to detail the inclusion criteria in the methodology section may indicate selection bias [9]. Studies focusing on relatively small and hard-to-reach populations are often unreliable due to increased sampling error [4].
The primary concern in sampling for scientific research is the risk of bias resulting from inappropriate sample selection. For instance, in a prevalence study conducted in a village, if the minimum sample size is set at 500 but only individuals who happen to be encountered and agree to participate are included, with no step taken to ensure representativeness, the remaining residents are denied an equal chance of selection. Consequently, the findings cannot be generalised to the entire village, and the true prevalence may be misrepresented. Accordingly, increasing attention is being paid to study design and potential sources of bias when evaluating published research [10].
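In contrast to enrolling whoever is encountered, a probability-based selection for the village example could be sketched as below. It assumes that a complete sampling frame (a roster of all residents) is available. The roster of 2,000 resident identifiers, the function name, and the seed are hypothetical; only the sample size of 500 comes from the example above.

```python
import random


def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Draw a simple random sample without replacement.

    Every unit in the frame has the same probability of selection.
    """
    frame = list(sampling_frame)
    if sample_size > len(frame):
        raise ValueError("sample_size cannot exceed the size of the sampling frame")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(frame, sample_size)


# Hypothetical roster of 2,000 village residents identified by ID.
village_roster = [f"resident-{i:04d}" for i in range(1, 2001)]
selected = simple_random_sample(village_roster, 500, seed=42)
print(len(selected), len(set(selected)))
```

Recording the frame and the seed in the methods section would let readers verify how participants were chosen. Non-response among the selected residents can still introduce bias and would need to be reported separately.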
In conclusion, methodology in scientific research should be carefully structured. In particular, sample selection is critical for the generalisability of findings; inadequate or flawed sampling may undermine the study’s internal and external validity. Therefore, researchers must follow a rigorous methodological framework throughout every stage of the study and report it clearly.
References
- 1. Conroy RM. The RCSI Sample Size Handbook. Dublin, Ireland: RCSI; 2021.
- 2. Celentano DD, Szklo M. Gordis Epidemiology. 6th ed. Philadelphia, PA, USA: Elsevier; 2019.
- 3. Pourhoseingholi MA, Vahedi M, Rahimzadeh M. Sample size calculation in medical studies. Gastroenterol Hepatol From Bed to Bench. 2013;6(1):14-17. doi:10.22037/ghfbb.v6i1.332. PMCID: PMC4017493. PMID: 24834239.
- 4. Berndt AE. Sampling Methods. Journal of Human Lactation: official journal of International Lactation Consultant Association. 2020;36(2):224-226. doi:10.1177/0890334420906850. PMID: 32155099.
- 5. Hayran O, Özbek H. Sağlık bilimlerinde araştırma ve istatistik yöntemler (SPSS uygulama örnekleri ile genişletilmiş 3. baskı) (in Turkish). 3rd ed. İstanbul: Nobel Tıp Kitapevleri; 2021. p. 1-327.
- 6. Carlin JB, Hocking J. Design of cross-sectional surveys using cluster sampling: an overview with Australian case studies. Australian and New Zealand Journal of Public Health. 1999;23(5):546-551. doi:10.1111/j.1467-842X.1999.tb01317.x. PMID: 10575783.
- 7. T.C. Sağlık Bakanlığı Halk Sağlığı Genel Müdürlüğü. Türkiye Hanehalkı Sağlık Araştırması 2023 (in Turkish). Ankara: T.C. Sağlık Bakanlığı; 2024 [cited 2025 Jul 30]. Available from: https://hsgm.saglik.gov.tr/media/attachments/2025/05/12/turkiye-hanehalki-saglik-arastirmasi-2023.pdf
- 8. Martínez-Mesa J, González-Chica DA, Duquia RP, Bonamigo RR, Bastos JL. Sampling: how to select participants in my research study? Anais Brasileiros de Dermatologia. 2016;91(3):326-330. doi:10.1590/abd1806-4841.20165254. PMID: 27438200. PMCID: PMC4938277.
