# Choosing real-world data for clinical and epidemiological research: methodological lessons from NHIRD and TriNetX—A narrative review

**Authors:** Teng-Li Lin, Yi-Ju Chen, Chun-Ying Wu

PMC · DOI: 10.1080/07853890.2026.2616549 · Annals of Medicine · 2026-01-19

## TL;DR

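This narrative review compares two real-world data sources, Taiwan's National Health Insurance Research Database (NHIRD) and the TriNetX network, and outlines strategies for improving the validity of studies built on them: refined inclusion and exclusion criteria, proxy variables, and triangulation with external datasets.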

## Contribution

The paper offers methodological guidance on using NHIRD and TriNetX for real-world research, setting out each database's distinctive features and limitations alongside strategies for mitigating those limitations.

## Key findings

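- NHIRD has minimal selection bias, but it offers limited clinical detail, is updated infrequently, and reflects a Taiwan-specific context.
- TriNetX offers larger, more heterogeneous populations and near real-time analysis, but it is subject to potential hospital-based selection bias and a fixed analytic interface.
- Refined inclusion and exclusion criteria, proxy variables for unavailable measures, and triangulation with external datasets can improve study validity in real-world research.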

## Abstract

Large-scale real-world data (RWD) are increasingly used in clinical and epidemiological research, although database-specific structures and limitations may affect study validity and applicability. The Taiwan National Health Insurance Research Database (NHIRD) and the TriNetX network are two widely used RWD sources. This review compares their key features, strengths, and limitations and discusses approaches to address methodological challenges in real-world studies.

The NHIRD comprises comprehensive, population-based, longitudinal claims data covering nearly the entire Taiwanese population. Its strengths include minimal selection bias and broad follow-up capacity. However, limitations include infrequent updates, limited clinical detail, and a Taiwan-specific context that may restrict generalizability. In contrast, TriNetX is a multinational federated network of electronic medical records from diverse healthcare systems, offering larger and more heterogeneous populations, richer clinical variables, and near real-time analytic capability, but with potential hospital-based selection bias and limited flexibility due to its fixed analytic interface. Representative studies published between 2010 and 2024 demonstrate the application of both databases across multiple medical disciplines. To mitigate data-related limitations, commonly used strategies include refined inclusion and exclusion criteria, proxy variables for unavailable measures, and triangulation with external datasets, which can strengthen study validity and interpretability.

NHIRD and TriNetX are complementary real-world data sources, each with distinct strengths and limitations. Aligning research objectives with database characteristics is essential for appropriate study design. Recognition of platform-specific trade-offs and application of targeted methodological strategies support the validity and generalizability of real-world evidence.

Taiwan’s National Health Insurance Research Database (NHIRD) offers comprehensive, population-based longitudinal claims with minimal selection bias but limited update frequency and Taiwan-specific scope.

The TriNetX network is a global, federated research platform that integrates diverse populations and rich clinical information, yet it faces potential hospital-based selection bias and constrained analytic flexibility.

Practical strategies such as refined inclusion criteria, proxy variable use, and triangulation with external sources can help researchers address these limitations and strengthen the validity of real-world studies.
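As a rough illustration of two of these strategies, the sketch below applies a refined inclusion rule, an exclusion rule, and a proxy variable to a toy claims table. It is not the authors' method and uses neither the NHIRD nor the TriNetX interface. The `Claim` structure, the diagnosis labels, the "one inpatient or two outpatient claims" threshold, and the use of a nicotine dependence diagnosis as a stand-in for unrecorded smoking status are all assumptions made for the example. Triangulation with external datasets is not shown.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Claim:
    """One toy claims record (hypothetical structure, not a real database schema)."""
    patient_id: str
    visit_date: date
    setting: str    # "outpatient" or "inpatient"
    diagnosis: str  # illustrative label, not a real coding system


def has_condition(claims, patient_id, diagnosis, min_outpatient=2):
    """Refined inclusion rule (assumed threshold): one inpatient claim,
    or outpatient claims on at least `min_outpatient` distinct dates."""
    relevant = [c for c in claims
                if c.patient_id == patient_id and c.diagnosis == diagnosis]
    if any(c.setting == "inpatient" for c in relevant):
        return True
    outpatient_dates = {c.visit_date for c in relevant if c.setting == "outpatient"}
    return len(outpatient_dates) >= min_outpatient


def build_cohort(claims, exposure, outcome, proxy_codes):
    """Return one row per patient who passes the inclusion and exclusion rules."""
    cohort = []
    for pid in sorted({c.patient_id for c in claims}):
        own = [c for c in claims if c.patient_id == pid]
        if not has_condition(claims, pid, exposure):
            continue  # inclusion criterion not met
        # Index date: first recorded claim for the exposure diagnosis.
        index_date = min(c.visit_date for c in own if c.diagnosis == exposure)
        # Exclusion: outcome already recorded on or before the index date.
        if any(c.diagnosis == outcome and c.visit_date <= index_date for c in own):
            continue
        # Proxy variable: a related diagnosis recorded at baseline stands in
        # for a measure the database does not capture directly.
        proxy = any(c.diagnosis in proxy_codes and c.visit_date <= index_date
                    for c in own)
        cohort.append({"patient_id": pid, "index_date": index_date,
                       "smoking_proxy": proxy})
    return cohort


# Fabricated toy data, for illustration only.
CLAIMS = [
    Claim("A", date(2019, 11, 20), "outpatient", "nicotine_dependence"),
    Claim("A", date(2020, 1, 5), "outpatient", "psoriasis"),
    Claim("A", date(2020, 3, 10), "outpatient", "psoriasis"),
    Claim("B", date(2020, 2, 14), "outpatient", "psoriasis"),
    Claim("C", date(2019, 12, 1), "outpatient", "psoriatic_arthritis"),
    Claim("C", date(2020, 2, 1), "inpatient", "psoriasis"),
    Claim("D", date(2020, 4, 1), "inpatient", "psoriasis"),
    Claim("D", date(2021, 1, 1), "outpatient", "psoriatic_arthritis"),
]

cohort = build_cohort(CLAIMS, "psoriasis", "psoriatic_arthritis",
                      {"nicotine_dependence"})
```

In this toy data, patient B is dropped for having only a single outpatient claim and patient C for having the outcome recorded before the index date, which leaves A and D in the cohort. Stricter rules of this kind trade sample size for a lower risk of misclassification, and any threshold would need to be justified for the specific database and condition.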

## Full-text entities

- **Diseases:** psoriasis (MESH:D011565), psoriatic arthritis (PsA) (MESH:D015535), nicotine dependence (MESH:D014029), chronic musculoskeletal inflammation (MESH:D007249), cancer (MESH:D009369), alcoholic liver disease (MESH:D008108), rare disease (MESH:D035583), catastrophic illness (MESH:D002388), death (MESH:D003643), COVID-19 (MESH:D000086382), rheumatic diseases (MESH:D012216)
- **Chemicals:** alcohol (MESH:D000438)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12821341/full.md

## References

77 references — full list in the complete paper: https://tomesphere.com/paper/PMC12821341/full.md

---
Source: https://tomesphere.com/paper/PMC12821341