Systematic Review of Mendelian Randomization Studies on Helicobacter pylori–Associated Health Outcomes
Ikuko Kato, Federico Canzian, Cosmeri Rizzato, Antonia Rodriguez, Javier Torres

TL;DR
This systematic review evaluates the quality and findings of Mendelian randomization studies on the causal effects of Helicobacter pylori infection on various health outcomes.
Contribution
The paper provides a comprehensive assessment of MR studies on H. pylori, highlighting methodological issues and suggesting future research directions.
Findings
Only 16 MR studies on H. pylori were identified, with most conducted in European populations.
Nine studies found causal associations, but many had methodological flaws and inconsistent results.
The review emphasizes the need for higher-quality MR studies in non-European populations.
Abstract
Helicobacter pylori (HP) infection has been linked to nearly 90 different health conditions, including gastric malignant and premalignant lesions. Recently, Mendelian randomization (MR) has gained popularity to overcome limitations in observational studies. This review aims to compile MR studies on the causal relationship between HP infection and health outcomes, systematically assess the quality of individual studies, evaluate the overall evidence in comparison with other existing data, and identify common strengths and weaknesses in order to guide future research directions on HP-associated health outcomes. Eligible studies were identified from the two major biomedical literature databases, PubMed and Embase. After removing overlaps, we found 33 unique records published by July 10, 2024. Among those, 16 were qualified for full-text review as original research papers presenting MR…
1. Introduction
Helicobacter pylori (HP) is a Gram-negative, microaerophilic bacterium of the phylum Campylobacterota/Epsilonproteobacteria that colonizes the human stomach; it was discovered by Marshall and Warren in 1983 [1]. It is estimated that roughly half of the world's population is infected with this bacterium [2, 3], making it one of the most common human infectious agents worldwide. A recent updated meta-analysis of 1748 studies from 111 countries confirmed the high global prevalence of HP at 44%, despite a declining trend [4]. The prevalence of HP varies geographically [4] and is influenced by socioeconomic status (SES), smoking, and diet [5–7].
Significant advances have been made over the past 40 years in understanding the pathophysiology of HP. It has been causally linked to the development of gastric adenocarcinoma, to mucosa-associated lymphoid tissue (MALT) lymphoma, and to other nonmalignant gastroduodenal conditions, such as peptic ulcer, chronic gastritis, and intestinal metaplasia [8, 9]. In 1994, the International Agency for Research on Cancer/World Health Organization (IARC/WHO) classified HP as a Class 1 carcinogen, the only bacterium with this classification [8]. Furthermore, HP infection has been linked to as many as 90 different health conditions, which include both protective and detrimental associations [10]. However, the vast majority of the meta-analyses were poorly executed, with only 6 health outcomes supported by moderate-quality evidence [10], which makes causal inference from them challenging.
Even high-quality observational studies are prone to various biases, confounding, and reverse causation [11], and exposures are usually measured at only one time point. To overcome these limitations and to infer causality, Mendelian randomization (MR), an approach that uses genetic variants associated with changes in exposure as instrumental variables (IVs), has gained growing popularity. Because genotypes are randomly assigned from parents and fixed at conception, MR investigations can be interpreted as assessing the impact of long-term exposure and can address causal association if properly executed [12].
A recent explosive increase in the number of MR studies is likely to be driven by widely implemented data sharing policies, the availability of many genome-wide association study (GWAS) summary statistics, and the development of bioinformatic tools. This has made the execution of two-sample MR studies, which are based on different datasets for the exposure and outcome, more accessible to nonstatisticians [13].
The exposure, HP infection, can be measured by several diagnostic tests [14] with different pros and cons. Serology to detect IgG antibodies against various HP antigens is minimally invasive and least labor-intensive and thus commonly used in large epidemiological studies; however, this test has a limitation in distinguishing past and active infection. Other methods, such as breath test, stool antigen test, and endoscopic biopsies, can be used to detect active infection and to monitor eradication [14], although they present varied sensitivity and specificity. Choosing the right diagnostic test is critical in MR studies where HP infection represents the exposure variable.
This review aims to compile MR studies on the causal relationship between HP infection and health outcomes; systematically assess the quality of individual studies; evaluate the overall evidence in comparison with other existing data, such as meta-analyses; and identify common strengths and weaknesses in order to guide future research directions on HP-associated health outcomes.
2. Methods
The two major biomedical publication databases, PubMed and Embase, were used to identify eligible studies. We applied the same search strategy to both databases, using “Mendelian randomization” AND “Helicobacter pylori” in any field. We did not use any filters to limit publications, and the search covered records published up to July 10, 2024. The inclusion criteria were that the publication was original research, that HP infection was the primary exposure of interest, and that results from MR analysis were presented. Both one- and two-sample studies were included. Preprints were not included. To select eligible studies, we reviewed the full text as well as all online materials. Screening and data collection were carried out by a single reviewer (I.K.) in consultation with another reviewer (A.R.). Additional information concerning individual IVs used in eligible studies was obtained with FUMA, an integrative web-based platform for functional annotation of GWAS results [15].
A total of 24 publications from PubMed and 29 publications from Embase were identified. Among these, 20 appeared in both databases, leaving 33 unique records. Of these 33, 17 were excluded for the following reasons: 6 were nonoriginal research (e.g., editorials and reviews); 9 did not use HP as the main exposure; 1 was a non-MR analysis; and 1 was a preprint of one of the included publications that had remained in Embase. The remaining 16 were included in the full review. For each eligible publication, we assessed the type of study (one- vs. two-sample), definition of exposure, outcome characteristics, instrumental variant selection criteria, the three key assumptions for MR studies (relevance, independence, and exclusion restriction) [13], statistical methods used, main results, and authors' conclusion. We did not review the reverse-association analysis when bidirectional MR [11] was performed in the eligible studies, because HP acquisition primarily occurs in early childhood [8, 9], and it is therefore unlikely that a health condition in adulthood causes HP infection.
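The screening flow above is simple arithmetic, and it can be checked with a short sketch. The counts are taken from the text; the variable names and the dictionary layout are illustrative only.

```python
# Counts reported in the Methods text; names are illustrative.
pubmed, embase, overlap = 24, 29, 20

# Inclusion-exclusion for two sources: shared records are counted once.
unique_records = pubmed + embase - overlap

# Exclusion reasons as listed in the text.
excluded = {
    "nonoriginal research": 6,
    "HP not the main exposure": 9,
    "non-MR analysis": 1,
    "preprint of an included publication": 1,
}
n_excluded = sum(excluded.values())
included = unique_records - n_excluded

print(unique_records, n_excluded, included)  # 33 17 16
```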
3. Results
3.1. Study Design
Among the 16 MR studies identified, only one was a one-sample MR study [16], while the remaining 15 were two-sample MR studies. Nine of the 16 studies performed bidirectional analyses and six also examined potential mediators besides major health outcomes to test vertical pleiotropy, including closely related conditions (such as gastritis, ulcers, heart failure, and hypertension) and/or biomarkers (such as circulating lipids, glucose, and cytokines). All studies were conducted in individuals of European descent, except one consortium study on coronary heart disease (CHD) that included mixed races [17].
3.2. Exposure Definition
All 16 studies used serology to assess exposure to HP infection, drawing on one of the three published GWASs on HP serology (Table 1) or their combinations. Two of these GWASs were based on HP whole-cell antigens, while the third was based on 6 specific HP antigens. The first GWAS, published in JAMA [18], utilized a commercial enzyme-linked immunosorbent assay (ELISA) kit from Finland, Pyloriset EIA-G III, which employed HP whole-cell acid extract as the antigen. The manufacturer recommended a cutoff for seropositivity of 20 U/mL, with a reported 98% sensitivity and 58% specificity. However, to improve specificity and to reduce false positives, the authors redefined seropositivity as the upper 25% of the IgG titer distribution of each cohort included in the study. Three population-based cohorts of middle-aged men and women from the Netherlands and Germany were used for the GWAS to identify genetic loci associated with HP infection, while one cohort was used for a validation study to test serology data against the HP stool antigen test. This validation study demonstrated a high correlation between serum IgG antibody and stool antigen levels (Spearman correlation 0.59). The main GWAS revealed two loci (p < 5 × 10^−8^) strongly associated with the binary status of seropositivity. One was mapped to chromosome (Chr) 4p14 near TLR10, and the other to Chr 1q23.3 near FCGR2A, with minor allele frequencies (MAF) of 26% and 16%, respectively. The single nucleotide polymorphism (SNP) on Chr 4p14 was also reported to be associated with differential gene expression of TLR1, which is located within the same locus as TLR10 [18]. It was also estimated that the two variants from this study explained 0.5% of HP seropositivity [16].
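A cohort-specific cutoff of this kind can be sketched as follows. This is only an illustration of the idea of calling the upper 25% of each cohort's titer distribution seropositive. The titer values, the helper names, the quantile method (Python's default "exclusive" method), and the strict inequality are all assumptions, not details taken from the original GWAS.

```python
from statistics import quantiles

def top_quartile_cutoff(titers):
    """Return the 75th-percentile titer of one cohort (hypothetical helper)."""
    # quantiles(..., n=4) returns the three quartile cut points.
    return quantiles(titers, n=4)[2]

def classify_seropositive(titers):
    """Flag individuals whose titer lies above their own cohort's cutoff."""
    cutoff = top_quartile_cutoff(titers)
    return [t > cutoff for t in titers]

# Made-up cohort: IgG titers 1..100 U/mL.
cohort_titers = [float(t) for t in range(1, 101)]
flags = classify_seropositive(cohort_titers)
print(sum(flags))  # 25 of 100 individuals are called seropositive
```

Because the cutoff is relative, every cohort ends up with roughly 25% seropositives regardless of its true infection prevalence.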
The study by Chong et al. also analyzed HP serology data, obtained by in-house ELISA using commercially available HP whole-cell lysate antigens of strain 43504 from Australia (Meridian Life Science) [19]. This was part of a larger study to investigate the prevalence of antibodies against 12 common infectious agents and two caseins in children from 5 to 15 years old enrolled in the Avon Longitudinal Study of Parents and Children [20]. The data available for this GWAS were from 7-year-old children. The number of children tested for each antibody varied with the type of infectious agent, and 4683 children were tested for HP [19]. The study did not find a clear cutoff to define seropositivity for HP, and thus, the data were treated as continuous values. Applying the p < 5 × 10^−8^ criterion, the authors found one variant mapping to an intergenic region of Chr 6 between HLA-DOB1 and MTCO3P.
Butler-Laporte et al. used a different approach to measure various HP IgG antibodies, i.e., a multiplex Luminex assay, which was a GST capture ELISA combined with fluorescent bead technology [21]. This was part of a phenotype determination effort within UK Biobank concerning infection with 20 common microorganisms in subsamples [22]. HP antibodies against six different antigens, CagA, VacA, OMP, GroEL, Catalase, and UreA, were quantified. The cutoff value for seropositivity to each antibody was determined with the use of additional reference seronegative sera. For the GWAS summary dataset of 8735 individuals of European descent, seropositivity (binary) was defined as positivity to two or more HP antigens, excluding CagA, which is present only in Type I HP strains. Quantitative antibody data for the six individual HP antigens were included in the GWAS summary data only if the values were above the seropositivity cutoff, since values below the cutoff were likely to represent nonspecific reactions. Thus, the sample sizes available for each antigen varied from 985 (CagA) to 2716 (GroEL). Applying the p < 5 × 10^−8^ criterion, the authors found two variants associated with lower seroreactivities to OMP and UreA. Both variants were mapped to Chr 6, close to HLA-DQB1 for OMP and RP11-439H9.1 for UreA.
3.3. Health Outcomes Studied (Table 2)
As shown in Table 2, a broad range of health outcomes were investigated in these 16 studies, which comprised four metabolic conditions (obesity [16], nonalcoholic fatty liver disease (NAFLD) [23], Type 2 diabetes (T2DM) [24], and osteoporosis [25]), four cardiovascular diseases (CHD [17], myocardial infarction (MI) [26], stroke [27], and atherosclerosis (AS) [28]), five gastrointestinal conditions (colorectal cancer (CRC) [29], gastroesophageal reflux disease (GERD) [30], irritable bowel syndrome (IBS) [31], inflammatory bowel disease (IBD) [32], and eosinophilic esophagitis (EOE) [33]), and three others (pregnancy-associated complications [34], glaucoma [35], and IgA nephropathy (IgAN) [36]). Some of these outcomes were further divided into subclasses within the main outcome (e.g., type of CHD, stroke, AS, CRC, and glaucoma).
Most outcomes were binary traits (presence or absence), while a few related to obesity and pregnancy complications were continuous. Except for the one-sample study [16], the sources of these outcomes were either large consortia, cohorts, or case–control studies of specific health outcomes, or large biobanks from the UK (UK Biobank) and Finland (FinnGen). About half of the studies (N = 7) included more than one source of GWAS data for the outcome, combining data from biobanks, consortia, large cohorts, or case–control studies. The number of cases for the binary traits was generally over 1000, with a few exceptions, i.e., cerebral AS (N = 150), placental abruption (N = 294), and rectal cancer (N = 375). Outcome definition varied widely. Consortium studies generally had strict inclusion criteria for specific medical conditions. For example, histological confirmation was required for NAFLD and EOE [23, 33], and a combination of symptoms with radiological, endoscopic, or histological findings was required for IBD [32]. However, diagnostic criteria used in huge consortia that comprised a mix of prospective cohorts and cross-sectional/case–control studies, e.g., those for CHD [17, 26] and stroke [27], varied with the protocols of the individual studies. Outcomes from biobanks were generally derived from record linkage with various electronic health records, but UK Biobank offered several different sets of outcomes besides consolidated hospital discharge diagnoses. Specifically, self-reported history was used in the study by Wang et al. for IBS [31], and self-reported history of osteoporosis was used in the study by Zhang et al. [25]. Chen et al. used the secondary diagnostic codes of inpatient records to study GERD [30]. These authors, however, did not provide the rationale for choosing these specific datasets with less reliable diagnoses. Luo et al. did not specify what level of diagnosis within UK Biobank was used in their study for CRC.
The controls derived from the biobanks were not the same as controls selected for typical case–control studies, but comprised the participants who did not develop the specific outcomes of interest. Controls in consortium studies also varied from hospital, population, or friend/family controls to those from other consortia, depending on the choice of the individual participating studies. Most studies were conducted on adult patient populations, except for the EOE study, which was limited to pediatric patients [33], and one of the IBD outcome cohorts, which included both pediatric and adult-onset cases [32].
3.4. Instrumental Variant Selection Criteria
Instrumental variants were first selected based on p-values reported in the original GWAS for HP serology described above. The original selection criteria (p < 5 × 10^−8^, MAF > 1%, and the variant with the lowest p-value within each locus) were used in the studies that derived instrumental variants from the HP GWAS by Mayerle et al. [18], as this GWAS was not included in publicly available GWAS datasets. In MR studies based on the two other HP serology GWASs [19, 22], the most common p-value cutoff was p < 5 × 10^−6^, while three studies adopted less stringent cutoffs, p < 5 × 10^−5^ [24, 34] or p < 1 × 10^−5^ [36]. Studies with the less stringent cutoffs claimed to have removed variants with weaker associations based on F-statistics, but no data were reported as to how many SNPs were actually eliminated. Most of the studies also removed variants in linkage disequilibrium (LD) by clumping with an r^2^ threshold of 0.001 within a 10,000 kb window. Some studies further applied additional selection based on MAF (≥ 0.01) [34], or excluded ambiguous or palindromic SNPs [29, 32, 34, 36], SNPs with intermediate frequencies [29, 32], outliers [28], and SNPs associated with potential confounders [27, 30] (Table 3). The final numbers of SNPs used in the analyses were further limited by their availability in the outcome GWASs, ranging from 1 to 84 (Table 2). We compiled the SNPs used in these 16 studies, including those listed in online materials (if any). Table 4 presents the location and type of SNPs, nearest genes, and measures of potential function for the 89 SNPs on 84 genomic loci with p < 5 × 10^−6^. There were 4 SNPs with p < 5 × 10^−8^, 10 with p between 5 × 10^−8^ and 5 × 10^−7^, and the rest with p ≥ 5 × 10^−7^. Surprisingly, there was not a single overlap among instrumental SNPs selected based on different HP serological tests, and none of the SNPs were in LD with any of the others in the list (r^2^ < 0.2).
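The p-value filter followed by LD clumping can be illustrated with a minimal greedy sketch. It is not the implementation used by any of the reviewed studies, which typically rely on dedicated tools and a reference panel. The SNP identifiers, positions, p-values, and pairwise r^2^ values below are made up, and the function name `clump` is hypothetical.

```python
def clump(snps, r2_lookup, p_max=5e-6, r2_max=0.001, window_kb=10_000):
    """Greedy clumping: keep the most significant SNP, drop its LD neighbors.

    snps: list of dicts with keys id, chrom, pos (bp), p.
    r2_lookup: dict mapping frozenset({id1, id2}) to pairwise r^2
               (pairs that are absent are treated as r^2 = 0).
    """
    # Step 1: p-value filter, then order from most to least significant.
    candidates = sorted((s for s in snps if s["p"] < p_max), key=lambda s: s["p"])
    kept = []
    for snp in candidates:
        # Step 2: drop the SNP if it is in LD with an already retained SNP
        # on the same chromosome and within the clumping window.
        in_ld = any(
            k["chrom"] == snp["chrom"]
            and abs(k["pos"] - snp["pos"]) <= window_kb * 1000
            and r2_lookup.get(frozenset((k["id"], snp["id"])), 0.0) >= r2_max
            for k in kept
        )
        if not in_ld:
            kept.append(snp)
    return [s["id"] for s in kept]

# Made-up example data.
snps = [
    {"id": "snp_A", "chrom": 4, "pos": 38_000_000, "p": 1e-9},
    {"id": "snp_B", "chrom": 4, "pos": 38_050_000, "p": 2e-7},  # in LD with snp_A
    {"id": "snp_C", "chrom": 6, "pos": 32_000_000, "p": 3e-6},
    {"id": "snp_D", "chrom": 1, "pos": 161_000_000, "p": 1e-4},  # fails p filter
]
r2_lookup = {frozenset(("snp_A", "snp_B")): 0.4}

selected = clump(snps, r2_lookup)
print(selected)  # ['snp_A', 'snp_C']
```

With a threshold as strict as r^2^ = 0.001, almost any correlated neighbor of a retained SNP is discarded, so each locus contributes a single instrument.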
3.5. Tests for the Key MR Assumptions and Statistical Methods
Three key assumptions must be fulfilled for an MR study to be valid. The first assumption, relevance, requires that variants are reliably associated with the exposure [13]. All studies reported that the selected instrumental SNPs had a sufficiently strong association with the HP serology measures based on an F-statistic of 10 or greater, which has been generally accepted as ruling out weak instruments [13]. However, there are marked discrepancies in the estimated strength of selected instruments because two different formulas were used to calculate F-statistics: one is based on R^2^, the proportion of the variance in the phenotype explained by specific variants [37], while the other is based on beta^2^/standard error (SE)^2^ in the exposure GWAS. For example, the SNP mapped to Chr 4p14 near TLR10 that showed the strongest association [18] was reported to have F-statistics of 548 [38], 478 [23], 236 [17], and 78 [24]. The first three values appear to be erroneous due to inappropriate calculation of R^2^ from the logistic regression. Similar systematic discrepancies in F-statistic values have also been noted for instrumental variants selected from the Avon cohort [26, 34].
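The two F-statistic formulas mentioned above can be written out as a short sketch. The formulas are the commonly used ones, F = R^2^(N − k − 1) / (k(1 − R^2^)) for k instruments in a sample of size N, and F = beta^2^/SE^2^ for a single variant. The numeric inputs are made up and do not correspond to any reviewed study.

```python
def f_from_r2(r2, n, k=1):
    """F-statistic from the variance explained (r2) by k variants in n samples."""
    return (r2 * (n - k - 1)) / (k * (1.0 - r2))

def f_from_beta(beta, se):
    """Per-variant F-statistic from the exposure GWAS effect size and its SE."""
    return (beta / se) ** 2

# Made-up inputs for illustration only.
f_r2 = f_from_r2(r2=0.005, n=10_000)   # about 50.2
f_beta = f_from_beta(beta=0.2, se=0.04)  # about 25

print(round(f_r2, 1), round(f_beta, 1))
```

The first formula is only as reliable as the R^2^ fed into it. If R^2^ for a binary trait is derived inappropriately from a logistic regression coefficient, the resulting F can be inflated several-fold, which would be consistent with the discrepancies described above.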
The second key assumption, independence, requires that no unmeasured confounders explain the association between instrumental variants and outcome. To meet this assumption, three studies [29, 30, 36] removed variants associated with potential confounders through searches in PhenoScanner, a curated database of publicly available results from > 5000 large-scale GWASs that catalogues human genotype–phenotype associations [12], and LDtrait, a web tool for finding germline variation associated with multiple traits. Details of the search criteria or the number of variants removed were not reported in the studies on GERD or CRC, but Jing et al. described a search for variants associated with a wide range of potential confounders, e.g., age, weight, hypertension, proteinuria, dental caries, periodontitis, tonsillitis, and other infections, including respiratory pathogens (e.g., Mycoplasma pneumoniae, herpes virus, influenza) and gut microbiota [36]. Guo et al. [27] described the removal of variants with strong associations (undefined) with alcohol, smoking, body mass index (BMI), diabetes, and educational level. Again, there were no details concerning the definition of strong association or the number of variants removed. Zhu et al. searched UK Biobank GWAS data for variants associated with confounders (educational attainment, household income, and deprivation) that may affect HP acquisition. Instead of removing those variants from the main MR analysis, the authors performed multivariable MR analyses to control for the effects of these confounders [33]. There was no information as to how many potential confounders were included in these statistical models.
The third key assumption, exclusion restriction, requires that IVs are linked to the outcome only through the exposure of interest [13], ruling out horizontal pleiotropy. To meet this assumption, IVs should not be linked to the outcome through other independent biological mechanisms that are not on the direct vertical pathway of the exposure. Several statistical methods have been developed to detect horizontal pleiotropy. These include the MR-Egger intercept, Cochran Q statistics, and the MR-PRESSO global test [39–41]. The most commonly used were the first two, while some studies additionally or alternatively used MR-PRESSO (Table 3). Although no studies reported the presence of significant horizontal pleiotropy, some disregarded positive results from these tests. For example, in the MR study for T2DM, heterogeneity was present for anti-GroEL and anti-UreA antibodies (p < 0.01) [24]. In the study for CRC, the p-value from Q statistics for the association between colon cancer and seropositivity was 0.03 [29]. Furthermore, no studies reported whether they eliminated IVs that had a direct association with the outcome at the genome-wide level (p < 5 × 10^−8^) to fulfill this key assumption.
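Two of the diagnostics named above can be sketched from summary statistics alone. The sketch below assumes first-order weights (the uncertainty in the SNP-exposure effect is ignored) and uses made-up effect sizes. It returns the Cochran Q statistic with its degrees of freedom and the MR-Egger intercept and slope. P-values, which would need a chi-square or t distribution, are deliberately left out, and the function names are illustrative.

```python
def cochran_q(beta_exp, beta_out, se_out):
    """Cochran's Q of per-variant Wald ratios around the fixed-effect IVW estimate."""
    ratios = [by / bx for bx, by in zip(beta_exp, beta_out)]
    # First-order inverse variance of each Wald ratio: (bx / se_y)^2.
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exp, se_out)]
    ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    q = sum(w * (r - ivw) ** 2 for w, r in zip(weights, ratios))
    return q, len(ratios) - 1  # Q and its degrees of freedom

def egger_regression(beta_exp, beta_out, se_out):
    """Weighted regression of outcome on exposure effects with an intercept."""
    # Orient every variant so that its exposure effect is positive.
    signs = [1.0 if bx >= 0 else -1.0 for bx in beta_exp]
    xs = [s * bx for s, bx in zip(signs, beta_exp)]
    ys = [s * by for s, by in zip(signs, beta_out)]
    ws = [1.0 / se ** 2 for se in se_out]
    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    slope = (sw * sxy - sx * sy) / (sw * sxx - sx ** 2)
    intercept = (sy - slope * sx) / sw
    return intercept, slope

# Made-up data: every outcome effect carries a constant pleiotropic shift of 0.02.
bx = [0.10, 0.20, 0.30]
by = [0.07, 0.12, 0.17]   # 0.02 + 0.5 * bx
se = [0.01, 0.01, 0.01]

q, df = cochran_q(bx, by, se)
intercept, slope = egger_regression(bx, by, se)
print(round(intercept, 3), round(slope, 3), df)  # 0.02 0.5 2
```

A nonzero intercept indicates directional pleiotropy, while a large Q relative to its degrees of freedom indicates heterogeneity among the per-variant estimates.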
3.6. Main Results and Authors' Conclusion
Main causal estimates based on the inverse variance weighting (IVW) method were reported in all studies. As shown in Table 2, the summary odds ratios (ORs) for primary endpoints in individual studies clustered around 1.00, ranging from 0.325 to 1.23, except for some skewed estimates (OR > 7) from studies with small case counts, e.g., cerebral AS [28], or a small outcome cohort, e.g., intracerebral hemorrhage [27]. Many of the statistically significant ORs were close to 1.1, but ORs that were almost 1.00, such as 1.001 (GERD), 1.002 (osteoporosis), and 1.03 (T2DM), were also statistically significant owing to the large numbers of cases and controls included in the outcome GWAS. Nine of the 16 studies reported statistically significant associations at the conventional p-value level between selected IVs and the major endpoints (not including mediators), and the authors concluded that HP infection or specific types of antibodies against HP were causally associated with the primary diseases of interest. These include CHD [17], MI [26], osteoporosis [25], some of the pregnancy complications (preeclampsia–eclampsia and premature rupture of membranes) [34], T2DM [24], GERD [30], AS [28], stroke [27], and EOE [33]. The remaining 7 studies, on CRC [29], IBS [31], IBD [32], obesity [16], glaucoma [35], NAFLD [23], and IgAN [36], found no association, and the majority concluded that there was no evidence or no genetic evidence for a causal association. One study on IBD [32] referred to no direct correlation with HP infection, while the conclusion was more conservative in the IBS study, which used the wording “may not be causally associated” [31].
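A minimal sketch of a fixed-effect IVW estimate, and of how an OR very close to 1.00 can still reach statistical significance when its standard error is tiny, is given below. All numbers are made up. The first-order weights and the normal approximation for the p-value are assumptions, and the function names are illustrative.

```python
import math

def ivw_fixed(beta_exp, beta_out, se_out):
    """Fixed-effect IVW estimate (log OR per unit of exposure) and its SE."""
    num = sum(bx * by / se ** 2 for bx, by, se in zip(beta_exp, beta_out, se_out))
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exp, se_out))
    return num / den, math.sqrt(1.0 / den)

def summarize(log_or, se):
    """Return OR, 95% CI limits, and a two-sided normal-approximation p-value."""
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return (math.exp(log_or),
            math.exp(log_or - 1.96 * se),
            math.exp(log_or + 1.96 * se),
            p)

# Made-up instruments: each Wald ratio equals 0.1, so the IVW estimate is 0.1.
est, est_se = ivw_fixed([0.1, 0.2], [0.01, 0.02], [0.005, 0.005])

# Made-up large-sample scenario: OR = 1.002 with a very small standard error.
or_, lo, hi, p = summarize(math.log(1.002), 0.0005)
print(round(est, 3), round(or_, 3), p < 0.05)  # 0.1 1.002 True
```

In the second scenario the confidence interval excludes 1.00 even though the effect is negligible in practical terms, which is why statistical significance alone says little about clinical relevance in very large outcome GWASs.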
It is important to note that multiple primary outcomes (subtypes of diseases) and/or multiple sets of exposure IVs were included in the analysis in most studies and that the observed associations, although statistically significant at the conventional level, were often too weak to survive adjustment for multiple comparisons. Zhu et al. reported Benjamini–Hochberg corrected p-values for multiple testing in their study for EOE [33], and Li et al. presented Bonferroni-corrected p-value thresholds for their study on CHD [17]. The lowest reported OR, 0.325, was for the association between EOE and HP seropositivity; however, this result was highly questionable because the risk estimates from individual IVs shown in the authors' online data were not consistent with it [33]. In addition, the conclusion by Li et al. [17] that “HP infection exerts a causal effect on CHD incidence, mediated by BMI” does not seem well justified, as the primary outcomes (except angina) did not show a significant association with the IVs, while BMI was strongly associated with the IVs.
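The two corrections named above can be sketched in a few lines. The p-values are made up and serve only to show how results that are nominally significant at 0.05 can fail after correction. The function names are illustrative.

```python
def bonferroni_threshold(alpha, m):
    """Per-test significance threshold when m tests are performed."""
    return alpha / m

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up p-values for four outcome subtypes.
pvals = [0.01, 0.04, 0.03, 0.20]
threshold = bonferroni_threshold(0.05, len(pvals))
adjusted = benjamini_hochberg(pvals)

print(threshold)                       # 0.0125
print([round(a, 4) for a in adjusted])  # [0.04, 0.0533, 0.0533, 0.2]
```

In this example three of the four tests are nominally significant at 0.05, but only one passes either correction.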
4. Discussion
Prior to MR, evidence was present from meta-analyses to support potential associations between HP infection and health outcomes examined in the individual MR studies [10, 42, 43], except for IgAN. There were weak clinical data to support a potential causative role of HP in IgAN, while HP infection was inversely associated with the risk of end-stage renal disease in a meta-analysis [10]. The published meta-analyses generally supported the positive (detrimental) association between HP infection and the health outcomes, i.e., obesity, cardiovascular diseases, osteoporosis, T2DM, NAFLD, pregnancy complications, glaucoma, CRC, and IBS [10, 42–44], while meta-analyses for a minority of the outcomes suggested that HP might be protective against IBD, EOE, and GERD [10, 45]. The authors argued that despite the published data supporting the association, there were inconsistencies among observations, potential residual effects from confounding factors, which were not ruled out, and no clear biological mechanisms that explain the associations. That was the common rationale to conduct MR analyses.
There are several critical issues in interpreting the results of these MR studies. The major concern is the validity of the IVs as proxies for exposure to HP. Two of the HP serology GWASs defined binary HP seropositivity status. The first study, by Mayerle et al., defined seropositivity by the relative distribution of antibody titers, without using known negative samples and without consideration of cohort characteristics [18]. This would have led to reduced sensitivity and increased false negatives, particularly in older birth cohorts, because they are more likely to have been exposed to HP [46, 47] and because the antibody response declines with age. In the study by Butler-Laporte et al., seropositivity was determined by the number of positive antibodies against 5 HP antigens, excluding CagA [22]. Whereas the positive cutoff for each type of antibody was established using known seronegative samples [21], the accuracy of overall seropositivity based on the number of positive antibodies was not reported.
All six HP antigens to which antibodies were analyzed in UK Biobank samples [22] are known HP virulence factors required for colonization of the stomach and have robust antigenicity [21]. However, four of these proteins are not HP-specific and can be produced by other bacteria, leading to possible cross-reactivity in the assay (false-positive results). Importantly, because GWAS summary data for individual antibody levels from Luminex assays included only values above the positive cutoff, participants who were negative for specific antibodies were excluded. Consequently, these data do not help to address genetic susceptibility to acquisition of HP, but may instead help to identify genetic variability in the adaptive immune response that produces specific antibodies among those exposed to HP. On the other hand, IVs associated with seropositivity may reflect not only the probability of HP exposure, which can be affected by living conditions and SES, but also the innate immune response that acts as the first line of defense against foreign pathogens, as well as the specific adaptive immunity that leads to antibody production against pathogen proteins. This is consistent with the variant loci found in the study by Mayerle et al., i.e., TLR10/TLR1 and FCGR2A [18].
A more serious concern is the use of IVs derived from 7-year-old children in the Avon cohort, since the accuracy of serology for HP infection in young children (i.e., < 10 years), whose immune function is not fully developed, has been reported to be lower than that of other diagnostic tests [48, 49]. Because HP prevalence in children in high-income countries is estimated to be about 20% [50], a large portion of the continuous antibody data is likely to reflect nonspecific reactions in HP-negative children. Moreover, the validity of two-sample MR studies in which the exposure and outcome cohorts include individuals of different age ranges is questionable.
Serological tests to measure antibodies to HP antigens have been used in epidemiological studies under the premise that persistent antigen stimulation elicits an antibody response lasting for years. Accordingly, antibodies serve as markers for current or past HP infection. Thus, the statements suggesting that antibodies themselves are actually harmful and responsible for the causation of specific health conditions (Table 3) are misleading [27, 28, 33, 44]. An exception may be an autoimmune mechanism, i.e., molecular mimicry, induced by HP infection [51]. Specifically, anti-CagA antibodies cross-react with human trophoblast cells, causing functional impairment and placental damage, which may increase the risk of preeclampsia [52]. In addition, anti-CagA antibodies have been demonstrated to interact with vascular antigens of 160 and 180 kDa, which are present in different parts of smooth muscle cells and endothelial cells of normal and atherosclerotic vessels [53]. Antibodies against HP urease have also been suggested to recognize the IKEDV motif in the CC chemokine receptor-like 1 (CCRL1), which may lead to complement-dependent tissue destruction and an inflammatory response that contribute to AS and cardiovascular disease [51]. Likewise, antibody against HP GroEL (human Hsp60 homolog) has been postulated to promote CHD development through antigenic mimicry and complement-dependent cell damage, as with the antiurease antibodies [51]. Among the 6 types of antibodies against specific HP antigens, VacA antibody was most often associated with health outcomes, i.e., coronary AS [28] and stroke [35]. It was not clear whether this was due to the generally larger number of IVs used for VacA compared with other HP antigens or to higher antibody variability reflecting infection with HP VacA sequence variants.
Furthermore, it is imperative to note that some of the health outcomes (including potential mediators) were studied more than once with different sets of IVs from different GWASs. These include BMI, fasting blood sugar, and low-density lipoprotein (LDL) cholesterol and the results were highly inconsistent, with some showing opposite directions of the association (Table 2). This raises a serious concern about the reproducibility of these MR analyses.
Den Hollander reported that 0.5% of HP seropositivity was explained by the two most stringent variants found in their cohorts [16]. This fraction was indeed equivalent to those for other external risk factors that have been studied by MR, e.g., alcohol intake, smoking intensity, and coffee consumption (1% or lower), while those for host-related risk factors, such as height, bitter-taste sensitivity, and total cholesterol level, were much higher (13%–43%) [54]. As shown in Table 4, IVs selected by HP serology include a cluster of variants in the major histocompatibility complex (MHC) class 2 locus, which are also classified as gene expression quantitative trait loci (eQTL). While this locus plays a crucial role in antigen presentation and adaptive immunity, it is not clear if these associations were HP antigen-specific. In addition, while the selection criteria for the p-values were rather lenient, highly stringent criteria were used throughout these MR studies to exclude variants in LD. Thus, whereas this eliminates redundancy in IVs, it is possible that other potentially functionally important variants were not tested for the outcomes, because they were in LD with the SNPs selected as IVs, and therefore were discarded from analyses.
Most studies employed sets of bioinformatic analyses to detect horizontal pleiotropy, heterogeneity, and outliers for the exclusion restriction, and carried out sensitivity analyses using multiple statistical methods to summarize the effects of individual variants and test the robustness of the results. However, only a limited number of studies addressed the independence assumption concerning the association with potential confounders. Given the number of factors associated with HP acquisition [5–7] and the risk factors shared among many chronic diseases, such as cardiovascular disease and cancer [55], this poses a serious concern in MR studies that do not test this assumption. When it was tested, PhenoScanner [12] was the tool most commonly used to search for variants associated with potential confounders. Confounders, by definition, must be correlated with both exposure and outcome; however, the definitions of the potential confounders searched varied from study to study [27, 33, 36], and in other cases no specifics were reported [29, 30].
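As an illustration of how the effects of individual variants are summarized, the sketch below computes per-variant Wald ratios, a fixed-effect inverse-variance weighted (IVW) estimate, and Cochran's Q as a heterogeneity check. IVW is shown only as one commonly used summary method; the text above does not specify which methods each study applied. All function names and per-variant numbers are hypothetical.

```python
import math


def wald_ratios(beta_exp, beta_out, se_out):
    """Per-variant causal estimates (outcome effect / exposure effect).

    Standard errors use the first-order approximation se_out / |beta_exp|,
    which ignores uncertainty in the exposure effects.
    """
    ratios = [bo / be for be, bo in zip(beta_exp, beta_out)]
    ses = [so / abs(be) for be, so in zip(beta_exp, se_out)]
    return ratios, ses


def ivw(ratios, ses):
    """Fixed-effect inverse-variance weighted mean of the Wald ratios."""
    weights = [1.0 / s ** 2 for s in ses]
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    return estimate, math.sqrt(1.0 / sum(weights))


def cochran_q(ratios, ses, estimate):
    """Cochran's Q: weighted squared deviations from the pooled estimate."""
    return sum(((r - estimate) / s) ** 2 for r, s in zip(ratios, ses))


if __name__ == "__main__":
    # Hypothetical summary statistics for three variants (illustration only).
    beta_exp = [0.10, 0.15, 0.08]    # variant-exposure effects
    beta_out = [0.020, 0.025, 0.030]  # variant-outcome effects
    se_out = [0.010, 0.012, 0.015]    # SEs of the variant-outcome effects

    ratios, ses = wald_ratios(beta_exp, beta_out, se_out)
    estimate, se = ivw(ratios, ses)
    q = cochran_q(ratios, ses, estimate)
    print(f"IVW estimate = {estimate:.3f} (SE {se:.3f}), "
          f"Q = {q:.2f} on {len(ratios) - 1} df")
```

A Q value that is large relative to its degrees of freedom indicates that the variants disagree more than sampling error would predict, which is one signal of possible pleiotropy. It does not test the independence assumption, which requires a separate check of each variant against potential confounders.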
A further disturbing finding was the lack of attention to the quality of outcome datasets. Implementation of data sharing policies has widened public access to large GWAS data, but the structure of those huge biobanks and consortia is often very complex, and expertise and experience are required to retrieve the appropriate datasets for specific phenotypes. Moreover, in addition to the erroneous calculation of F-statistics discussed above, there were many seemingly erroneous data. For example, the first HP GWAS by Mayerle et al. [18] was incorrectly cited as the GWAS dataset from Chong et al. [19] by two studies [17, 29]; Chong's GWAS data were inaccurately described as case–control data [23]; HP seropositivity determined by multiplex Luminex assays [22] was described as continuous data in one study [27]; IV selection criteria were described as p < 5 × 10^−8^ in the methods, but the actual analyses included variants with p < 5 × 10^−6^ [25]; a scatter plot was replaced by a forest plot from completely different data in the study by Yang et al. [32]; and online data in the AS study listed a mixture of original betas and ORs for the OR [28]. Moreover, Guo et al. emphasized the importance of VacA-positive HP infection for stroke, despite the fact that VacA is present universally in HP [35]. These errors could have been prevented if the authors had implemented more rigorous research practices or included investigators with relevant expertise. Equally, they may represent failures in peer review, which often has to be completed within a short period of time. The quality of published work should be given higher priority than accelerated publication.
The reported risk estimates from these MR studies, which represent an increase in outcome risk per allele, were mostly modest, although limitations in estimating the effect size in MR studies have been recognized [56]. Excluding the two studies discussed above for questionable conclusions [17, 33], the results from the 6 other MR studies that yielded a significant positive association [24–26, 28, 34, 35] were consistent in direction with those from prior meta-analyses. However, the associations were much weaker than those from meta-analyses of observational studies [10, 42, 43]. In contrast, a positive association was found for GERD by MR analysis [30], whereas meta-analysis supported an inverse association [45]. Because the risk estimate from this MR study was negligible (OR = 1.001) with a p-value of 0.043, it should rather be considered a null association. Biological and methodological knowledge about the relations between exposures and outcomes is critical for interpreting the results of MR studies [13]. While negative results from MR studies may suggest residual confounding in traditional observational studies, given the small fraction of exposure explained by IVs, potential causal involvement should not be completely ruled out if strong biological data support the link. On the other hand, given imperfect knowledge about genetic pleiotropy and intrinsic uncertainty in IV assumptions [56], a positive MR result alone may not suffice to establish a causal association, even if the key MR assumptions were rigorously assessed and a range of sensitivity analyses confirmed the robustness of the association.
Some MR studies included potential mediators to shed light on mechanistic pathways. C-reactive protein (CRP), high-density lipoprotein (HDL) cholesterol, and blood glucose levels were suggested to be potential mediators for stroke [27], MI [26], and T2DM [24], respectively. While the proinflammatory property of HP was speculated to be the link between HP and these mediators for stroke and MI, HP-induced alterations in the appetite- and energy expenditure-controlling hormones leptin and ghrelin [57], which are secreted from the gastric mucosa, have been proposed as the underlying mechanism for insulin resistance.
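The GERD estimate discussed above (OR = 1.001, p = 0.043) can be put in perspective by back-calculating an approximate confidence interval from the reported odds ratio and p-value. This is a rough sketch that assumes the p-value comes from a two-sided Wald test on the log-odds scale and ignores rounding of the reported OR. The function name `ci_from_or_and_p` is hypothetical, and the original study may have derived its estimate differently.

```python
import math
from statistics import NormalDist


def ci_from_or_and_p(odds_ratio: float, p_value: float, level: float = 0.95):
    """Approximate CI for an odds ratio from its two-sided Wald p-value.

    Recovers the z-score from p, derives SE = |ln(OR)| / z, then builds
    the interval on the log scale and exponentiates.
    """
    if odds_ratio <= 0 or odds_ratio == 1.0:
        raise ValueError("odds_ratio must be positive and different from 1")
    if not 0.0 < p_value < 1.0:
        raise ValueError("p_value must lie in (0, 1)")
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - p_value / 2.0)
    log_or = math.log(odds_ratio)
    se = abs(log_or) / z
    z_crit = std_normal.inv_cdf(0.5 + level / 2.0)
    return math.exp(log_or - z_crit * se), math.exp(log_or + z_crit * se)


if __name__ == "__main__":
    lower, upper = ci_from_or_and_p(1.001, 0.043)  # values from the text
    print(f"approximate 95% CI: {lower:.5f} to {upper:.5f}")
```

Under these assumptions the whole interval sits within a fraction of a percent of 1, which is consistent with reading the result as a null association despite the nominally significant p-value.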
Finally, despite the complete overlap of the eligible publications between the two databases searched, it is possible that we missed other studies published in journals not indexed in either database. Those reports are more likely to be lower-quality or unfunded studies, and thus they are unlikely to change the conclusions of this review.
5. Conclusions
Overall, MR studies published to date on HP infection and various health outcomes are limited by problems of quality and integrity (erroneous data or interpretation), by uncertainty about whether the key MR assumptions are fulfilled, and by heterogeneity in results between and within studies of similar health outcomes or their subclasses. Since MR studies sit at the interface of intervention and observational studies [13], the results from HP eradication trials may help address the causality of some of the health outcomes with high incidence rates, while more reliable and comprehensive HP exposure GWAS data may help yield reproducible MR results. Finally, MR studies to date have been exclusively conducted in individuals of European descent, despite the fact that HP is more prevalent in non-white populations. Thus, MR studies in non-European populations are equally warranted.
References
- 1. Warren J. R., Marshall B. Unidentified Curved Bacilli on Gastric Epithelium in Active Chronic Gastritis. Lancet. 1983;1(8336):1273–1275. PMID: 6134060.
- 2. Hooi J. K., Lai W. Y., Ng W. K. Global Prevalence of Helicobacter pylori Infection: Systematic Review and Meta-Analysis. Gastroenterology. 2017;153(2):420–429. doi: 10.1053/j.gastro.2017.04.022. PMID: 28456631.
- 3. Li Y., Choi H., Leung K., Jiang F., Graham D. Y., Leung W. K. Global Prevalence of Helicobacter pylori Infection Between 1980 and 2022: A Systematic Review and Meta-Analysis. The Lancet Gastroenterology & Hepatology. 2023;8(6):553–564. doi: 10.1016/s2468-1253(23)00070-5. PMID: 37086739.
- 4. Chen Y. C., Malfertheiner P., Yu H. T. Global Prevalence of Helicobacter pylori Infection and Incidence of Gastric Cancer Between 1980 and 2022. Gastroenterology. 2024;166(4):605–619. doi: 10.1053/j.gastro.2023.12.022. PMID: 38176660.
- 5. Wawro N., Amann U., Butt J. Helicobacter pylori Seropositivity: Prevalence, Associations, and the Impact on Incident Metabolic Diseases/Risk Factors in the Population-Based KORA Study. Frontiers in Public Health. 2019;7:96. doi: 10.3389/fpubh.2019.00096. PMCID: PMC6491664. PMID: 31069210.
- 6. Kotilea K., Bontems P., Touati E. Epidemiology, Diagnosis and Risk Factors of Helicobacter pylori Infection. Advances in Experimental Medicine and Biology. 2019;1149:17–33. doi: 10.1007/5584_2019_357. PMID: 31016621.
- 7. Peng L., Sun Y., Zhu Z., Li Y. Association of Oxidative Balance Score With Helicobacter pylori Infection and Mortality Among US Population. European Journal of Nutrition. 2024;63(7):2499–2509. doi: 10.1007/s00394-024-03445-4. PMID: 38847866.
- 8. IARC. Schistosomes, Liver Flukes and Helicobacter pylori. IARC Working Group on the Evaluation of Carcinogenic Risks to Humans. Lyon, 7–14 June 1994. PMCID: PMC7681621. PMID: 7715068.
