Systematic Examination of Gene Expression and Proteomic Evidence Across Tissues Supports the Role of Mitochondrial Dysregulation in ME/CFS
Gregory R. Keele, Mike Enger, Quinn Barnette, Roman Ruiz-Esparza, Manuel Alvarado, Ravi Mathur, Jeran K. Stratford, Stephanie N. Giamberardino, Linda Morris Brown, Bradley T. Webb, Megan Ulmer Carnes

TL;DR
This study finds evidence that mitochondrial dysfunction may play a role in ME/CFS, a complex disease with no known cure.
Contribution
The study systematically integrates gene and protein expression data to highlight mitochondrial dysregulation as a potential mechanism in ME/CFS.
Findings
Mitochondrial genes MT-RNR1 and MT-RNR2 show lower expression in ME/CFS cases across two studies.
Approved compounds targeting ME/CFS-associated genes suggest potential therapeutic avenues.
Despite gene-level variability, mitochondrial dysfunction is consistently observed in ME/CFS cases.
Abstract
Myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) is a chronic, multisystem disease characterized by post-exertional malaise and persistent fatigue. The cause of ME/CFS is not well understood, and there are no established biomarkers or FDA-approved pharmacotherapies. The clinical heterogeneity of ME/CFS presents challenges to diagnosis and treatment and necessitates collaborative efforts to generate robust findings. This study leveraged gene and protein expression data from the mapMECFS data repository and the DecodeME Genome-Wide Association Study (GWAS) to assess consistent gene signatures across studies. The mitochondrial genes MT-RNR1 and MT-RNR2 exhibited lower expression in ME/CFS cases in two studies. Together with increased expression of mitochondrial genes in platelets from another study, this finding supports a role for mitochondrial dysregulation in ME/CFS.…
Funding: National Institute of Neurological Disorders and Stroke of the National Institutes of Health
Taxonomy
Topics: Fibromyalgia and Chronic Fatigue Syndrome Research · Genetic Neurodegenerative Diseases · Biochemical Acid Research Studies
1. Introduction
Myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) is a complex, multisystem disease that affects millions of people worldwide [1]. ME/CFS is characterized by persistent and unexplained fatigue, post-exertional malaise (PEM), cognitive impairment, pain, and immune dysregulation [2]. Although these represent the hallmark symptoms of ME/CFS, the disease presents with a highly heterogeneous clinical profile, with symptoms and disease progression varying substantially across individuals [3]. The etiology and pathophysiology of ME/CFS are still poorly understood, and there are no specific diagnostic tests or effective treatments currently available [2]. This unmet clinical need and disease heterogeneity present significant challenges to traditional drug discovery for ME/CFS.
Identifying biomarkers and molecular signatures of ME/CFS could enable earlier and more accurate diagnosis, reduce reliance on subjective symptom reporting, and help distinguish ME/CFS from overlapping conditions. Additionally, molecular signatures may provide insights into the underlying etiology and risk factors of the disease and reveal biological subtypes, allowing researchers to stratify patients for clinical trials and tailor therapies to underlying pathophysiology. To identify biomarkers and molecular signatures of ME/CFS, researchers have profiled multiple omics data types, including transcriptomics and proteomics [4,5,6,7]. The extent of replication across studies, the consistency of results, and whether any discrepancies can be attributed solely to study design are currently unknown. Thus, there is a need to systematically reexamine and compare available data.
Genome-Wide Association Studies (GWAS) have also attempted to link genetic variants, along with their associated genes and proteins, to ME/CFS with the goal of identifying new biomarkers. While early studies were underpowered, the recent DecodeME GWAS represented a cohort of 15,579 ME/CFS cases [8]. Incorporating genomic signals from well-powered GWAS like DecodeME with other omics data resources represents another opportunity to detect consistent biological patterns across ME/CFS studies.
Because omics data can provide insights into the underlying mechanisms of the disease and reveal potential therapeutic targets, they also represent a potential resource for drug repurposing analyses [9] to identify existing compounds that may be effective for treating ME/CFS and its associated symptoms [10]. Drug repurposing has been successfully applied in various fields for decades. For example, zidovudine (AZT) was originally synthesized and tested as an anti-cancer drug in the late 1960s but was later repurposed and approved as the first anti-HIV drug in 1987 [11]. Bringing a novel drug to market is estimated to take over a decade and cost several billion dollars, making drug repurposing an appealing, cost-effective alternative [12,13,14]. Advances in computational methods and the growing availability of open-access biomedical data have supported this by creating a favorable environment for identifying new applications for existing compounds [15]. Data repositories provide researchers with centralized platforms where high-dimensional datasets can be accessed, reused, and integrated across studies. By providing access to previously published data, open data repositories allow researchers to derive new insights without the need to generate new data.
Recognizing the need for a centralized resource to facilitate data reuse and integration across ME/CFS studies, the mapMECFS data repository was created to collate ME/CFS study data [16]. The mapMECFS houses a range of data types to support ME/CFS research including molecular data (e.g., gene expression, DNA methylation, microRNA, proteomics, metabolomics, and cytokine profiles) and clinical and phenotypic data (e.g., demographic information, patient-reported outcome measures, and physical and cognitive assessments) on ME/CFS cases and controls.
In this study, we (1) conduct a comprehensive review of available ME/CFS gene expression and proteomic studies to evaluate cross-study findings, (2) incorporate results from the DecodeME GWAS with the goal of identifying consistent signals, and (3) attempt to identify potential therapeutic candidates via drug repurposing. Recognizing the variability in analytic approaches and experimental design across studies, we applied a single, standardized bioinformatics pipeline across gene expression and proteomic datasets to maximize cross-study comparability.
2. Results
2.1. ME/CFS Differential Gene Expression and Proteomic Studies—Current State of the Field
We surveyed both mapMECFS and the broader ME/CFS literature for studies that compared gene expression and protein profiles between cases and controls and found six studies representing 11 gene expression and 10 proteomics datasets (Supplementary Table S1). After filtering by inclusion criteria (e.g., >10 cases and no gene array data for gene expression), four bulk tissue gene expression datasets, one single-cell gene expression dataset, and four proteomic datasets from ME/CFS case/control cohorts were retained for analysis (Table 1). Expression data were generated from multiple tissues or cell types including peripheral blood mononuclear cells (PBMC) [4,17,18], monocytes derived from PBMC [19], and muscle [4]. For protein expression, the datasets were generated from plasma [4,20], extracellular vesicles derived from plasma [21], and cerebrospinal fluid (CSF) [4]. The sample sizes for both expression and proteomics studies ranged from 21 to 67. See Table 1 for more details on these studies.
When the proportion of null effects (p0) was examined, the values ranged from 0.709 to 1, with three studies being indistinguishable from a null distribution. A variety of quality control, filtering, and analysis methods were used for the results reported in the primary publications. To reduce heterogeneity, we reanalyzed all datasets using a common analysis workflow (Supplementary Figure S1) (see Methods). We note that these data represent distinct studies with varying scientific questions and experimental conditions, and thus we adjusted our analysis for a specific dataset based on its experimental design (see study-specific analyses in Supplementary File S1 for full analysis code and results).
Across all data (bulk gene expression and proteomics), a total of 27,374 genes were observed, with 18,476 being observed in at least two datasets. For gene expression (i.e., transcripts), 6720 genes were observed and tested across all four bulk gene expression datasets (Figure 1a). The number of proteins quantified across the four protein datasets ranged from 301 to 1281 for the Giloteaux et al. [21] plasma data and Walitt et al. [4] plasma and CSF data, respectively. This variation in the number of proteins likely stems, at least in part, from the specific proteomics assays and processing of the data. No genes were observed and analyzed across all datasets.
Using an FDR threshold of 10%, bulk expression studies yielded between 9 and 246 significant genes (Table 1). We observed two differentially expressed genes (DEGs) that were consistent across multiple studies (Figure 1b). MT-RNR1 and MT-RNR2 encode the 12S and 16S rRNAs in the mitochondrial genome and showed lower expression in ME/CFS cases compared to controls across two studies, namely in PBMC [18] and monocytes derived from PBMC [19] (Figure 2). Both studies showed sex differences between cases and controls, with the effect stronger in males. Applying the lenient FDR < 30% threshold added two consistent DEGs: FABP5 had decreased expression in PBMC [18] and muscle [4] from ME/CFS cases, and CDNF had increased protein expression in plasma [20] and transcript expression in muscle [4] from ME/CFS cases (Supplementary Figure S2). The full differential expression results are available in Supplementary File S2 as well as from mapMECFS (https://mapmecfs.org/group/keele-enger-systematic-examination-of-gene-expression-and-proteomics).
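The cross-study comparison described above can be illustrated with a minimal Python sketch. It is not the authors' code (their pipeline is in R); the function name `consistent_degs`, the input layout, and the toy studies and genes are all hypothetical. It assumes that a gene counts as consistent when it passes the FDR threshold in at least two datasets with the same direction of effect.

```python
from collections import defaultdict

def consistent_degs(results, fdr=0.10, min_studies=2):
    """Return genes significant in >= min_studies datasets with one shared direction.

    results: {study_name: {gene: (log2_fold_change, adjusted_p)}}  (hypothetical layout)
    """
    hits = defaultdict(list)
    for study, genes in results.items():
        for gene, (lfc, padj) in genes.items():
            # Genes without an adjusted p-value (None) are treated as not tested.
            if padj is not None and padj < fdr:
                hits[gene].append((study, 1 if lfc > 0 else -1))

    consistent = {}
    for gene, records in hits.items():
        signs = {sign for _, sign in records}
        if len(records) >= min_studies and len(signs) == 1:
            direction = "up" if signs == {1} else "down"
            consistent[gene] = (direction, sorted(study for study, _ in records))
    return consistent

# Toy input with made-up values; labels are placeholders, not real results.
toy = {
    "study_1": {"GENE_A": (-1.2, 0.01), "GENE_B": (0.8, 0.04), "GENE_C": (0.3, 0.50)},
    "study_2": {"GENE_A": (-0.7, 0.05), "GENE_B": (-0.5, 0.02)},
    "study_3": {"GENE_C": (0.4, 0.20), "GENE_A": (-0.1, 0.90)},
}
strict = consistent_degs(toy, fdr=0.10)
lenient = consistent_degs(toy, fdr=0.30)
```

In the toy input, GENE_B is significant in two datasets but with opposite signs, so it is not reported; requiring a shared direction is a design choice of this sketch.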
In contrast, there were no overlapping results across the proteomic datasets. Most (three of four) of the protein datasets showed no significant genes (FDR < 10%). One study, Germain et al. [20], had eight genes that met this threshold (Table 1). None of these overlapped those identified in the gene expression studies.
2.2. Differential Gene Expression Analysis of scRNA-Seq Pseudobulk Data
Next, we reanalyzed pseudobulk data derived from PBMC samples from 30 ME/CFS cases and 28 controls originally generated and presented by Vu et al. [5]. Consistent with the bulk data from different tissues, most genes were not observed in all clusters (cell types) (Supplementary Figure S3). At a lenient threshold (FDR < 30%), no DEGs were detected across multiple clusters (Figure 3a,b, Supplementary Figure S4). Notably, of the 28 clusters, platelets (cluster 19) had the highest number of DEGs, with 17 and 282 genes at an FDR of 10% and 30%, respectively (Supplementary File S2). Eight of the 17 were upregulated in ME/CFS cases, and 7 (88%) of those eight are encoded on the mitochondrial genome, as are MT-RNR1 and MT-RNR2 identified in the bulk gene expression data. Note that MT-RNR1 and MT-RNR2 specifically were not available in the scRNA-seq generated pseudobulk data. When we examined these genes in the other clusters, cases consistently showed increased expression compared to controls. This is consistent with other studies reporting elevated mitochondrial gene expression [22] and hyperactivation of platelets [23] in ME/CFS cases.
2.3. Gene-Level Analysis of DecodeME GWAS
DecodeME [8] reported 29 tier 1 genes (Supplementary Figure S5a; Supplementary File S2), representing genes that fell within a genome-wide significant interval and had a posterior probability of colocalization with an eQTL from GTEx [24]. We also defined significantly associated genes based on a 10% FDR threshold with the MAGMA [25] gene-level test, resulting in 178 genes, 14 of which were also in the tier 1 genes (Supplementary Figure S5b; Supplementary File S2). None of the 193 genes from across the tier 1 or MAGMA 10% FDR sets were detected as DEGs.
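The count of 193 follows from inclusion-exclusion on the two gene sets: 29 tier 1 genes plus 178 MAGMA genes, minus the 14 shared between them. A short Python check, with a hypothetical helper name `union_size`:

```python
def union_size(n_a: int, n_b: int, n_shared: int) -> int:
    """Size of the union of two sets given their sizes and the size of their overlap."""
    return n_a + n_b - n_shared

# Counts taken from the text: tier 1 genes, MAGMA 10% FDR genes, and their overlap.
n_tier1, n_magma, n_shared = 29, 178, 14
n_union = union_size(n_tier1, n_magma, n_shared)
```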
2.4. Drug Repurposing Analysis
Next, the gene-centric drug repurposing tool, Realomics [26], was used on the sets of significant genes for each dataset to identify compounds with therapeutic potential for ME/CFS. The largest numbers of compounds were identified for the Walitt et al. [4] muscle tissue and DecodeME MAGMA-based results [8]. For the 246 DEGs identified in the Walitt et al. [4] muscle bulk gene expression data, 33 genes had at least one corresponding targeting compound in one of the databases (Figure 4a–c, Supplementary File S3). The list of candidate therapeutics represents 139 compounds overall, of which 107 are clinically approved agents, and of those, 76 have specificity ≥ 0.2 (Table 1). These compounds encompass a diverse range of treatments targeting energy metabolism, antiviral, anti-inflammatory, immunomodulatory, neurological, hormonal, and cardiac functions. Of particular interest are those that modulate biological processes thought to be dysregulated in ME/CFS, including (1) mitochondrial [27,28,29] and metabolic support [30,31] (e.g., thiamine); (2) immunomodulators [32,33,34] (e.g., dimethyl fumarate/diroximel fumarate and ruxolitinib); and (3) neuromodulator agents [35,36] (e.g., opipramol). This list highlights only a subset of the candidate therapeutics identified. However, these results should be interpreted with caution because the compounds have not been evaluated for the treatment of ME/CFS and may have detrimental side effects.
Using the bulk gene expression and scRNA-seq data generated from PBMCs identified only three additional clinically approved compounds with a specificity ≥ 0.2, which did not overlap compounds identified in other sample types. The proteomic studies did not identify any additional compounds with these criteria (Table 1, Supplementary File S3).
Using gene-based analyses from DecodeME GWAS genes [8] (Figure 4d–f), 29 of the 178 genes with an FDR < 10% were associated with at least one compound in one of the databases, including 22 that were targeted by clinically approved compounds. A number of these genes are involved in energy metabolism, including Pyruvate Carboxylase (PC), which replenishes intermediates of the TCA cycle, and DARS2, which is essential for protein synthesis within mitochondria. Other implicated biological processes involve neuronal function and neuro-immunity (e.g., CACNA1E, GRIA1, NRXN1, KCNB1) and cellular maintenance and regulation (e.g., CLK2, MGMT, PEBP1). In total, there were 137 candidate therapeutics, with 82 representing clinically approved agents and 31 having a specificity ≥ 0.2. When looking at the DecodeME GWAS tier 1 gene list, two additional compounds targeting the PTGIS gene and one additional compound targeting CA10 were identified, although each had a specificity of 0.25 or lower (Supplementary File S3). Interestingly, there was overlap between the GWAS and muscle gene expression candidate therapeutics, with two small molecules being identified in both (zoledronic acid: CHEMBL924 and incadronic acid: CHEMBL53950; Supplementary Figure S6). It is important to note that the direction of the effect is not accounted for when identifying these target compounds. Further investigation would be required to evaluate the therapeutic potential of identified compounds.
3. Discussion
The mapMECFS repository [16] was used to survey available data from studies that profiled gene and protein expression in ME/CFS to look for consistent evidence across studies and identify genes that could reveal insights about the fundamental biology and etiology of ME/CFS. Genes with consistent or converging evidence represent new avenues for potential treatment because existing gene–drug pairs can be identified from drug databases.
Gene and protein expression analyses have been used to compare the responses of individuals with ME/CFS and healthy controls to exercise stimuli (via cardiopulmonary exercise tests [CPET]) [5,18,21], using study designs with repeated measures at different timepoints relative to stimuli (e.g., before and after CPET). The data collected by these studies can be analyzed in various ways depending on the scientific question. Variation in study design, scientific questions, and technical features, combined with relatively modest sample sizes (n < 50), represents a significant challenge to detecting converging evidence across studies. Even with these caveats, this study has shown evidence of mitochondrial dysregulation in ME/CFS across studies and identified candidate drugs for repurposing using existing results.
The overlap of analyzed gene transcripts and proteins across studies is generally limited. This is not surprising given that it is technically challenging to quantify thousands of proteins in a sample. Though proteomics technology is making great strides, generally far fewer proteins are quantified in a proteomics study (hundreds to a few thousand) than a gene expression study (15 to 20 thousand). This implicitly results in many genes being observed in only a subset of the datasets. This has likely contributed to the lack of consistent signal observed within the field, along with heterogeneity in study design and methodology.
Ultimately, much larger omics studies of ME/CFS are needed to advance the field in terms of identifying reliable molecular signatures, illuminating the underlying etiology of the disorder, and developing therapeutic treatments. This is particularly important given the heterogeneity of ME/CFS, which potentially reflects different etiologies, subtypes, disease stages, or underlying mechanisms. Genetic studies of ME/CFS [37,38,39,40,41,42] have similarly been hampered by sample sizes too small (in the context of genome-wide association studies for a highly complex phenotype) to produce reliable genetic signatures. Large-scale biobanks, such as UK Biobank [43] and All of Us [44], continue to expand with data types relevant to ME/CFS, including omics (e.g., metabolomics and proteomics) and electronic health records. These powerful resources contain information on participants that span the heterogeneity of ME/CFS and are now beginning to be leveraged for studies on how genetics and molecular intermediates (i.e., biomarkers) influence ME/CFS [45].
Our study supports the hypothesis of mitochondrial dysregulation in individuals with ME/CFS. We note that the direction of the effect for mitochondrial gene differential expression differed between bulk tissue and scRNA clusters, with MT-RNR1 and MT-RNR2 expression decreased in ME/CFS cases compared to increased expression in platelets for the seven mitochondrial genes (MT-ATP6, MT-CO1, MT-CO2, MT-CO3, MT-CYB, MT-ND3, and MT-ND5). Nevertheless, these results implicate mitochondrial dysregulation as a potentially complex feature of ME/CFS observed across studies, which is consistent with other studies [27,28].
The findings for MT-RNR1 and MT-RNR2, which we reproduced from the initial report by Raijmakers et al. [19], are particularly robust because they were replicated in the PBMC data from Gamer et al. [18]. MT-RNR1 and MT-RNR2 were not originally reported by Gamer et al. [18], whose study was more focused on findings related to PEM. MT-RNR1 and MT-RNR2 are both genes found in the non-nuclear mitochondrial genome and encode the mitochondrial ribosomal RNA 12S and 16S subunits, respectively. Short open reading frames within MT-RNR1 and MT-RNR2 also encode the peptides MOTS-C (mitochondrial open reading frame of the 12S rRNA-c) and humanin, respectively. There are numerous preclinical studies of MOTS-C for a wide range of proposed applications [46]. Notably in the context of ME/CFS, MOTS-C is upregulated in response to exercise [47], is considered an exercise mimetic [48], and is regarded as a potential performance-enhancing drug [49]. However, it should be noted that the peptide MOTS-C is not FDA approved, there are no active MOTS-C clinical trials, and safety risks have been documented [50,51]. Similar to MOTS-C, humanin is upregulated in response to exercise [47] and has been investigated in a wide variety of preclinical studies of conditions in which mitochondrial dysfunction and energy regulation have been implicated, including Alzheimer’s disease and diabetes. Because preclinical studies involve treatment with these endogenous bio-identical peptides, human clinical trials using MOTS-C and humanin are likely feasible. This could potentially bypass the need to develop small molecules to modulate MT-RNR1 and MT-RNR2 activity or abundance in mitochondria. We note that modified peptides are a rapidly growing area of therapeutics, e.g., GLP-1 receptor agonists for obesity and diabetes [52].
Signatures of mitochondrial dysfunction have been noted in other infection-associated chronic illnesses (IACIs) [53], such as post-acute sequelae of COVID (PASC), i.e., long COVID, post-treatment Lyme disease syndrome (PTLDS), chronic Q fever fatigue syndrome, and post viral fatigue syndromes [54]. These signatures include elevated biomarkers of oxidative stress and mitochondrial damage in long COVID [55,56] and PTLDS [57]. These debilitating chronic disorders share an infection-associated origin as well as exhibit similar clinical features (e.g., fatigue and cognitive dysfunction). Further understanding of how mitochondrial dysfunction contributes to ME/CFS could also prove meaningful for other IACIs.
Another promising but underdeveloped avenue for ME/CFS research is drug repurposing, which is the process of finding new therapeutic uses for existing drugs that have already passed regulatory approval. Drug repurposing can accelerate the drug development process for ME/CFS and reduce the cost and risk of failure because the safety and pharmacokinetics of the repurposed drugs are already known. Compounds that target ME/CFS-associated genes that are identified through drug repurposing could also represent new information based on their molecular targets and mechanisms of action, which, along with the target genes, could provide insights into the pathophysiology of ME/CFS and further validate potential biomarkers.
Here we have conducted an analysis to identify potential new therapeutics for the treatment of ME/CFS. Across all studies, we identified 201 FDA-approved candidate compounds, of which 89 showed specificity of 0.2 or greater, revealing several potential therapeutics of interest. To date, ME/CFS drug repurposing investigations are limited. Jeffrey et al. [10] identified differentially expressed gene modules in PBMC which they then queried in PharmGKB [58]. Broadly, their approach is similar to ours, although we frame our analysis around single genes rather than pathways and we query four drug databases rather than one. Despite the tissues differing between studies (PBMC compared to muscle), we do observe some system-level overlap, most notably, compounds that affect immune and mitochondrial function.
Although it is not feasible to systematically assess or discuss the plausibility of the full catalog of results here, we highlight selected compounds that appear to have additional evidence supporting their relationship to ME/CFS biology, symptoms, or features. First, thiamine (Vitamin B1) is associated with the gene TPK1 (DEG p_adj = 0.093) and at high doses has shown promise in improving fatigue and cognitive symptoms [59], likely by boosting mitochondrial function [60]. Mitochondrial and metabolic support in ME/CFS through various supplements has previously been proposed [61,62]. We also identified 14 compounds associated with the gene ATP1A1 (DEG p_adj = 0.029), which encodes the alpha-1 subunit of the Na+/K+-ATPase pump and plays a vital role in regulating blood pressure by controlling sodium and potassium ion movement. One notable compound associated with ATP1A1 is artemether, an antimalarial medication. Artemether is an artemisinin derivative, and another drug in this class, artesunate, has been investigated as an ME/CFS treatment [63]. Dimethyl fumarate and diroximel fumarate target the product of the KEAP1 gene (Kelch-like ECH-associated protein 1; DEG p_adj = 0.071) and have immunomodulatory and antioxidant effects that could address neuroinflammation and oxidative stress. Two compounds, alteplase and ruxolitinib, target the PLAUR gene (DEG p_adj = 0.039). Ruxolitinib is a JAK1/2 inhibitor, has anti-inflammatory effects, and has been proposed for ME/CFS-related immune dysregulation [64]. Five compounds target the gene EBP (DEG p_adj = 0.039): buflomedil, clomiphene, opipramol, triparanol, and trifluperidol. Opipramol is a neuropsychiatric and neuromodulatory treatment with anxiolytic and antidepressant properties that could benefit ME/CFS-related anxiety, sleep disturbances, and mood symptoms.
Although some of the identified compounds appear to be plausible treatments for ME/CFS, we emphasize that these findings are preliminary and should be interpreted with caution. Further research is necessary to ascertain the viability and efficacy of these options and would require rigorous clinical trials to determine their effectiveness in individuals with ME/CFS. Nevertheless, drug repurposing represents an important initial step towards identifying potential interventions to treat ME/CFS.
This study aimed to collect accessible gene expression and proteomic data for ME/CFS, process them through a common pipeline, and search for converging evidence, but there are some notable limitations. First, gene expression data were limited to RNA-seq for technical reasons, and older array-based results were excluded. Second, except for the DecodeME tier 1 results, FDR control was used instead of family-wise multiple testing correction such as Bonferroni. Although this strategy is statistically justified, particularly for omics data with non-independent tests, some of the reported results will represent false positives. Therefore, some caution is warranted when interpreting the results. Third, metabolomic and cytokine data were not considered and should be investigated in future studies. Finally, although many candidate gene–drug pairs were identified, drug repurposing for ME/CFS faces several challenges, such as the lack of validated animal models or model systems for testing and screening potential drugs, and the difficulty of accessing and integrating heterogeneous and dispersed data sources.
4. Materials and Methods
4.1. Gene Expression and Proteomic Data Harmonization and Analysis Framework
First, we sought to identify, ingest, and harmonize available gene and protein expression datasets from ME/CFS case/control studies to enable integration into drug repurposing tools and databases to aid in therapeutic discovery. The search started with the data available in the mapMECFS data repository [16] (https://mapmecfs.org (accessed on 9 July 2025)). The mapMECFS is the largest repository of ME/CFS-specific data; it supports the findability and reanalysis of published ME/CFS data and offers the potential to increase the effective sample size by combining multiple studies. The query was expanded to other potential sources of data, including systematic querying of PubMed and Gene Expression Omnibus [65] (Supplementary Table S1). The selection of gene expression studies was constrained to those that used RNA-seq based assays, excluding studies without available gene count data [66] or that used older microarray assays [10,67,68], to maximize the harmonization of analysis and results across studies. Only studies with 10 or more ME/CFS cases and that used large-scale assays were included (i.e., small studies and those focused on a small number of candidate genes were excluded [69,70]). A small number of proteomic studies were excluded for not having processed data available [7,71].
To improve comparability, we defined a consistent statistical framework to be used across the studies (Supplementary Figure S1). For gene expression studies, publicly available raw gene count data were obtained and used as input for differential gene expression analysis via the DESeq2 R package [72] (v1.46.0) to test for genes with expression levels that differed between cases and controls. Proteomics studies employed the aptamer-based SomaScan assay [73] and tandem mass tag mass spectrometry [74]. Differential protein expression analysis was performed by first log10-transforming protein intensities, which were then individually tested for differences between cases and controls through analysis of variance.
For studies with repeated measures, we kept only baseline measures (i.e., before PEM induction or an exercise challenge) to improve the comparability with case-control studies. Most of the studies included both females and males, which we adjusted for in subsequent analysis. We used two significance thresholds, including a 10% false discovery rate (FDR) to define higher confidence ME/CFS-associated genes and a lenient 30% FDR threshold coupled with detection across multiple studies. This approach would potentially allow us to identify additional genes with some support across multiple studies. We also estimated the proportion of null effects for each dataset (p0), representing the overall proportion of genes or proteins in the data with no difference between cases and controls, as a single summary statistic of the extent of signal in each dataset. This is necessary to evaluate whether the summary statistics, in aggregate from a given study, contain a mixture of true and null effects. Underpowered studies may not yield any significant results after applying family-wise multiple testing correction, but a p0 < 1 supports that the study is appropriate for FDR analysis. Alternatively, if p0 ≈ 1, this implies that the results are unlikely to be different from a null distribution and not merely underpowered.
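To make the two quantities in this paragraph concrete, the following Python sketch computes FDR-adjusted p-values and a proportion-of-null estimate. Both methods are assumptions for illustration: the text does not name the FDR procedure or the estimator, so the sketch uses the Benjamini-Hochberg adjustment and a Storey-type estimator with a single tuning value `lam`. The function names are hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so each value is the minimum over larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def storey_pi0(pvals, lam=0.5):
    """Storey-type estimate of the proportion of true nulls (assumed estimator).

    Under the null, p-values are uniform, so the fraction above `lam`
    divided by (1 - lam) approximates the null proportion.
    """
    n_above = sum(p > lam for p in pvals)
    return min(1.0, n_above / ((1.0 - lam) * len(pvals)))

# Made-up p-values for illustration only.
pvals = [0.01, 0.04, 0.03, 0.20]
adjusted = bh_adjust(pvals)
significant_10 = [a < 0.10 for a in adjusted]
pi0 = storey_pi0([0.01, 0.02, 0.03, 0.9])
```

An estimate near 1 corresponds to the case described above, where a dataset's p-values look like a null distribution.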
Differing from the other RNA-seq-based studies, the data from Vu et al. [5] represent single-cell RNA-seq (scRNA-seq) from peripheral blood mononuclear cell (PBMC) samples from ME/CFS cases and controls. Seurat, an R package for analyzing single-cell data [75], was used by Vu et al. [5] to (1) perform cluster analysis on the single-cell-level data for dimension reduction, resulting in 29 clusters, (2) define marker genes to annotate clusters with specific cell types, and then (3) summarize as pseudobulk quantitative data. Using the pseudobulk data from Vu et al. [5], the 29 clusters were filtered to remove any with fewer than 10 cases, resulting in the exclusion of one cluster with two individuals. Using the remaining 28 clusters, we performed differential gene expression using the same pipeline as the bulk data (as described above).
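The pseudobulk and cluster-filtering steps can be sketched in Python as follows. This is an illustration, not the Seurat-based workflow of Vu et al.: it assumes pseudobulk values are formed by summing counts over cells from the same individual within a cluster, which is a common convention but is not stated in the text. The function names and the toy input are hypothetical.

```python
from collections import defaultdict

def pseudobulk(cells):
    """Sum per-cell counts into one profile per (cluster, donor).

    cells: iterable of (donor_id, cluster_id, {gene: count})  (hypothetical layout)
    """
    agg = defaultdict(lambda: defaultdict(int))
    for donor, cluster, counts in cells:
        for gene, n in counts.items():
            agg[(cluster, donor)][gene] += n
    return {key: dict(profile) for key, profile in agg.items()}

def clusters_to_keep(pb, case_donors, min_cases=10):
    """Clusters represented by at least `min_cases` distinct case donors."""
    cases_per_cluster = defaultdict(set)
    for cluster, donor in pb:
        if donor in case_donors:
            cases_per_cluster[cluster].add(donor)
    return sorted(c for c, donors in cases_per_cluster.items() if len(donors) >= min_cases)

# Toy input; min_cases is lowered to 2 only so the small example is informative.
cells = [
    ("d1", 0, {"G1": 2, "G2": 1}),
    ("d1", 0, {"G1": 3}),
    ("d2", 0, {"G1": 1}),
    ("d1", 1, {"G2": 4}),
    ("d3", 1, {"G2": 5}),
]
pb = pseudobulk(cells)
kept = clusters_to_keep(pb, case_donors={"d1", "d2"}, min_cases=2)
```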
4.2. Gene-Level Analysis of DecodeME GWAS Summary Statistics
We obtained the summary statistics from the DecodeME GWAS [8], specifically based on 15,579 ME/CFS cases and 259,909 UK Biobank population controls with European ancestry. We used the Functional Mapping and Annotation of Genome-Wide Association Studies (FUMA) [76] (v1.5.2) web portal (https://fuma.ctglab.nl/) to perform a gene-level association analysis using Multi-Marker Analysis of Genomic Annotation (MAGMA) [25] (v1.08). A 10 kb window around each gene was used for assigning SNPs to genes. We used two definitions for ME/CFS-associated genes: (1) the tier 1 genes defined by the DecodeME GWAS original manuscript [8], which were based on colocalization analysis with GTEx [24] eQTL, and (2) a 10% FDR threshold for the MAGMA results.
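The window-based SNP assignment can be pictured with a small Python sketch. MAGMA performs this step internally; the code below only illustrates the idea. It assumes the window extends the gene boundaries by 10 kb on each side, that the boundaries are inclusive, and that all positions are on the same chromosome. The function name and coordinates are hypothetical.

```python
def snps_in_gene_window(gene_start, gene_end, snp_positions, window_bp=10_000):
    """Return SNP positions within the gene body extended by `window_bp` on both sides."""
    lo, hi = gene_start - window_bp, gene_end + window_bp
    return [pos for pos in snp_positions if lo <= pos <= hi]

# Made-up coordinates on a single chromosome.
assigned = snps_in_gene_window(100_000, 120_000, [89_999, 90_000, 110_000, 130_000, 130_001])
```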
4.3. Drug Repurposing Analysis with Realomics
For the drug repurposing analysis, we used the Realomics tool [26]. Realomics accepts a user-supplied list of prioritized genes known to be associated with a phenotype and queries four drug databases (Pharos [77], Open Targets [78], Therapeutic Target Database [TTD] [79], and DrugBank [80]) to identify compounds known to target the priority genes. Briefly, we defined sets of ME/CFS-associated genes (mapped to GRCh38 Ensembl gene IDs) for each dataset using a 10% FDR threshold for gene or protein expression datasets and DecodeME MAGMA results, as well as DecodeME tier 1 genes. Each set of genes was input individually into Realomics [26], producing a dataset-specific set of compounds. Only compounds with ChEMBL IDs were retained as potential therapeutics of interest. We note that gene–compound annotations in databases represent a range of relationships, such as compounds that bind gene protein products. Here we do not attempt to filter gene–compound pairs based on relationship type because that level of information is highly variable across compounds and databases. Identified compounds were prioritized based on the number of ME/CFS-associated genes with which they were associated and their specificity to ME/CFS-associated genes. This specificity metric provides information about a compound’s potential for off-target effects with treatment. The relative specificity of a drug is defined as the number of ME/CFS-associated genes it targets divided by the total number of genes it targets. Determining a specificity threshold empirically is challenging, but the Realomics developers recommend a specificity filtering threshold of 0.2 as a starting point. As such, drugs with a relative specificity less than 0.2 were considered nonspecific, because more than 80% of their gene targets were not ME/CFS-associated, and were thus considered less likely to be good repurposing candidates.
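The relative specificity metric and the 0.2 filter can be sketched in a few lines of Python. The code below assumes each compound's target genes are already available as a plain mapping; it does not call Realomics or any of the drug databases, and the compound and gene names are hypothetical placeholders.

```python
def relative_specificity(drug_targets, associated_genes):
    """Fraction of a compound's target genes that are phenotype-associated."""
    targets = set(drug_targets)
    if not targets:
        return 0.0
    return len(targets & set(associated_genes)) / len(targets)


def prioritize_compounds(compound_targets, associated_genes, threshold=0.2):
    """Keep compounds with specificity >= threshold, ranked by the number of
    associated genes targeted, then by specificity (both descending)."""
    associated = set(associated_genes)
    kept = []
    for compound, targets in compound_targets.items():
        n_hits = len(set(targets) & associated)
        spec = relative_specificity(targets, associated)
        if spec >= threshold:
            kept.append((compound, n_hits, spec))
    return sorted(kept, key=lambda row: (-row[1], -row[2], row[0]))


if __name__ == "__main__":
    # Hypothetical gene set and compound-to-target annotations.
    toy_associated = {"G1", "G2", "G3"}
    toy_compounds = {
        "compound_A": ["G1", "G2", "X1"],                    # 2 of 3 targets
        "compound_B": ["G1", "X1", "X2", "X3", "X4", "X5"],  # 1 of 6 targets
        "compound_C": ["G3"],                                # 1 of 1 targets
    }
    for row in prioritize_compounds(toy_compounds, toy_associated):
        print(row)
```

In this toy input, compound_B targets one associated gene out of six (specificity of about 0.17) and is therefore dropped by the 0.2 threshold.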
5. Conclusions
This study surveyed gene expression and proteomics studies, many of which are now available in the mapMECFS repository, to perform reanalysis under a standardized framework toward detecting consistent gene signatures across multiple studies. Although consistent gene signatures were limited at the level of individual genes (MT-RNR1 and MT-RNR2 had reduced expression in two studies), signs of mitochondrial dysregulation were observed more broadly. Drug repurposing analysis was performed on each study’s gene results, representing a potential avenue for discovering treatment options for ME/CFS. All analysis code and results data from all studies have been made available on mapMECFS (https://mapmecfs.org/group/keele-enger-systematic-examination-of-gene-expression-and-proteomics, accessed on 9 July 2025), which ensures reproducibility of the findings. This study demonstrates the value of these rich datasets as resources for secondary analyses.
References
- 1. Tschopp R., König R.S., Rejmer P., Paris D.H. Myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS): A preliminary survey among patients in Switzerland. Heliyon 2023, 9, e15595. doi:10.1016/j.heliyon.2023.e15595. PMID: 37131449. PMCID: PMC10149204.
- 2. Grach S.L., Seltzer J., Chon T.Y., Ganesh R. Diagnosis and Management of Myalgic Encephalomyelitis/Chronic Fatigue Syndrome. Mayo Clin. Proc. 2023, 98, 1544–1551. doi:10.1016/j.mayocp.2023.07.032. PMID: 37793728.
- 3. Unger E.R., Lin J.-M.S., Chen Y., Cornelius M.E., Helton B., Issa A.N., Bertolli J., Klimas N.G., Balbin E.G., Bateman L. Heterogeneity in Measures of Illness among Patients with Myalgic Encephalomyelitis/Chronic Fatigue Syndrome Is Not Explained by Clinical Practice: A Study in Seven U.S. Specialty Clinics. J. Clin. Med. 2024, 13, 1369. doi:10.3390/jcm13051369. PMID: 38592199. PMCID: PMC10931716.
- 4. Walitt B., Singh K., LaMunion S.R., Hallett M., Jacobson S., Chen K., Enose-Akahata Y., Apps R., Barb J.J., Bedard P. Deep phenotyping of post-infectious myalgic encephalomyelitis/chronic fatigue syndrome. Nat. Commun. 2024, 15, 907. doi:10.1038/s41467-024-45107-3. PMID: 38383456. PMCID: PMC10881493.
- 5. Vu L.T., Ahmed F., Zhu H., Iu D.S.H., Fogarty E.A., Kwak Y., Chen W., Franconi C.J., Munn P.R., Tate A.E. Single-cell transcriptomics of the immune system in ME/CFS at baseline and following symptom provocation. Cell Rep. Med. 2024, 5, 101373. doi:10.1016/j.xcrm.2023.101373. PMID: 38232699. PMCID: PMC10829790.
- 6. Glass K.A., Germain A., Huang Y.V., Hanson M.R. Urine Metabolomics Exposes Anomalous Recovery after Maximal Exertion in Female ME/CFS Patients. Int. J. Mol. Sci. 2023, 24, 3685. doi:10.3390/ijms24043685. PMID: 36835097. PMCID: PMC9958671.
- 7. Milivojevic M., Che X., Bateman L., Cheng A., Garcia B.A., Hornig M., Huber M., Klimas N.G., Lee B., Lee H. Plasma proteomic profiling suggests an association between antigen driven clonal B cell expansion and ME/CFS. PLoS ONE 2020, 15, e0236148. doi:10.1371/journal.pone.0236148. PMID: 32692761. PMCID: PMC7373296.
- 8. Genetics Delivery Team, Boutin T., Bretherick A.D., Dibble J.J., Ewaoluwagbemiga E., Northwood E., Samms G.L., Vitart V., Project and Cohort Delivery Team, Almelid Ø. Initial findings from the DecodeME genome-wide association study of myalgic encephalomyelitis/chronic fatigue syndrome. medRxiv 2025. doi:10.1101/2025.08.06.25333109.
