Genomic analysis of Plasmodium vivax field isolates circulating in sub-Saharan Africa
Isabelle Bouyssou, Lemu Golassa, Inès Vigan-Womas, Matthieu Schoenhals, Arsène Ratsimbasoa, Ali Ould Mohamed Salem Boukhary, Maria de Fátima Ferreira-da-Cruz, Sandrine Houzé, Laurence Ma, Feng Lu, Chetan Chitnis, Pascal Campagne, Didier Ménard

TL;DR
This study analyzes the genetic diversity of Plasmodium vivax in sub-Saharan Africa, identifying regional clusters and exploring possible adaptations to Duffy-negative hosts.
Contribution
The study provides the first detailed genomic analysis of P. vivax in sub-Saharan Africa, revealing distinct genetic clusters and laying the groundwork for understanding parasite adaptation.
Findings
The analysis identified four distinct geographic clusters of P. vivax populations in sub-Saharan Africa.
Despite limited data from Duffy-negative individuals, no clear evidence of selective pressure on invasion-related genes was found.
The study highlights the genetic diversity of P. vivax and its regional variation across Africa.
Abstract
Plasmodium vivax malaria is a major public health problem outside sub-Saharan Africa. However, an increasing number of P. vivax infections in Duffy-negative individuals has been reported across Africa in recent years, raising concerns that the parasites may have evolved alternative pathways to invade reticulocyte and overcome Duffy-negativity. Here, we investigated the global genetic structure and diversity of sub-Saharan African P. vivax populations, exploring possible molecular signatures of adaptation to Duffy-negative hosts. We analyzed 204 previously published P. vivax genome sequences from Africa, Southeast Asia, the Pacific Coral Triangle, and South America and generated whole-genome sequences of 133 P. vivax field isolates collected from 10 sub-Saharan African countries. Our analysis revealed four distinct geographic clusters, with clear contrasts between East/West Africa and…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3- —https://doi.org/10.13039/501100001665Agence Nationale de la Recherche (French National Research Agency)
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMalaria Research and Control · Aquaculture disease management and microbiota · Mosquito-borne diseases and control
Introduction
Plasmodium vivax malaria remains a major public health concern in South America, Southeast Asia, the Middle East, the Pacific Coral Triangle, and Eastern and Southern Africa^1^. Historically, the predominance of Duffy-negative human populations in sub-Saharan Africa has led to the belief that P. vivax transmission is limited in this region^2^. This paradigm stems from evidence that P. vivax merozoites rely on the interaction between the Duffy Antigen Receptor for Chemokines (DARC) and the P. vivax Duffy Binding Protein (PvDBP) to invade erythrocytes^3^. However, recent studies have reported cases of P. vivax infections in Duffy-negative individuals across Africa^4–14^, challenging the conventional understanding and raising questions about potential alternative invasion pathways.
Despite these intriguing observations, robust genomic data from Duffy-negative individuals remain limited due to the technical challenges of obtaining high-quality sequences from low-parasitemia infections. Consequently, the molecular mechanisms underlying P. vivax invasion into Duffy-negative reticulocytes remain poorly understood. Previous studies have described duplications of the pvdbp gene^15,16^ in regions such as Madagascar, where Duffy-positive and Duffy-negative populations coexist, and Southeast Asia^17,18^. Additional work has identified homologs such as the P. vivax Erythrocyte Binding Protein (PvEBP/PvDBP2)^19^, which preferentially binds immature (CD71^high^) reticulocytes in Duffy-positive individuals but shows minimal binding to Duffy-negative reticulocytes^20^. In parallel, other ligands like PvRBP2a and PvRBP2b have been identified as key players in reticulocyte recognition prior to invasion^21,22^. However, no conclusive evidence has demonstrated that these genes facilitate Duffy-independent invasion.
Here, we focus on analyzing the global genetic diversity and population structure of P. vivax across Africa and other endemic regions to better understand the selective pressures shaping parasite populations. We provide whole-genome sequences of 133 P. vivax field isolates from 10 African countries (Angola, Burundi, Comoros, Djibouti, Egypt, Eritrea, Ethiopia, Madagascar, Mauritania, and Sudan) using selective whole-genome amplification (sWGA) and next-generation sequencing. Our dataset was enriched with 204 publicly available sequences from Africa, Southeast Asia, the Pacific Coral Triangle, and South America^23^, for a total of 337 samples across 18 countries.
To ensure a robust analysis, we assessed within-host diversity by calculating Fws values to distinguish monoclonal infections from polyclonal infections and to estimate heterozygosity. Our study primarily explores the global genetic structure of P. vivax, identifies regions under selection, and examines population dynamics in areas with a high prevalence of Duffy-negativity. While the limited availability of high-quality Duffy-negative sequences prevents definitive conclusions on parasite adaptation, this work lays the foundation for future studies with larger and more balanced datasets.
Results
Samples
A total of 133 P. vivax field isolates were obtained between 2016 and 2021 from symptomatic patients originating from ten sub-Saharan African countries. Samples were collected locally at health centers in Ethiopia (N = 66), Madagascar (N = 27), Mauritania (N = 2), and Angola (N = 12). Additional DNA extracts were obtained from symptomatic travelers returning to France from the Comoros (N = 8), Mauritania (N = 5), Djibouti (N = 3), Sudan (N = 3), Madagascar (N = 2), Eritrea (N = 2), Ethiopia (N = 1), Burundi (N = 1), and Egypt (N = 1).
We amplified P. vivax DNA using sWGA to address the challenge of low parasitemia in clinical samples and successfully sequenced 72/133 P. vivax genomes using high-throughput sequencing. Among the successfully sequenced genomes, 18/72 (25%) were from homozygous Duffy-positive patients, 52/72 (72.3%) were from heterozygous Duffy-positive patients, and 2/72 (2.7%) had undetermined Duffy status. Despite our efforts, P. vivax genome sequences from 18 homozygous Duffy-negative patients could not be properly exploited due to insufficient parasitemia levels, which resulted in low DNA amplification and inadequate sequencing quality (see “Methods” for details). This limitation highlights the ongoing technical challenges in obtaining high-quality genomic data from Duffy-negative individuals, an issue that remains a bottleneck for understanding potential adaptations of P. vivax to these hosts.
To expand the scope of our analysis, we enriched our dataset with 204 publicly available P. vivax genome sequences from prior studies. These included sequences from Southeast Asia (Cambodia, Thailand, Vietnam), the Pacific Coral Triangle (Indonesia, Malaysia, Papua New Guinea), South America (Brazil, Colombia), and Africa (Ethiopia, Madagascar, Mauritania). Full details, including sample metadata and references, are provided in Table 1 and Supplementary Information (Table S1).Table 1. List of P. vivax field isolates sequenced in this studyCountrySampleDuffy genotype of patientSequenceAngolaANG71Duffy-negative (homozygous)Not interpretableAngolaANG83Duffy-negative (homozygous)Not interpretableAngolaANG168Duffy-negative (homozygous)Not interpretableAngolaANG172Duffy-negative (homozygous)Not interpretableAngolaANG189Duffy-negative (homozygous)Not interpretableAngolaANG192Duffy-negative (homozygous)Not interpretableAngolaANG193Duffy-negative (homozygous)Not interpretableAngolaANG201Duffy-negative (homozygous)Not interpretableAngolaANG207Duffy-negative (homozygous)Not interpretableAngolaANG209Duffy-negative (homozygous)Not interpretableAngolaANG210Duffy-negative (homozygous)Not interpretableAngolaANG212Duffy-negative (homozygous)Not interpretableBurundi1803016292Duffy-positive (homozygous)Not interpretableComoros711035003Duffy-positive (homozygous)InterpretableComoros10157Duffy-positive (homozygous)InterpretableComoros11315Duffy-positive (heterozygous)InterpretableComoros11729Duffy-positive (heterozygous)InterpretableComoros8330Duffy-positive (heterozygous)InterpretableComoros10555Duffy-positive (heterozygous)Not interpretableComoros11812Duffy-positive (heterozygous)Not interpretableComoros11994Duffy-positive (heterozygous)Not interpretableDjibouti1609055605Duffy-positive (homozygous)Not interpretableDjibouti1812066011Duffy-positive (homozygous)Not interpretableDjibouti1807003070Duffy-positive (homozygous)InterpretableEgypt1408007924Duffy-positive (heterozygous)Not interpretableEthiopia711390488Duffy-positive (homozygous)InterpretableEthiopiaA1001Duffy-negative (homozygous)Not interpretableEthiopiaA1002Duffy-positive (heterozygous)Not interpretableEthiopiaA1004Duffy-positive (heterozygous)InterpretableEthiopiaA1005Duffy-positive (heterozygous)InterpretableEthiopiaA1006Duffy-positive (heterozygous)Not interpretableEthiopiaAd8001Duffy-positive (heterozygous)InterpretableEthiopiaAm008Duffy-positive (homozygous)Not interpretableEthiopiaAm010Duffy-positive (heterozygous)InterpretableEthiopiaAm012Duffy-positive (heterozygous)Not interpretableEthiopiaAw004Duffy-positive (heterozygous)Not interpretableEthiopiaAw0005Duffy-positive (heterozygous)InterpretableEthiopiaAw0007Duffy-positive (heterozygous)InterpretableEthiopiaG6001Duffy-positive (heterozygous)Not interpretableEthiopiaG6003Duffy-positive (heterozygous)InterpretableEthiopiaG6005Duffy-positive (heterozygous)InterpretableEthiopiaG6006Duffy-positive (heterozygous)Not interpretableEthiopiaG6007Duffy-positive (heterozygous)Not interpretableEthiopiaH9002Duffy-positive (heterozygous)InterpretableEthiopiaH9003Duffy-positive (heterozygous)Not interpretableEthiopiaK101Duffy-positive (heterozygous)InterpretableEthiopiaK102Duffy-positive (heterozygous)Not interpretableEthiopiaK107Duffy-negative (homozygous)Not interpretableEthiopiaMC4005Duffy-positive (heterozygous)InterpretableEthiopiaMC4008Duffy-positive (heterozygous)Not interpretableEthiopiaMC4010Duffy-positive (heterozygous)Not interpretableEthiopiaMC4011Duffy-positive (heterozygous)InterpretableEthiopiaMC4012Duffy-positive (heterozygous)InterpretableEthiopiaMC4013Duffy-positive (heterozygous)Not interpretableEthiopiaMC4014Duffy-positive (heterozygous)Not interpretableEthiopiaMC4016Duffy-positive (heterozygous)InterpretableEthiopiaMC4018Duffy-positive (heterozygous)Not interpretableEthiopiaMC4019Duffy-positive (heterozygous)InterpretableEthiopiaMC4020Duffy-positive (homozygous)InterpretableEthiopiaMC4021Duffy-positive (heterozygous)InterpretableEthiopiaMC4022Duffy-positive (heterozygous)InterpretableEthiopiaMC4023Duffy-positive (homozygous)InterpretableEthiopiaMC4024Duffy-positive (heterozygous)InterpretableEthiopiaMC4027Duffy-positive (heterozygous)InterpretableEthiopiaMC4030Duffy-positive (heterozygous)Not interpretableEthiopiaMC4031Duffy-positive (homozygous)InterpretableEthiopiaMC4032Duffy-positive (homozygous)InterpretableEthiopiaMC4033Duffy-positive (heterozygous)InterpretableEthiopiaMC4034Duffy-positive (heterozygous)Not interpretableEthiopiaMC4035Duffy-positive (homozygous)Not interpretableEthiopiaMC4036Duffy-positive (homozygous)Not interpretableEthiopiaMC4037Duffy-positive (heterozygous)InterpretableEthiopiaMC4038Duffy-positive (heterozygous)InterpretableEthiopiaMC4040Not IdentifiedInterpretableEthiopiaMC4043Duffy-positive (homozygous)InterpretableEthiopiaMC4044Duffy-positive (heterozygous)Not interpretableEthiopiaMC4046Duffy-positive (homozygous)Not interpretableEthiopiaMC4047Duffy-positive (homozygous)InterpretableEthiopiaMC4048Duffy-positive (heterozygous)Not interpretableEthiopiaMC4049Duffy-positive (homozygous)InterpretableEthiopiaMC4050Duffy-positive (heterozygous)Not interpretableEthiopiaMC4052Duffy-positive (heterozygous)InterpretableEthiopiaMC4053Duffy-positive (homozygous)Not interpretableEthiopiaMJ0016Duffy-positive (heterozygous)InterpretableEthiopiaMJ0017Duffy-positive (heterozygous)InterpretableEthiopiaMJ0018Duffy-positive (heterozygous)InterpretableEthiopiaMJ7001Duffy-positive (heterozygous)InterpretableEthiopiaMJ7003Duffy-positive (heterozygous)InterpretableEthiopiaMJ7004Duffy-positive (heterozygous)InterpretableEthiopiaMJ7006Duffy-positive (heterozygous)Not interpretableEthiopiaMT0023Duffy-negative (homozygous)Not interpretableEthiopiaW5011Duffy-negative (homozygous)Not interpretableEritrea11302Duffy-positive (heterozygous)Not interpretableEritrea1801078268Duffy-positive (heterozygous)Not interpretableMauritania711M21044701Duffy-positive (homozygous)InterpretableMauritania713285008Duffy-positive (homozygous)InterpretableMauritania714035002Duffy-positive (homozygous)InterpretableMauritania714145012Duffy-positive (heterozygous)Not interpretableMauritania11034Duffy-positive (heterozygous)Not interpretableMauritaniaMAU24Duffy-negative (homozygous)Not interpretableMauritaniaMAUAT68Duffy-negative (homozygous)Not interpretableMadagascarMAE-CSB01-001Duffy-positive (homozygous)InterpretableMadagascarMAE-CSB01-002Duffy-positive (heterozygous)InterpretableMadagascarMAE-CSB02-001Duffy-positive (heterozygous)InterpretableMadagascarMAE-CSB02-002Duffy-positive (heterozygous)InterpretableMadagascarMAE-CSB02-003Duffy-positive (heterozygous)InterpretableMadagascarMAE-CSB03-002Duffy-positive (heterozygous)InterpretableMadagascarMAE-CSB03-003Duffy-positive (heterozygous)InterpretableMadagascarMAE-V01-001Duffy-positive (heterozygous)InterpretableMadagascarMAE-V01-002Not identifiedInterpretableMadagascarMDZ-V02-001Duffy-positive (homozygous)Not interpretableMadagascarMDZ-V02-006Duffy-positive (homozygous)Not interpretableMadagascarMDZ-V03-001Duffy-positive (homozygous)Not interpretableMadagascarMAE-V04-001Duffy-positive (heterozygous)InterpretableMadagascarMAE-V06-001Duffy-positive (heterozygous)InterpretableMadagascarMDZ-CSB01-001Duffy-positive (heterozygous)InterpretableMadagascarMDZ-CSB01-002Duffy-positive (heterozygous)InterpretableMadagascarMDZ-CSB01-003Duffy-positive (heterozygous)InterpretableMadagascarMDZ-CSB01-005Duffy-positive (heterozygous)Not interpretableMadagascarMDZ-V01-001Duffy-positive (heterozygous)InterpretableMadagascarMDZ-V02-001Duffy-positive (homozygous)InterpretableMadagascarMDZ-V02-002Duffy-positive (heterozygous)InterpretableMadagascarMDZ-V02-004Duffy-positive (heterozygous)InterpretableMadagascarMDZ-V02-005Duffy-positive (heterozygous)InterpretableMadagascarMDZ-V02-006Duffy-positive (homozygous)InterpretableMadagascarMDZ-V04-001Duffy-positive (heterozygous)InterpretableMadagascarMDZ-V04-002Duffy-positive (heterozygous)InterpretableMadagascarMDZ-V04-003Duffy-positive (heterozygous)InterpretableMadagascar711185003Duffy-positive (heterozygous)InterpretableMadagascar713455019Duffy-positive (homozygous)InterpretableSudan713405019Duffy-positive (homozygous)Not interpretableSudan713485020Duffy-positive (homozygous)Not interpretableSudan510m15000201Duffy-positive (heterozygous)Not interpretable
Genomic data
We aligned the sequencing reads to the PvP01 (v.48) reference genome using bwa mem and used GATK 4.0 (Genome Analysis Toolkit) following best practices guidelines to identify Single Nucleotide Polymorphisms (SNPs). Genotypes were defined in diploid mode. To ensure robust variant calling, we applied hard filtering based on several summary statistics. Variants were excluded if they met any of the following criteria: QD < 2.0, QUAL < 30.0, SOR > 3.0, FS > 60.0, MQ < 40.0, MQRankSum < −12.5, ReadPosRankSum < −8.0. Further details of these filters and an overview of the workflow are provided in Fig. S1.
The final dataset retained >300,000 SNPs across 276 samples (72 newly sequenced samples from this study and 204 sequences from previously published datasets). To assess the presence of polyclonal infections, we evaluated within-host diversity by calculating Fws coefficients^24^. Empirically, isolates with Fws < 0.95 are generally considered as polyclonal, while those with Fws > 0.95 are expected to be monoclonal. This analysis revealed that approximately one-third of the isolates (32.4%) could be classified as polyclonal (Tabel S2 and Fig. S2). This finding corroborates previous studies showing that P. vivax infections in Southeast Asia and South America are frequently characterized by multiple clones^8,25–28^. For subsequent analyses, polyclonal infections were excluded to avoid biases in heterozygosity, while complementary analyses including both monoclonal and polyclonal infections were conducted and reported in the Supplementary Information to assess the robustness of the results (Fig. S3).
Global genetic diversity
The population structure of P. vivax isolates was characterized by low levels of admixture overall, with clear geographic genetic clusters identified (Fig. 1A). Isolates from South America (Brazil and Colombia) exhibited more ambiguous clustering profiles, suggesting some degree of admixture that requires further investigation with a larger sample size.Fig. 1. Global genetic diversity of P. vivax populations.A Structural analysis of chromosomes. B Principal Coordinate analysis (PCO). C Genome-wide genetic differentiation with genomic islands.
The Principal Coordinate Analysis (PCO) corroborated the genetic clustering and revealed four distinct geographic groups (Fig. 1B): an East/West African cluster (Ethiopia, Djibouti, Mauritania), an Indian Ocean cluster (Madagascar, Comoros), an Asian/Pacific Coral Triangle cluster (Cambodia, Indonesia, Malaysia, Papua New Guinea, Thailand, and Vietnam), and a South American cluster (Brazil, Colombia), which appeared centrally positioned relative to the other three clusters.
Among the sub-Saharan African samples, we observed a notable distinction between the two African subpopulations: the East/West African cluster (Ethiopia, Djibouti, Mauritania) and the Indian Ocean cluster (Madagascar, Comoros). Notably, the Indian Ocean cluster was genetically closer to the East/West African cluster than to the Asian/Pacific cluster, suggesting significant historical or contemporary gene flow between these African populations.
This observation contrasts with the assumption of a unique Indonesian origin for the P. vivax population in Madagascar^29^. While past human migrations have been proposed as a major driver of P. vivax population structure^6,30^, our results suggest that the genetic landscape in Madagascar may reflect a more complex history, potentially involving multiple introductions and subsequent gene flow between African and Indian Ocean populations. These findings align partially with previous studies but highlight the need for larger sample sizes and deeper analyses to refine our understanding of the genetic origins and population dynamics of P. vivax in Madagascar and the surrounding regions.
Genomic islands of genetic differentiation
We investigated whether the low frequency of Duffy antigens in human populations across sub-Saharan Africa might exert selective pressure on P. vivax parasites, potentially favoring adaptations enabling Duffy-independent invasion pathways. To test this hypothesis, we performed a genome-wide analysis of genetic differentiation using pairwise comparisons of P. vivax sequences from multiple regions. We identified eight genomic regions with clear peaks of differentiation on chromosomes 4, 5, 7, 10, 11, 12, 13, and 14 (Fig. 1C). These regions contained genes with diverse functions, including loci involved in metabolic pathways, cellular components, and genes of unknown function (Table 2). Interestingly, we observed signals in three genes previously associated with antimalarial drug resistance: mdr-1 (PVP01_1010900), dhfr (PVP01_0526600), and dhps (PVP01_1429500). These findings likely reflect differential drug pressures acting on P. vivax populations worldwide, consistent with observations from other studies^25–27^. Importantly, no significant signals were detected in genes encoding parasite ligands directly implicated in erythrocyte invasion pathways, such as PvDBP or members of the PvRBP family, which would have suggested adaptations for Duffy-independent invasion. However, one notable exception was a signal detected in the thrombospondin-related anonymous protein gene (TRAP) (PVP01_1218700), located on chromosome 12. This protein is known to be expressed in P. vivax sporozoites and plays a critical role in the invasion of human hepatocytes^31,32^. While this signal may not directly relate to erythrocyte invasion, it highlights the potential involvement of P. vivax genes in other stages of the parasite’s lifecycle. Taken together, our results suggest that the global genomic structure of P. vivax populations reflects consistent patterns of differentiation, likely driven by geographic and environmental pressures, including drug pressure. Although no evidence was found for specific genetic adaptations to Duffy-independent invasion in the regions examined, these findings underscore the complexity of P. vivax evolution in regions where Duffy negativity predominates. Future studies incorporating larger datasets of high-quality genomes from Duffy-negative individuals will be essential to conclusively identify loci under selection for alternative invasion pathways.Table 2. List of genes falling within the identified genomic islands showing clear peaks of differentiation in eight genomic regions on chromosomes 4, 5, 7, 10, 11, 12, 13, and 14Chr.Pos. startPos. endIDDescriptionFunction4219558220885PVP01_0404900Plasmodium exported protein, unknown functionUnknown4399586401784PVP01_0409900acyl-CoA synthetase, putativeMetabolic pathwaysFatty acid biosynthesis and degradation510610191067947PVP01_0526300conserved Plasmodium protein, unknown functionUnknown510704131071498PVP01_0526400conserved Plasmodium protein, unknown functionUnknown510723101074121PVP01_0526500mRNA-binding protein PUF2, putativeMetabolic pathwaysRNA binding510773621079236PVP01_0526600bifunctional dihydrofolate reductase-thymidylate synthase, putativeMetabolic pathwaysFolate biosynthesis/Drug resistance510801181082726PVP01_0526700LETM1-like protein, putativeMetabolic pathwaysribosome binding510857141102054PVP01_0526800conserved Plasmodium protein, unknown functionUnknown7495347505297PVP01_0709800cysteine repeat modular protein 1, putativeCellular componentintegral component of membrane10470947474113PVP01_1010700heptatricopeptide repeat-containing protein, putativeUnknown10475543477329PVP01_1010800cytochrome b-c1 complex subunit 2, putativeMetabolic pathwaysprotein processing involved in protein targeting to mitochondrion10478739483133PVP01_1010900ABC transporter B family member 1, putativeMetabolic pathwaystransmembrane transport, food vacuole1013026791303648PVP01_103020060S ribosomal protein L31, putativeMetabolic pathwaystranslation11961821964757PVP01_1121700acetyl-CoA synthetase, putativeMetabolic pathwaysacetyl-CoA biosynthetic process12323534324835PVP01_12080006-cysteine protein P47Cellular componentcytoplasm, cell surface12768961770631PVP01_1218700thrombospondin-related anonymous protein, putativeMetabolic pathwaysentry into host, protein binding (sporozoite stages)12771355773349PVP01_1218800conserved Plasmodium protein, unknown functionUnknown13332331342481PVP01_1307300cysteine repeat modular protein 3, putativeUnknown1412259021230319PVP01_1428700conserved protein, unknown functionUnknown1412312841233767PVP01_1428800histone-arginine methyltransferase CARM1, putativeMetabolic pathwayshistone methylation1412352391237877PVP01_1428900conserved protein, unknown functionUnknown1412426351248727PVP01_1429000CCR4-associated factor 1, putativeMetabolic pathwaysnucleic acid binding1412544731258020PVP01_1429100ER membrane protein complex subunit 1, putativeMetabolic pathwaysprotein folding in endoplasmic reticulum1412593681261188PVP01_1429200mitochondrial carrier protein, putativeUnknown1412622071264847PVP01_1429300cullin-1, putativeMetabolic pathwaysubiquitin-dependent protein catabolic process1412674811268779PVP01_1429400conserved Plasmodium protein, unknown functionUnknown1412697561272304PVP01_1429500hydroxymethyldihydropterin pyrophosphokinase-dihydropteroate synthase, putativeMetabolic pathwaysFolate biosynthesis/Drug resistance1412731331274237PVP01_1429600conserved Plasmodium protein, unknown functionUnknown1412763171279088PVP01_1429700ATP-dependent RNA helicase DBP1, putativeMetabolic pathwaysnucleic acid binding1412848121287310PVP01_1429800protein phosphatase PPM7, putativeMetabolic pathwaysprotein dephosphorylation1412876831289728PVP01_1429900aquaporin, putativeMetabolic pathwaystransmembrane transport1412944831297856PVP01_1430000protein phosphatase PPM5, putativeMetabolic pathwaysprotein dephosphorylation1412990041301880PVP01_1430100ABC1 family, putativeMetabolic pathwaysPurine metabolism1413029631303546PVP01_1430200ribosomal protein L33, apicoplast, putativeMetabolic pathwaystranslation1413100451316657PVP01_1430400JmjC domain-containing protein, putativeMetabolic pathwaysBiosynthesis of secondary metabolites1413183051322540PVP01_1430500conserved Plasmodium protein, unknown functionUnknown1413236581326144PVP01_1430600RuvB-like helicase 1, putativeMetabolic pathwaysBiosynthesis of various antibiotics and secondary metabolites1413283281344272PVP01_1430700peptidase family C50, putativeMetabolic pathwaysproteolysis1426948422697905PVP01_1462600conserved Plasmodium protein, unknown functionUnknown
Genetic diversity of genes associated with drug resistance
In the African P. vivax population, we detected four single non-synonymous mutations (P33L, C49R, N130K, A255T) in the bifunctional dihydrofolate reductase-thymidylate synthase gene (PVP01_0526600) in samples from Madagascar, Comoros, and Mauritania. The C49R mutation, only found in Madagascar (18/23), was the most common mutation (26%). The P33L mutation was detected in samples from the Comoros (1/6), the N130K mutation in samples from Madagascar (3/23) and the A255T mutation in samples from Mauritania (1/3). No such mutations were found in samples from Ethiopia and Djibouti.
Analysis of mutation points in the hydroxymethyldihydropterin pyrophosphokinase-dihydropteroate synthase gene (PVP01_1429500), likely associated with sulfadoxine resistance, revealed the presence of five different non-synonymous mutations (E142G, M205I, G383A, I545T and A647V) in varying proportions. The three most common mutations were G383A (36/68) from 4 countries, followed by M205I (30/68) from 3 countries, and E142G (25/68) from 2 countries. The E142G, I545T and A647V mutations were country-specific: E142G in Ethiopia and Djibouti (24/34 and 1/1, respectively), I545T in Mauritania (1/3) and A647V in the Comoros (1/6) and Ethiopia (10/34). The M205I and G383A mutations were found in varying proportions in all countries except in samples from Madagascar in which we did not find M205I mutations. Three triple mutant alleles were observed including the E142G/M205I/G383A (15/68, mainly in Ethiopia 14/15), the E142G/M205I/A647V (10/68, only in Ethiopia 10/34) and the M205I/G383A/I545T (1/68, Mauritania). We also found two double mutant alleles, the M205I/G383A (4/68) and the G383A/A647V (1/68).
For the ABC transporter B family member 1 - multidrug resistance protein 1 gene (PVP01_1010900), which is thought to modulate parasite susceptibility to amino-4 and amino-alcohol quinolines, five-point mutations were found (F194Y, S698G, L845F, F976Y and T1269S). The F976Y mutation was the most common mutation (34/68) and was found in high proportions in Mauritania (3/3), in Ethiopia (30/34) and in Djibouti (1/1). The F194Y mutation was found only in samples from Madagascar (1/24), the S698G mutation was observed in samples from Mauritania (1/3) and Ethiopia (1/34) and the L845F mutation only in Mauritania (2/3). Two double mutant alleles were observed including the S698G/F976Y (7/68, mainly in Ethiopia 6/34) and the L845F/F976Y (2/68, Mauritania 2/3). More details are given in the Supplementary Information (Table S3).
Genetic diversity of invasion-related genes
We then restricted our analysis to regions containing genes validated or suspected to encode parasite ligands involved in reticulocyte invasion pathways^25–27,33^, such as the pvdbp, pvebp/pvdbp2, pvrbp2a, and pvrbp2b genes (Table S4). Only two mutations were found in the P. vivax Duffy binding gene (pvdbp, PVP01_0623800), the K277T (2/68) and the G830V (1/68). The K277T mutation was specific to samples from the Comoros (1/6) and Madagascar (1/24), while the G830V mutation was detected only in Mauritania (1/3). No mutations were detected in Ethiopia and Djibouti. Analysis of mutation points in the P. vivax erythrocyte binding gene (pvebp/pvdbp2, PVP01_0102300) revealed 7 non-synonymous mutations: D268N, E341K, K595N, I611F, E660K, V705L and F746I. The most common mutation was the I611F (8/68), found only in Comoros (1/6) and Madagascar (7/24). In Ethiopia, only the E660K mutation was detected at a low proportion (1/34), while no mutation was found in samples from Djibouti. The K595N mutation was specific for Madagascar (1/24) and the V705L mutation for the Comoros (1/6). The D268N mutation was found in Comoros (1/6), in Madagascar (1/24) and in Mauritania (1/3). Two double mutant alleles were found in Comoros and Madagascar (D268N/E341K) and in Mauritania (D268N/F746I).
No mutations were found in the reticulocyte binding protein 2a (pvrbp2a, PVP01_1402400) and the reticulocyte binding protein 2b (pvrbp2b, PVP01_0800700) genes in samples from Mauritania, Ethiopia, and Djibouti. In samples from the Comoros, the K112I mutation in the pvrbp2a gene and the S186N and D461G mutations in the pvrbp2b gene were detected once (1/6). The same mutations were found in samples from Madagascar, once for the K112I mutation in the pvrbp2a gene (1/24) and at higher frequencies for the S186N (6/24) and D461G (3/24) mutations in the pvrbp2b gene. The L84F in the pvrbp2b gene was unique to samples from Madagascar (5/24). More details can be found in the Supplementary Information (Table S4).
We also analyzed genetic differentiation by estimating Tajima’s D values at loci associated with invasion-related genes and comparing these values with those obtained for all individual genes across the genome. This analysis revealed that invasion-related genes from P. vivax populations in all regions fell well within the boundaries defined by other genes (Fig. 2A). In addition, a country-level comparison of Tajima’s D values across 13 populations confirmed that invasion-related genes did not exhibit patterns of differentiation distinct from the genome-wide average. Interestingly, values of Tajima’s D across all populations appeared shifted toward negative values, suggesting an excess of low-frequency alleles. This trend may indicate ongoing geographic differentiation at a finer scale, possibly driven by local demographic events, such as population expansions or recent bottlenecks, rather than by selective pressure on invasion-related loci.Fig. 2. Genetic diversity of invasion-related genes.A Differentiation of invasion-related genes in all P. vivax populations. B Analysis of Tajima’s D within all countries. C Analysis of Tajima’s D in Ethiopia, Madagascar, and Thailand.
Overall, we found no evidence of selective pressure specifically acting on invasion-related genes in sub-Saharan P. vivax strains (Fig. 2B, C). These results suggest that while geographic structure may influence allele frequency patterns, it does not appear to impose measurable selection on genes linked to erythrocyte invasion pathways.
Genetic diversity associated with Duffy genotype in human hosts
We hypothesized that the Duffy status of patients (homozygous and heterozygous Duffy-positives, and homozygous Duffy-negative) might influence the diversity and selection of invasion-related genes in African P. vivax genomes. To test this, we conducted a Genome-Wide Association Study (GWAS) on sequences obtained from 18 isolates infecting homozygous Duffy-positive patients and 52 isolates from heterozygous Duffy-positive patients. The results of the GWAS showed no significant associations between the patient’s Duffy status and parasite genotypes at loci encoding invasion-related genes. Specifically, we did not observe any molecular signatures suggestive of adaptive evolution in these genes that could facilitate infection in Duffy-negative hosts (Fig. 3). These findings suggest that the evolution of African P. vivax populations may not be primarily driven by diversification or selection of invasion-related genes, at least within the resolution of our current analysis. Alternatively, adaptations to infect Duffy-negative individuals, if they exist, may involve molecular mechanisms too complex or subtle to be detected by broad-scale genomic association methods, particularly given the constraints imposed by the limited availability of high-quality Duffy-negative sequences.Fig. 3GWAS analysis.A GWAS analysis of sequences obtained from 18 P. vivax African isolates infecting homozygous Duffy-positive patients and 52 P. vivax African isolates infecting heterozygous Duffy-positive patients for invasion-related genes. Manhattan plot displaying the genome-wide association results. Each dot represents a single nucleotide polymorphism (SNP), plotted as the negative log10-transformed p value (−log10(P))(−log_{10}(P))(−log10 (P)) of the association test. The x-axis represents genomic regions, labeled as PvP01_01_v1 through PvP01_14_v1, corresponding to different chromosomes or contigs of the Plasmodium vivax genome. The dashed horizontal line indicates the genome-wide significance threshold. No SNPs surpassed the genome-wide significance threshold, suggesting that no variants showed statistically significant associations after multiple testing correction. B Association of genetic variants in selected genes with the phenotype of interest. Genes are displayed on the x-axis, and the dashed horizontal line indicates the genome-wide significance threshold. Genes include AMA1, DBP, EBP, ETRAMP, GAMA, MSA180, MSP1, MSP1P, P12, RAMA, RBP2a, RBP2b, and RON. No SNPs reached the genome-wide significance threshold in this analysis.
Overall, while our data do not rule out the possibility of Duffy-independent invasion pathways, they underscore the need for future studies incorporating higher-resolution genomic data from larger numbers of Duffy-negative infections to identify potential loci under selection and fully elucidate the evolutionary dynamics of P. vivax populations in Africa.
Discussion
To date, the global genetic diversity and population structure of P. vivax remain poorly understood, particularly in Africa, where vivax malaria is rare and characterized by low parasitemia. While previous studies have demonstrated distinct population structures in Southeast Asia and South America, limited genomic data exist for P. vivax populations circulating in Africa, except for southern Ethiopia. Here, using genomic data from African and global isolates, we provide a comparative analysis of P. vivax populations and explore the potential impact of Duffy negativity on parasite evolution.
Our results confirm the clear geographic genetic structure of P. vivax, distinguishing African, Asian, and South American clusters, consistent with prior studies. Within Africa, we observed two distinct sub-clusters: the East/West African cluster (Ethiopia, Djibouti, Mauritania) and the Indian Ocean cluster (Madagascar and Comoros). Notably, the Indian Ocean cluster was genetically closer to East Africa than to Asia, challenging previous assumptions of a strictly Indonesian origin for the Malagasy P. vivax population. While earlier work suggested single origins, our findings, combined with the complex demographic history of Madagascar, indicate multiple introductions likely shaped by human migrations^6,30^.
Our findings align with previous observations reported by Hupalo et al.^26^, which demonstrated a clear geographic genetic structure of P. vivax populations worldwide, including low levels of admixture. Similarly, our analysis revealed well-defined clusters for African, Asian, and South American populations, with limited gene flow between regions. In addition, the work of Benanvente et al.^25^ identified loci under selective pressure, particularly in drug resistance genes (pvkelch10, pvmrp1), which differ from the loci identified in our study (pvmdr1, pvdhfr, pvdhps). This discrepancy likely reflects regional variations in drug use and the selective pressures acting on P. vivax populations in Africa compared to other regions.
A key hypothesis motivating this study was that Duffy-negativity might impose selective pressure on invasion-related genes, driving adaptations in P. vivax African populations. Genetic diversity within major invasion-related loci did not deviate significantly from the genome-wide background, and Tajima’s D analyses confirmed a lack of positive selection. For example, the pvdbp gene showed only two novel non-synonymous mutations (K277T and G830V), while mutations in pvebp/pvdbp2 and pvrbp2b were observed at low frequencies without clear evidence of selection.
As a result, potential adaptive signals may remain undetected, particularly if they involve subtle or complex evolutionary mechanisms.
Recent studies suggest that Duffy-negative erythroblasts transiently express DARC during terminal differentiation, providing a limited opportunity for P. vivax merozoites to invade^15,34^. This biological constraint could explain the low parasitemia consistently observed in Duffy-negative individuals and the apparent lack of strong selection signals in invasion-related genes. If alternative invasion pathways exist, they may rely on molecular mechanisms that are currently undetectable with our analytical approach or require larger, more balanced datasets for validation.
While no evidence of selection was observed in invasion-related genes, we identified strong signals of selection in genes associated with antimalarial drug resistance, including pvdhfr, pvdhps, and pvmdr1. These results are consistent with earlier studies, such as Benavente et al.^25^, which reported signals of antifolate resistance markers. However, our findings differ in specific loci under selection, as we detected positive selection in pvmdr1 (not observed by Benavente et al.), while signals in pvkelch10 and pvmrp1 were absent in our dataset^25^. These discrepancies likely reflect region-specific drug pressures and variations in antimalarial use across African and global populations. Notably, the detection of previously unreported mutations, such as I545T in pvdhps and F194Y in pvmdr1, highlights the ongoing evolution of drug resistance in African P. vivax populations.
Despite these advances, analysis of the P. vivax genome remains technically challenging due to field samples with low parasitemia and high human DNA contamination. While sWGA prior to WGS improved parasite genome recovery, it was insufficient to obtain high-quality sequences from Duffy-negative patients. Moreover, the sWGA approach precluded the detection of copy number variations in key invasion-related genes, such as pvdbp and pvebp/pvdbp2, which may play critical roles in alternative invasion pathways^35^.
Despite increasing evidence of P. vivax infections in Duffy-negative individuals, our study was unable to identify clear genomic signatures of parasite adaptation. This limitation primarily stems from the low parasitemia observed in these individuals, which restricted the availability of high-quality genomic sequences. As a result, potential adaptive mechanisms may remain undetected, particularly if they involve complex or subtle evolutionary processes. Overcoming these challenges will require larger, well-balanced datasets, improved parasite enrichment techniques, and functional studies to elucidate alternative invasion pathways.
In summary, this study provides new insights into the genetic diversity and structure of African P. vivax populations and places them within a global context. We confirm the distinct clustering of African P. vivax populations and highlight their divergence from Asian and South American populations. However, we found no evidence of selective pressure on invasion-related genes, suggesting that adaptations to Duffy-negative hosts, if they exist, may involve mechanisms too complex to be detected by our analysis. These results emphasize the biological and technical challenges of studying P. vivax in Duffy-negative individuals.
Methods
Biological samples
All biological samples used in this study were obtained from Plasmodium vivax-infected blood samples collected between 2016 and 2021 after obtaining informed consent from all participants. The study protocol and sample collection were approved by the Institutional Review Board of Addis Ababa University (Ethiopia), the Comité d’Éthique Biomédicale de Madagascar (Madagascar), the Comité National d’Éthique pour la Recherche en Santé de Mauritanie (Mauritania), and the Comité de Protection des Personnes Île-de-France (France) for samples from returning travelers. All ethical regulations relevant to human research participants were followed. Ethics approval was obtained from an ethics committee located in Angola for the sample collection that took place in Angola. Dried blood spots and/or veinous blood samples were collected from symptomatic patients at health centers in Ethiopia (Addis Ababa University), Madagascar (Institut Pasteur Madagascar), and Mauritania (Université de Nouakchott) and from symptomatic travelers returning from Comoros, Djibouti, Madagascar, Mauritania, and Ethiopia (French National Reference Center for malaria). Blood samples from Madagascar were leukodepleted using CF11-packed columns to minimize the amount of human DNA^36^. We enriched our dataset with 204 published genomic sequences of P. vivax strains circulating in Southeast Asia (Cambodia, Thailand, Vietnam), Pacific Coral Triangle (Indonesia, Malaysia, Papua New Guinea), South America (Brazil, Columbia), and Africa (Ethiopia, Madagascar, Mauritania) (Table 1 and Table S1).
DNA extraction
Genomic DNA was extracted from 100 µL of red blood cell pellets or from a 6-mm dried blood spot punch using the QIAamp mini-blood DNA kit (Qiagen), according to the supplier’s instructions. The total DNA concentration was quantified using a dsDNA HS assay kit and a Qubit fluorometer (Invitrogen).
Screening of Plasmodium species
The Plasmodium species screening of P. vivax isolates was performed by real-time PCR as previously described^37^ (Table S5).
Duffy genotyping
The Duffy genotype of the patients was determined by PCR and Sanger sequencing as previously described^6^ (Table S5).
Selective whole-genome amplification
sWGA was performed to enrich P. vivax DNA as previously described^35^ (Table S5). PCR products obtained from sWGA were diluted (1:1) with DNase-free and RNase-free water (ThermoFisher Scientific) and purified with AMPure XP beads (Beckman Coulter) according to the supplier’s recommendations.
Whole genome sequencing
Whole genome sequencing was performed as previously described^5^. Sample libraries were prepared using a Truseq Nano Kit (Illumina). Briefly, 100–150 ng of DNA was subjected to shearing, end repair, A-tailing, and adapter ligation. When necessary, they were enriched using 10–15 cycles of PCR. The sample libraries were then multiplexed and loaded. Paired-end sequencing was performed with 2 × 150 base reads kit on a NovaSeq6000. Illumina sequence reads were submitted to an archived system.
Variant calling
Sequencing reads were aligned to the Plasmodium vivax PvP01 reference genome (version 48, PlasmoDB), which is publicly available at https://plasmodb.org/plasmo/^38^.
Low-quality reads (MAPQ < 30), secondary alignments, and anomalous insert sizes (>1000 bp) were excluded. Duplicates were marked using Picard. SNPs were called with GATK HaplotypeCaller (v4.1.7.0) in diploid mode, and joint genotyping was performed using GenotypeGVCFs^39^. Variants were filtered using VariantFiltration with the following parameters: QUAL < 30, QD < 2.0, MQ < 40, MQRankSum <–12.5, SOR > 3.0, FS > 60.0, and ReadPosRankSum <–8.0. Genotypes with depth ≤4 were considered missing. SNPs with >10% missing data or not biallelic were excluded using VCFtools, and final VCFs were merged with Picard GatherVcfs. Additional technical details are provided in the Supplementary Methods.
Analysis of genotypic data
Infection complexity and filtering
To distinguish between mono- and polyclonal P. vivax infections, we computed the within-host diversity index Fws across loci using the formula Fws = 1 − H_w_/H_s_, where Hw is the observed heterozygosity within samples and Hs is the expected heterozygosity at the population level. Given the bias introduced by mixed infections in measures of diversity, structure, and genotype–phenotype associations, we restricted the main analyses to monoinfections. To evaluate the robustness of this choice, we conducted parallel analyses including all infections, whose results were consistent with those from the monoinfection subset (Fig. S3).
Population structure and genetic clustering
Genetic population structure was investigated using principal coordinate analysis (PCoA) based on pairwise genetic distances between samples, implemented in R (v4.3.0) with the ade4 package. Clustering analysis was performed using fastSTRUCTURE, a variational Bayesian algorithm, across a range of K values (K = 2–10)^40^. The most likely number of clusters was inferred using the chooseK function. These analyses were restricted to monoinfections. Further methodological details are provided in the Supplementary Methods.
Genomic differentiation and selection
Genomic islands of differentiation were identified by computing Nei’s coefficient of genetic differentiation (GST) for each SNP across populations. To explore selective signatures, we calculated Tajima’s D for all genes and focused on loci involved in reticulocyte invasion pathways. Gene-level estimates of genetic differentiation were also computed using GST. These analyses aimed to identify regions potentially under selection in association with host Duffy phenotype or geography.
Genotype–phenotype association analysis
To test for associations between parasite genotypes and host Duffy phenotype, we applied a generalized linear model (GLM) with a binomial distribution: Duffy_status = parasite_genotype + country_of_origin. P values associated with genotype effects were computed and corrected for multiple comparisons using the Bonferroni method. Only monoinfections with available Duffy genotyping were included in this analysis.
Statistics and reproducibility
This study is based on 133 P. vivax field isolates collected from ten sub-Saharan African countries and 204 publicly available genomes from other endemic regions. All analyses were conducted on monoinfections (as defined by Fws > 0.95), unless otherwise specified. Replicates correspond to independent patient-derived parasite isolates. No experimental replication was possible due to the observational and retrospective nature of the dataset. Statistical analyses were performed in R (v4.3.0), using appropriate packages and functions as described in the corresponding sections. Significance thresholds and multiple-testing corrections (Bonferroni) were applied where relevant. Genetic diversity, population structure, differentiation, and genotype–phenotype association were assessed with standard population genetics metrics and GLMs. Consistency of results was assessed by including and excluding polyclonal infections (Fig. S3). Full details of the analytical pipelines are provided in the Supplementary Methods.
Supplementary information
Transparent Peer Review file Supplementary Information reporting-summary
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1World Health Organization. World Malaria Report 2022 (ed. IGO. GWHOLCB-N-S) (World Health Organization, 2022).
- 2Ntumngia, F. B. et al. A novel erythrocyte binding protein of Plasmodium vivax suggests an alternate invasion pathway into Duffy-positive reticulocytes. m Bio 7, e 01261-16 (2016).10.1128/m Bio.01261-16PMC 499955327555313 · doi ↗ · pubmed ↗
- 3Goodman, S. M. & Benstead, J. P. (eds) The Natural History of Madagascar (University of Chicago Press, 2003).
- 4Cowell, A. N. et al. Selective whole-genome amplification is a robust method that enables scalable whole-genome sequencing of Plasmodium vivax from unprocessed clinical samples. m Bio 8, e 02257-16 (2017).10.1128/m Bio.02257-16PMC 529660428174312 · doi ↗ · pubmed ↗
- 5Van der Auwera G. & O’Connor B. D. Genomics in the Cloud (O’Reilly Media, Inc., 2020).
