Genome-wide analysis of genetic diversity in Anopheles darlingi from Rondônia State, Brazil
Sophie Moss, Holly Acford-Palmer, Alice O. Andrade, Emilia Manko, Jody Phelan, Matthew Higgins, Jansen F. Medeiros, Taane G. Clark, Maisa S. Araujo, Susana Campino

TL;DR
This study analyzes the genetic diversity and insecticide resistance markers in Anopheles darlingi mosquitoes from Brazil to better understand malaria transmission and control.
Contribution
The study provides the first whole-genome analysis of An. darlingi from Rondônia, identifying potential insecticide resistance genes and genetic differentiation between wild and colony populations.
Findings
Genetic differentiation was found between wild and colony An. darlingi populations.
Several SNPs were identified in genes associated with insecticide resistance, such as ace1, rdl, gste2, and vgsc.
Gene duplications in cytochrome P450 genes were observed, which may contribute to pyrethroid metabolism.
Abstract
Anopheles darlingi is the primary malaria vector in Central and South America and is responsible for most malaria transmission in the Amazon region. In this study, we perform whole-genome analysis of individual An. darlingi mosquitoes to explore genomic diversity, signatures of selection, and insecticide resistance markers. We analysed wild-caught (n = 20) and colony-maintained (n = 8) mosquitoes from the State of Rondônia, Brazil. In total, 1.54 million high-quality single-nucleotide polymorphisms (SNPs) were identified. Population genomic analysis revealed genetic differentiation between the colony and wild populations. No SNPs previously associated with insecticide resistance were detected. However, several SNPs were observed in four genes commonly associated with insecticide resistance: ace1, rdl, gste2, and vgsc. Genes under directional selection were identified, but no clear…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2- —https://doi.org/10.13039/501100000268RCUK | Biotechnology and Biological Sciences Research Council (BBSRC)
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMalaria Research and Control · Invertebrate Immune Response Mechanisms · Neurobiology and Insect Physiology Research
Introduction
Anopheles darlingi (or Nyssorhynchus darlingi^1^) is a major malaria vector throughout South and Central America^2^. This species is an efficient vector of both Plasmodium vivax and Plasmodium falciparum malaria due to its high susceptibility to Plasmodium infection, its ability to maintain transmission at low parasite densities^2^, and its propensity to feed on humans^3,4^. Furthermore, An. darlingi is highly adaptable and exhibits significant behavioural variation, including the ability to adapt to environments undergoing urbanisation and deforestation^2,5^. This species is particularly prominent in recently deforested areas, where its abundance surpasses that of other Anopheles species^6^. The ongoing anthropisation of these environments is thought to sustain malaria transmission by increasing the availability of larval habitats for Anopheles vectors^7,8^.
In 2023, an estimated 500,000 malaria cases were reported in the Americas, with Brazil accounting for the largest share (~163,000 cases), predominantly in the Amazon region^9^. Malaria cases in Brazil have increased from 138,000 in 2015, with P. vivax responsible for 72.1% of infections and P. falciparum for 16%^10^. Alarmingly, the proportion of P. falciparum cases has risen by 7%^11^, raising concerns due to its higher mortality rate, particularly among children and pregnant women. Strengthening malaria control in Brazil is crucial to curbing transmission and achieving both the country’s National Elimination Plan and the World Health Organization’s (WHO’s) goal of a 90% reduction in malaria incidence by 2030^12^.
Malaria control in Brazil primarily relies on indoor residual spraying (IRS) and insecticide-treated nets (ITNs)^13,14^. Pyrethroids are the most widely used insecticides against Anopheles mosquitoes, while carbamates and organophosphates are deployed for Aedes control in arbovirus programmes^15,16^. Systematic insecticide resistance surveillance is essential for informed vector control strategies, yet Brazil currently lacks a national screening programme for Anopheles mosquitoes. Resistance to pyrethroids, carbamates, and organochlorines has been documented in An. darlingi populations in Bolivia, Colombia, French Guiana, and Peru^17,18^. Insecticide resistance has not been reported in Brazil, which is likely due to limited historical surveillance^17^.
Insecticide resistance arises through multiple mechanisms, including target-site mutations, metabolic resistance, behavioural adaptations, cuticle thickening, and microbiome alterations^19,20^. Four key genes associated with insecticide resistance are the voltage-gated sodium channel gene (vgsc), the acetylcholinesterase gene (ace1), the glutathione S-transferase gene (gste2), and the gamma-aminobutyric acid (GABA) receptor (rdl – resistant to dieldrin gene). The vgsc gene encodes a transmembrane protein essential for nerve function, and is targeted by pyrethoid and dichlorodiphenyltrichloroethane (DDT) insecticides^21^. Whereas, organophosphate and carbamate insecticides work by inhibiting acetylcholinesterase (encoded by ace1), which is required for stopping synaptic transmission by hydrolyzing the acetylcholine neurotransmitter^22^. The gste2 gene encodes a metabolic enzyme able to metabolise insecticides^23^, and the rdl gene encodes a GABA subunit which is involved in nervous system signalling^24^. Target-site resistance is often driven by single-nucleotide polymorphisms (SNPs) in these genes. Well-characterised resistance-associated SNPs in the Anopheles gambiae sensu lato (s.l.) complex include V402L, L995F/S, I1527T, F1529C and N1570Y in the voltage-gated sodium channel (vgsc) linked to pyrethroid and DDT resistance, G280S in ace1 conferring organophosphate and carbamate resistance^25–27^, L119F in gste2 associated with DDT/pyrethroid resistance^23^, and A296G/S, V327I and T345S in the gamma-aminobutyric acid (GABA) receptor (rdl), which have been associated with resistance to dieldrin^28,29^. These SNPs may be identified rapidly using multiplex amplicon sequencing^30,31^. However, no SNPs associated with target-site resistance in other Anopheles species have been identified in An. darlingi to date^32–36^. Metabolic resistance, which results from differential gene expression enhancing detoxification enzyme activity^23,37–39^, is hypothesised to play a dominant role in insecticide resistance in An. darlingi, particularly given the absence of known target-site mutations. Increased metabolic resistance can arise from increased copy number of metabolic enzymes, resulting in increased gene expression. Three key metabolic enzyme families include the cytochrome P450s, esterases, and glutathione-S-transferases (GSTs)^40^. Copy number variation (CNVs) in cytochrome P450 genes is highly complex^41,42^, and pyrethroid resistance in Anopheles mosquitoes has been strongly associated with CNVs in cyp6aa1^43^, cyp6aap^44^, cyp6m2, cyp6p3, cyp6z1 and cyp9k1^45,46^, and with CNVs in the GST encoding gene gste2^47^. Furthermore, significant overexpression of genes in these metabolic families has been associated with insecticide resistance^48–50^.
Previous molecular studies of An. darlingi have focused on screening a limited number of loci in four candidate resistance genes (vgsc, ace1, gste2, rdl). However, genome-wide approaches are needed to identify novel resistance markers^30,36^. Whole-genome sequencing (WGS) has been extensively applied to An. gambiae s.l. mosquitoes to investigate population structure and resistance evolution^51,52^. Here, we apply WGS to two Brazilian An. darlingi populations in Rondônia State: a wild population from Candeias do Jamari, and a colony population from Porto Velho, both in the State of Rondônia and < 100 km apart from each other. By examining ancestry, population structure, and genetic regions under selection, we aimed to characterise genomic diversity and identify molecular markers potentially associated with insecticide resistance in this vector population, serving as a valuable resource for future research on vector adaptation and resistance mechanisms.
Results
Whole genome sequence data for An. darlingi
Twenty-eight An. darlingi mosquito samples were sequenced using WGS, yielding an average genome-wide coverage of 8.99-fold. The dataset comprised 20 wild-caught specimens from Candeias do Jamari (Rondônia) and 8 colony samples derived from mosquitoes originally maintained in Porto Velho (Rondônia) in 2018. After variant calling and quality filtering, 1,539,646 high-quality variants were retained for analysis, with 1,489,593 variants detected in the wild-caught samples and 535,667 in the colony samples. This discrepancy in variant number is likely due to the difference in sample size between these two populations.
An. darlingi colony samples are genetically distinct from wild populations
Admixture analysis identified two ancestral groups (K = 2), designated as K1 and K2, which differentiate the colony (n = 8) and wild (n = 20) Rondônia populations (Fig. 1). One colony sample (AnDar_1382) exhibited mixed ancestry from both K1 and K2, while three wild-caught samples primarily belonged to K2 but contained a small proportion of K1 ancestry.Fig. 1. Ancestry analysis of An. darlingi colony (n = 8) and wild (n = 20) mosquitoes.The number of ancestral populations was estimated to be 2, and these are denoted as K1 and K2.
Principal component analysis (PCA) reveals a clear genetic distinction between the colony and wild mosquitoes, consistent with the admixture analysis (Supplementary Fig. 1). As expected, chromosome-specific PCA highlights the separation between colony and wild mosquitoes across chromosomes 2, 3, and X (Supplementary Fig. 1). The mitochondrial PCA, and a maximum likelihood tree based on the mitochondrial genome, did not show clear distinction between the colony and wild mosquitoes (Supplementary Figs. 1 and 2).
Population differentiation due to genetic structure was measured using the Fixation index (F_ST_) statistic, where a value of 1 indicates complete differentiation between two populations, whereas a value of 0 indicates no differentiation^53^. F_ST_ analysis calculated in 1000 bp windows revealed allele frequency differences between the two populations. On chromosome 2, 349 SNPs had an F_ST_ ≥ 0.75, including 25 SNPs with an F_ST_ ≥ 0.9 and 3 SNPs with an F_ST_ of 1. On chromosome 3, 63 SNPs had an F_ST_ ≥ 0.75, with 8 of these reaching F_ST_ ≥ 0.9. Chromosome X had 3 SNPs with an F_ST_ ≥ 0.75, while no SNPs on the mitochondrial genome had F_ST_ ≥ 0.75 (F_ST_ on the mitochondrial genome ranged from 0.062 to 0.419). Genomic windows and genes with F_ST_ ≥ 0.9 across the genome are summarised in Table 1. Median genome-wide F_ST_ between the wild-caught and colony Rondônia An. darlingi populations was 0.143. This population differentiation is markedly high in comparison to other Anopheles species; low divergence is common in An. gambiae and An. coluzzii (F_ST_ < 0.05)^54^, and even F_ST_ computed between these two species has been identified as low (F_ST_ < 0.14)^51^. Similarly high An. darlingi F_ST_ values were identified in a larger study of 1,094 mosquitoes, where the median pairwise F_ST_ between populations in Brazil, Colombia, Guiana, Guyana, Peru, and Venezuela was 0.18^55^. These high An. darlingi F_ST_ values indicate increased genetic separation between populations, in comparison with African Anopheles species.Table 1. Genes within genomic windows (1000 bp) with F_ST_ scores ≥ 0.9Chr.Genomic Window Start PositionGenomic Window End PositionF_ST_ ScoreGene name (encoded protein)218740129187411280.930LOC125948827 (myb-like protein AA); ENSADAG00000001074 (ZNF281)218746129187471280.927LOC125948827 (myb-like protein AA); ENSADAG00000001074 (ZNF281)219057129190581280.901LOC125948822219774129197751280.954LOC125948886 (neuropeptide CCHamide-2 receptor-like)221131129211321280.911LOC125948306 (hemicentin-1); ENSADAG00000000808221176129211771280.924LOC125948306 (hemicentin-1); ENSADAG00000000808221397129213981281LOC125959064 (trithorax group protein osa)221655129216561281LOC125959087 (allatostatin-A receptor-like); ENSADAG00000005016222053129220541280.931LOC125948806; ENSADAG00000003076 (MIDEAS)223321129233221280.913LOC125950219223940129239411280.934LOC125951741 (mitochondrial import inner membrane translocase subunit TIM14); ENSADAG00000002236 (DNAJC19)224109129241101280.976LOC125952255 (uncharacterised)224381129243821280.914LOC125952268 (neuroligin-4, Y-linked)224659129246601280.923LOC125951300 (gamma-aminobutyric acid type B receptor subunit 2); ENSADAG00000000746225002129250031280.933LOC125947887 (5’-3’ exoribonuclease 1); ENSADAG00000002912 (XRN1)275896129758971280.904LOC125949681 (histone demethylase UTY); ENSADAG00000003908 (MAML1)281022129810231280.923LOC125949814; ENSADAG000000023813485359648545950.933LOC125956255 (teneurin-m)3965359696545950.913LOC125954234 (neuroglobin-like)369666596696675950.923LOC125956843 (uncharacterised)370008596700095950.904LOC125956845 (diacylglycerol kinase 1)370347596703485950.909LOC125955750 (rho GTPase-activating protein gacF); ENSADAG00000009704 (plppr4b)An. darlingi colony (n = 8) and wild (n = 20) mosquitoes.
Mutations in genes associated with insecticide resistance
SNPs were identified in four key genes associated with insecticide resistance in other vector species (ace-1, gaba, GSTe2, and vgsc) (Supplementary Table 1). Seven nonsynonymous SNPs were detected in these four genes (Table 2). However, none of the resistance mutations that have previously been associated with insecticide resistance in Anopheles species were detected. An additional 129 potential genes of interest were investigated, including those where reported SNPs have been associated with reduced insecticide efficacy. These included vgsc, ace-1, all gaba genes, cytochrome P450s, carboxylesterases, glutathione transferases, and glucuronyltransferases. In total, 442 nonsynonymous SNPs were identified across these genes (Supplementary Data 1**)**. The T283K mutation in the ADAR2_008159 gene (cyp6aa1 in An. funestus) has been identified as possibly linked to deltamethrin resistance in An. darlingi^55^. This mutation was not identified in the Rondônia samples. However, two other non-synonymous mutations were identified in this gene: I370V at 43.8% allelic frequency in the wild population and 31.3% in the colony population, and E359Q, which was found at 8.8% allelic frequency in the wild population and was absent in the colony population.Table 2. Non-synonymous mutations identified in the four key insecticide resistance genesGeneChrPositionRefAltCodon Change in An. darlingiCodon in An. gambiae PESTCodon in Musca domesticaAlternate Allele Frequency %Wild-caught (n = 20)Colony (n = 8)ace1215676267TCVal32Ala^a^Ala27^b^-56.750.0gste2289825712AGAsn82Ser^c^Asp82^d^Asp82^i^2.940gste2289825747GAAla94Thr^c^Ala94^d^Ala94^i^13.30gste2289825962GAGlu142Lys^c^Glu142^d^Glu142^i^2.940gste2289827309GCVal276Leu^c^Val57^d^Thr57^i^83.350.0vgsc335318429CTPro523Leu^e^Pro509^f^Pro529 ^j^2.940rdl353427108GASer90Asn^g^Ser90^h^Ser93^k^3.330An. darlingi colony (n = 8) and wild (n = 20) mosquitoes.^a^Codon numbering according to Anopheles darlingi transcript XM_049679441.^b^Codon numbering according to Anopheles gambiae transcript XM_061649305.^c^Codon numbering according to Anopheles darlingi transcript XM_049692446.^d^Codon numbering according to Anopheles gambiaetranscript AGAP009194-PA.^e^Codon numbering according to Anopheles darlingi transcript XM_049687134.^f^Codon numbering according to Anopheles gambiae transcript AGAP004707-PB.^g^Codon numbering according to Anopheles darlingi transcript XM_049684483.^h^Codon numbering according to Anopheles gambiae transcript AGAP006028-PB.^I^Codon numbering according to Musca domestica transcript XP_058986999.^j^Codon numbering according to Musca domestica transcript ARX79626.^k^Codon numbering according to Musca domestica transcript NP_001292048.1.
Nucleotide diversity
Nucleotide diversity (π) is the average number of nucleotide differences per site, between two populations^56^. π was calculated to estimate the level of genetic variation within the wild mosquito population, and was computed in 100 bp windows across the genome. The average nucleotide diversity among these 20 An. darlingi was π = 0.002, which is lower than has previously been found in An. gambiae s.l. populations from 15 locations across Africa (averaging at π = 0.015)^51^, and likely reflects this small sample size. Nucleotide diversity in key insecticide resistance genes was compared between the wild-caught An. darlingi (n = 20) and a reference dataset of wild-caught An. gambiae sensu stricto (s.s.) samples (n = 42) from Bubaque Island in the Bijagós Archipelago, Guinea-Bissau, West Africa (Table 3)^57^. This reference dataset was selected for comparison as both are wild Anopheles populations caught in a small geographic area. Nucleotide diversity was low in both populations but consistently 3 to 6 times higher in the wild-caught An. gambiae s.s. from Bubaque Island than the wild-caught An. darlingi (Table 3), indicating relatively little variation within this small sample of wild-caught An. darlingi from Rondônia State.Table 3. Nucleotide diversity (π) calculated in 100 bp windows across key insecticide resistance genes for wild-caught Rondônia State mosquitoes (n = 20) and wild-caught Bijagós Archipelago mosquitoes (n = 42)GeneWild-caughtAn. darlingi(n = 20)(Rondônia State)Wild-caughtAn. gambiae sensu stricto (n = 42)(Bijagós Archipelago*)ace10.0040.012gste20.0030.011rdl0.0040.017vgsc0.0010.006*Bubaque Island.
Natural selection across the An. darlingi genome
Integrated haplotype (iHS) and Garud’s H_12_ statistics were used to identify loci under directional selection in the wild samples (n = 20). The iHS metric measures extended haplotype homozygosity to identify signatures of recent selection within a particular population^58^. A total of 103 loci showed evidence of directional selection, which may be due to selection pressure from insecticide use, or from other selection pressures (iHS score ≥ 4) (Fig. 2A). Of the 103 loci with significant iHS scores, 57 were in protein-coding genes (Table 4). None of these protein-coding genes have previously been associated with insecticide resistance in An. darlingi or other Anopheles species.Fig. 2. Directional Selection Analysis.A iHS scores for wild-caught mosquito samples (n = 20) where iHS score ≥ 4 indicates positive selection. This significance threshold is depicted by the black dashed line. B XP-EHH comparison between colony mosquitoes (n = 8) and wild (n = 20) mosquitoes. Values ≤ -4 indicate significant selection in the wild-caught samples when compared to the colony mosquitoes, whereas values ≥ 4 indicate significant selection in the colony mosquitoes when compared to the wild samples.Table 4. Protein coding genes under significant selection in the wild-caught (n = 20) mosquito population (iHS score ≥ 4)Chr.Gene nameDescriptioniHS scoreXLOC125948124Wilms tumour protein 1-interacting protein-like5.022LOC125950173Mucin-5AC-like5.112LOC125949705RNA polymerase-associated protein CTR9 homologue4.652LOC125948229Elongation factor-like GTPase 14.642LOC125952105Uncharacterized protein-coding gene4.582LOC125950338Furin-like protease 24.542LOC125950341Uncharacterised protein-coding gene4.412LOC125951977Axin4.372LOC125959173GS homeobox 1-like4.332LOC125951112Conserved oligomeric Golgi complex subunit 34.282LOC125950864Hemicentin-14.272LOC125950282Autophagy-related protein 16-14.242LOC125951565Ras-related and oestrogen-regulated growth inhibitor-like protein4.242LOC125952875Klarsicht protein4.232LOC125948116Uncharacterised protein CG438674.222LOC125950707RNA-binding protein 34-like4.192LOC125951134Uncharacterized protein CG45076-like4.182LOC125952079Protein spaetzle 34.162LOC125952297Dual specificity calcium/calmodulin-dependent 3'2C5’-cyclic nucleotide phosphodiesterase 1A-like4.152LOC125952077Protein star4.152LOC125952109Uncharacterised protein-coding gene4.142LOC125951278EH domain-containing protein 14.132LOC125947933Nuclear pore complex protein DDB_G0274915-like4.112LOC125949291Segment polarity protein dishevelled4.112LOC125952098Estradiol 17-beta-dehydrogenase 114.092LOC125950428Tyrosine-protein kinase Dnt-like4.072LOC125952064CD109 antigen-like4.043LOC125954982Glucose transporter type 15.063LOC125956844Netrin receptor DCC4.973LOC125957598Uncharacterised protein-coding gene4.943LOC125953375Pericentrin4.793LOC125956438Neuronal growth regulator 1-like4.743LOC125953484Zwei Ig domain protein zig-8-like4.713LOC125954775Protein bric-a-brac 1-like4.73LOC125956437Succinate dehydrogenase [ubiquinone] flavoprotein subunit 2 C mitochondrial-like4.573LOC125957719Zwei Ig domain protein zig-84.383LOC125958263Transmembrane channel-like protein 74.333LOC125954270Protein TIS114.283LOC125953928Sodium-dependent transporter bedraggled4.243LOC125953389Receptor-type guanylate cyclase Gyc76C-like4.193LOC125957210Semaphorin-2A-like4.163LOC125954152Hexokinase type 24.13LOC125954661Ubiquitin-conjugating enzyme E2 W4.13LOC125957447Calcium-binding protein E63-14.093LOC125954157Mucin-5AC-like4.05
Garud’s H_12_ statistic examines the frequency of the most common haplotype in a population (H_1_) versus the second most common haplotype (H_2_), to identify soft and hard selective sweeps^59^. H_12_ was also calculated as an additional method to identify selective sweeps across the genome^60^. No significant sweeps were identified (Supplementary Fig. 3).
Cross-population extended haplotype homozygosity (XP-EHH) compares selection between two populations by contrasting haplotype lengths associated with the same allele in different populations^61^. XP-EHH analysis was performed to identify loci under significant positive directional selection in the wild mosquitoes when compared to colony mosquitoes (Fig. 2B). A value ≤ -4 indicates significant selection in the wild-caught samples when compared to the colony mosquitoes, whereas a value ≥ 4 indicates significant selection in the colony mosquitoes. Protein-coding genes identified to be under significant selection are listed in Table 5. Notably, loci under significant selection in wild mosquitoes, compared to colony mosquitoes, were located outside of protein-coding genes.Table 5. Protein-coding genes under significant selection in the colony mosquitoes (n = 8) when compared to the wild-caught (n = 20) Rondônia State mosquitoesGeneDescriptionXP-EHH valueLOC125948167forkhead box protein O4.06LOC125950636myosin-104.12LOC125948887flotillin-24.75LOC125948913microtubule-associated protein Jupiter4.38LOC125948827myb-like protein AA4.81LOC125948822uncharacterized LOC1259488224.06LOC125948826nephrin4.22LOC1259489203’-5’ ssDNA/RNA exonuclease TatD4.15LOC125948424protein crumbs4.22LOC125948318hybrid signal transduction histidine kinase M4.22LOC125952005uncharacterized LOC1259520054.82LOC125948312apolipoprotein D-like4.04LOC125948301inactive rhomboid protein 1-like4.08LOC125948302RNA exonuclease 1 homologue4.34LOC125948306hemicentin-14.58LOC125959064trithorax group protein osa4.35LOC125959072fibroblast growth factor receptor homologue 1-like4.42LOC125950577octopamine receptor beta-3R-like4.73LOC125950168uncharacterized LOC1259501684.40LOC125950219uncharacterized LOC1259502194.16LOC125950181protein groucho4.05LOC125950219uncharacterized LOC1259522554.59LOC125948441heart- and neural crest derivatives-expressed protein 2-like4.05LOC125948435uncharacterized4.32All genes identified under significant selection were in chromosome 2.
To further investigate signatures of selection, Tajima’s D (T_D_) statistic was calculated in 20,000 bp windows across the genome for each population. In the colony samples, median T_D_ was 1.09, and 269 windows exhibited T_D_ values > 2.5 (Supplementary Fig. 4), suggesting balancing selection or a recent population bottleneck; consistent with the characteristics of a colony-bred population. Whereas the wild-caught samples had a median T_D_ of -0.877, suggesting possible population expansion. This is lower than the T_D_ found in a larger study of 1094 An. darlingi from across South America (median T_D_ = 0.26 per population), suggesting stability in An. darlingi population size^55^. Whereas, our lower T_D_ value supports a possible population expansion in the wild-caught mosquitoes from Rondônia, which may be influenced by our small sample of 20 wild-caught mosquitoes. Both of these studies identified T_D_ values contrasting with the more negative T_D_ associated with An. gambiae populations, which are reported as T_D_ < -1.5 and reflect rapid population expansion within this African species complex^51^.
Structural variants
Structural variants (SVs) were identified in regions containing putative genes of interest, including 148 inversions, 54 deletions, 27 duplications, 18 insertions, and 32 inter-chromosomal translocations (Supplementary Data 2). Among these SVs, we identified gene duplications which frequently involved members of the cytochrome P450 gene family (Table 6); a gene family previously associated with insecticide resistance^41,48^. These included cytochrome P450 4cs3-like, 4d1-like, 4C1-like and 3A19-like genes, and cytochrome P450 6a13, 313a4, and 313a1 genes. Additionally, we identified duplications that introduced premature stop codons, splice region variants, and duplicated exons. One limitation of this analysis is the relatively low average read depth (8.99-fold), which may have resulted in missed SVs that could be detected in future studies using higher coverage sequencing.Table 6. Duplications with high impact on genes that may be involved in insecticide resistanceChrDuplication IDGenes with HIGH impact as defined using SnpEff^90^ and the effect on these genes.Frequency in wild-caught Rondônia State mosquitoes (%)Frequency in colony mosquitoes (%)XDUP00004667Frameshift and splice region variant in the cytochrome P450 4c3-like gene (LOC125953818)14.702DUP00010794Gene fusion involving two cytochrome p450 4d2-like genes (LOC125950881 and LOC125950889. Duplication between 14392190 and 14395703 (~3.5 kb).26.312.52DUP00010862Introduction of a premature stop codon and impacting a splice region in cytochrome P450 4d2-like gene (LOC1259483340)36.112.52DUP00019559Gene fusion between the cytochrome P450 4C1-like gene (LOC125950286) and the transmembrane and coiled-coil domain 2 protein (LOC125950273). Duplication between positions 36277503 and 36280083 (~2.5 kb).031.32DUP00019560Gene fusion between transmembrane and coiled-coil domain 2 protein (LOC125950273) and the cytochrome P450 4C1-like gene (LOC125950286). Gene fusion between LOC125950273 and the cytochrome P450 4C1-like gene (LOC125950311). Duplication between positions 36278078 and 36281104dup, ~3 kb.52.550.02DUP00019562Gene fusion between transmembrane and coiled-coil domain 2 protein (LOC125950273) and the cytochrome P450 4C1-like gene (LOC125950286). Gene fusion between LOC125950273 and the cytochrome P450 4C1-like gene (LOC125950311). Duplication between positions 36278947 and 36282009dup, ~3.1 kb).31.618.82DUP00044715Gene fusion between multidrug resistance-associated protein 1-like gene (LOC125951334) and ras-related protein Rab-9B (LOC125951567). Duplication of part of the glutathione S-transferase-1 like gene (LOC125951365).2.502DUP00048534Gene fusion between cytochrome P450 9f2s (LOC125952882 and LOC125952880). Duplication between positions 91682639 and 91685220, ~2.5 kb.2.7843.753DUP00052410Gene fusion between two probable cytochrome P450 6a13 genes, LOC125956304 and LOC125958131. Duplication between positions 5372527 and 5375351 (~2.8 kb).17.656.253DUP00072025Gene fusion between probable cytochrome P450 313a4 (LOC125955537) and probable cytochrome P450 313a1 (LOC125955540) Duplication between positions 46108971 and 46118421 (~9.4 kb).41.743.83DUP00072031Duplication of cytochrome P450 3A19-like (LOC125955539), and a frameshift variant of this gene. Gene fusion between cytochrome P450 3A19-like (LOC125955539), probable cytochrome P450 313a4 (LOC125955538), probable cytochrome P450 313a4 (LOC125955537), and probable cytochrome P450 313a1 (LOC125955540)13.925.03DUP00075794Gene fusion between glutamate receptor 1-like (LOC125954300) and an uncharacterized protein (LOC125954378). Duplication between positions 52862921 and 53509817, ~647 kb.2.50An. darlingi colony (n = 8) and wild (n = 20) mosquitoes.
Discussion
This study presents WGS data and analysis for two An. darlingi populations from Rondônia, Brazil: a wild-caught population from Candeias do Jamari and a colony population originated from mosquitoes collected in Porto Velho. Clear genetic separation between these two populations was identified in the principal component and admixture analyses, with distinct ancestries and differentiation across chromosomes 2, 3, and X. Maximum likelihood phylogenetic analysis and principal component analyses of mitochondrial DNA did not show clear separation between the colony and wild-caught populations, which is not surprising given the close proximity of the Candeias do Jamari and Porto Velho collection sites (<100 km apart). Nucleotide diversity was low in both populations, reflecting the small sample size, and Tajima’s D analysis suggested a bottleneck in the colony mosquitoes and possible population expansion in the wild-caught mosquitoes. These findings align with expectations for the colony, given extensive inbreeding, and for the wild-caught population, which was sampled from a small geographic area (radius <5 km).
Fixation index (F_ST_) analysis further supported genetic differentiation between the two populations, with chromosome 2 showing the highest divergence, followed by chromosome 3 and chromosome X. Whereas, F_ST_ analysis of the mitochondrial genome showed reduced divergence. Notably, the gamma-aminobutyric acid type B receptor subunit 2 (ADAR2_002857) on chromosome 2 exhibited high differentiation between the colony and wild-caught samples. Organochlorine insecticides act by binding to the GABA receptor complex, with the most common resistance mutation being A259S/G in the rdl gene^29,62^. Overall, F_ST_ was high in An. darlingi when compared with other Anopheles species, including An. gambiae and An. coluzzii in Africa^51^. This high F_ST_ is corroborated by a much larger study of 1094 An. darlingi from across South America, which also found elevated F_ST_ scores in An. darlingi population comparisons^55^.
Previous studies using mitochondrial cox-1 sequences have suggested that geographic barriers, such as the Andes and Amazon River, influence An. darlingi population structure^63–66^. The colony samples in this study originated from mosquitoes collected in Porto Velho in 2018, while the wild-caught samples were obtained less than 100 km away in 2018 and 2019^67^. Our study demonstrates the advantage of genome-wide approaches in resolving population dynamics as, despite their geographic proximity, WGS analysis was able to effectively distinguish the ancestries of these populations.
No well-characterised mutations associated with insecticide resistance in other Anopheles species were detected in An. darlingi. However, we identified seven novel nonsynonymous SNPs within four key resistance-associated genes (ace-1, rdl, GSTe2, vgsc), including the Pro523Leu mutation in vgsc. Further research is needed to determine their functional significance, including the integration of phenotypic resistance data. Similar investigations in An. funestus have identified molecular markers of resistance, suggesting a comparable approach could be applied to An. darlingi^38,68,69^.
Metabolic detoxification is hypothesised to play a greater role in An. darlingi insecticide resistance than target-site mutations^70^. In An. gambiae, pyrethroid resistance has been linked to copy number variations (CNVs) in detoxification enzyme families (cytochrome P450s, esterases, and glutathione-S-transferases (GSTs)). Using WGS, we were able to examine the genetic diversity of these gene families in An. darlingi. We found no selective sweeps over insecticide resistance-related genes through analysis of extended haplotype homozygosity using H_12_^60^. However, structural variant analysis revealed high-impact gene duplications involving multiple cytochrome P450 genes:cyp4c3, cyp4d2, cyp4c1, cyp9f2s, cyp6a13, cyp313a4, cyp3a19, cyp313a1-like. Three of these genes (cyp4c1, cyp4c3, cyp4d2) have been implicated in insecticide resistance in other species^71,72^; cyp4c1 is upregulated in Aedes aegypti resistant to DDT, pyrethroids, and carbamates, and in Cydia pomonella larvae resistant to deltamethrin^73,74^. Additionally, cyp4c1 has been linked to metabolic compensation during starvation in cockroaches and is hormonally regulated^75,76^. It has also been found to be upregulated in the crop pest Leptinotarsa decemlineata following exposure to neonicotinoids^77^. In contrast, the function of cyp4c3 remains less well understood, though it has been found upregulated in honeybees after amitraz insecticide treatment^78^.
In addition, we identified a duplication in the glutathione S-transferase gene (ADAR2_012021), a gene family known to confer resistance to organophosphates and DDT^79^. Class 1 GSTs, which mediate resistance in Musca domestica, are believed to perform a similar function in Anopheles species, though further evidence is needed to confirm this^80–82^. More generally, to determine the functional role of these duplications in insecticide resistance, further studies incorporating phenotypic resistance data are required. Comparative RNA-seq analysis of insecticide-resistant and susceptible An. darlingi populations could help to identify resistance-associated gene expression changes. Limitations of this study include the lack of phenotypic resistance data and the small overall sample size, which is limited to a small geographic region. Future studies should incorporate larger sample sizes of An. darlingi from diverse regions, along with phenotypic resistance data for the most frequently used insecticides in Brazil. Other phenotypes can also be investigated, such as biting behaviour, which has been explored for this species using low-coverage genomic sequencing data of pooled samples^83^.
Overall, our study presents an initial analysis of WGS data from An. darlingi malaria vectors, establishing a foundation for future studies with more expansive sample sizes and associated phenotypic data. Large-scale surveillance of An. darlingi is essential to gain deeper insights into its molecular ecology and to identify genomic markers linked to key traits, including insecticide resistance, vector competence, blood-feeding behaviour, and adaptation to environmental changes. This comprehensive approach will provide valuable insights into the genetic factors influencing An. darlingi populations and enhance the development of targeted control strategies.
Methods
Mosquito collection, species identification, and DNA extraction
Two populations of An. darlingi from the State of Rondônia were used in this study, one wild-caught population from Candeias do Jamari, right margin of Madeira River, (n = 20), and a cohort of colony isolates from Porto Velho (n = 8)^67^. The wild-caught mosquitoes were collected during vector density studies in malaria-endemic regions in Rondônia in 2018–19. Entomological surveys were performed as 8 consecutive collections at the beginning of the rainy season (Oct and Nov 2018), and another eight collections were performed at the beginning of the dry season (May and June 2019)^84^. The colony of mosquitoes has been maintained since 2018, originally collected in Porto Velho, left margin of Madeira River^45^. Specimens were morphologically identified via microscopy. Genomic DNA from the isolates was extracted from the whole mosquito using Qiagen DNaeasy Blood and Tissue kits (Qiagen, Hilden, Germany), according to the manufacturer’s instructions. DNA was quantified using the Qubit 2.0 fluorimeter HS DNA kit (Thermofisher), and stored at −20 °C.
Whole genome sequencing and bioinformatics analysis
Twenty-eight An. darlingi isolates were sequenced using the Illumina MiSeq on 2 x 250 bp paired-end configuration in 2024, with an average read-depth of 8.99-fold. The raw paired fastq files were trimmed using trimmomatic software (version 0.39), and then aligned using bwa-mem software to the AnoDar_H01 (An. darlingi) reference genome using default parameters^85,86^. Genome coverage from the mapped bam files was calculated using samtools, and variants called and validated using HaplotypeCaller and VailidateVariants from GATK software, respectively (v4.1.4.1)^87,88^. Once VCFs had been generated for each sample, GATK was used to create a multi-sample VCF with GenomicsDBImport and GenotypeGVCF functions. The multi-sample VCF was filtered to include chromosomal variants using bcftools (v 1.17) and GATK’s VariantFiltration, with parameters: QD > 5.0, QUAL > 30.0, SOR < 3.0, FS < 60.0, MQ > 40.0, MQRankSum > − 12.5, ReadPosRankSum > -8.0. Reads were subsequently filtered to retain those with DP > 5.0 and GQ > 20.0, and variants were filtered to retain those with < 20.0% missing genotypes or MAF > 0.01. The resulting VCF was phased with Beagle (version 5.2)^89^ and annotated using SnpEff (v5.1 d)^90^.
Population genetic analysis
From the filtered multi-sample VCF, a pairwise-genetic distance matrix was generated using PLINK (v1.90b6.21 64-bit)^91,92^. This matrix was used for the generation of a maximum-likelihood tree using RAxML-NG (v 1.2.0), and principal component analysis was calculated using the R package ape (5.7-1)^93^. The ML tree generated was annotated and visualised in iTOL^94^. Ancestry admixture analysis was conducted using ADMIXTURE software (version 1.3)^95^. The optimum K value (estimated number of ancestral populations) was calculated through cross-validation of 1-10 eigenvalue decay dimensions^92^. In this instance, K = 2 was estimated, and ADMIXTURE software was used to analyse the shared ancestral populations (denoted K1 and K2) in these samples. The output was then visualised in R using ggplot2 package (v 3.4.2)^96^.
Genomic regions under selection were identified with the python package scikit-allel (v 1.3.6)^97^. Three complementary statistics were used: H_12_^60^, iHS, and XP-EHH^61,98^. The Integrated Haplotype Statistic (iHS) was used to find directional selection within populations, and extended haplotype homozygosity (XP-EHH) between the two populations^99,100^. Garud’s H12 was calculated using phased biallelic SNPs in 1000 bp windows, using the movinggarudh function^97^. Two hundred iterations of H12 were calculated, followed by plotting of the mean value for each genomic window. iHS was computed using phased biallelic SNPs using the allel.ihs function^97^. Raw iHS scores were then standardized using the function allel.standardize_by_allele_count, and p-values were plotted. XP-EHH was calculated using phased biallelic SNPs using the allel.xpehh function. The Tajima’s D (T_D_) statistic was calculated using vcftools (v0.1.16) in 20 kbp windows across chromosomes to identify balancing selection^101,102^. Nucleotide diversity (π) was calculated in 100 bp windows across chromosomes using vcftools and plotted in R. Weir and Cockerham’s F statistic (F_ST_) was calculated in 1-kbp windows over each chromosome using the windowed_weir_cockerham_fst function in scikit-allel^97^.
Examination of insecticide resistance associated genes
A bed file of genes commonly associated with target-site insecticide resistance SNPs was created. All annotated cytochrome P450s and esterases linked to metabolically mediated insecticide resistance were included. The bed file was then applied to the multi-sample VCF using bcftools (v1.17) view function. The snpEff software was then used to annotate the effect each SNP would have on the protein’s amino acid sequence^90^. A custom-built database was created for this, following the snpEff instructions at https://pcingola.github.io/SnpEff/snpeff/build_db_gff_gtf/ and using the instructions AnoDar_H01 GFF file which is available for download from NCBI https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_943734745.1/). DELLY (v1.1.8) was used to investigate CNVs across the whole genome^103^. These were then filtered for quality and for those impacting genes within the expanded list of candidate genes used above.
Ethics & inclusion
This project was determined and conducted in collaboration with local researchers and is locally relevant.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Supplementary information
Supplementary Information Description of Additional Supplementary File Supplementary Data 1 Supplementary Data 2 Reporting Summary
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Global Malaria Programme: WHO Global. WHO World Malaria Report 2022. WHO.
- 2WHO. Global Malaria Programme: Malaria Threats Map. https://www.who.int/teams/global-malaria-programme/surveillance/malaria-threats-map.
- 3World Health Organisation. Global Report on Insecticide Resistance in Malaria Vectors: 2010-2016 Global Malaria Programme.
- 4Namias, A., Jobe, N. B., Paaijmans, K. P. & Huijben, S. The need for practical insecticide-resistance guidelines to effectively inform mosquito-borne disease control programs. e Life 10, e 65655.10.7554/e Life.65655 PMC 834628034355693 · doi ↗ · pubmed ↗
- 5Organização Mundial da Saúde. Guidelines for Malaria Vector Control. Guidelines for Malaria Vector Control (2019).
- 6Application of a targeted amplicon sequencing panel to screen for insecticide resistance mutations in Anopheles darlingi populations from Brazil. https://www.researchsquare.com, 10.21203/rs.3.rs-3053716/v 1 (2023).10.1038/s 41598-024-84432-x PMC 1169896439753672 · doi ↗ · pubmed ↗
- 7Bickersmith, S. A. et al. Mutations Linked to Insecticide Resistance Not Detected in the Ace-1 or VGSC Genes in Nyssorhynchus darlingi from Multiple Localities in Amazonian Brazil and Peru. Genes 14, 1892 (2023).10.3390/genes 14101892 PMC 1060671037895241 · doi ↗ · pubmed ↗
- 8Ingham, V. & Nagi, S. Genomic profiling of insecticide resistance in malaria vectors: insights into molecular mechanisms. Res. Sq. rs.3.rs-3910702 10.21203/rs.3.rs-3910702/v 1. (2024).
