Transcriptomic analysis of non-model Drosophilidae reveals novel AMP candidates
Pankaj Dhakad, Dhobasheni Newman, Darren J. Obbard

TL;DR
Researchers studied immune responses in three non-model fruit fly species and found new antimicrobial peptide candidates, revealing evolutionary differences in immune systems.
Contribution
The study identifies 20 novel AMP-like candidates and demonstrates immune diversity in non-model Drosophilidae species.
Findings
Three non-model drosophilid species retain core immune genes but show variation in effector gene content and inducibility.
Scaptodrosophila deflexa lacks orthologs of known AMPs and shows minimal immune response to bacterial challenge.
Twenty novel AMP-like candidates were identified with structural features similar to known AMPs.
Abstract
Drosophila melanogaster has been a valuable model for dissecting the molecular architecture of innate immunity. However, the family Drosophilidae encompasses over 4000 species, spanning deep evolutionary divergences and diverse ecologies. Here, we use immune challenge with the Gram-negative pathogen Providencia rettgeri to investigate the conservation and evolution of immune responses in three non-model drosophilid species that diverged from D. melanogaster over 45 million years ago—Hirtodrosophila cameraria, H. confusa, and Scaptodrosophila deflexa. We find that all three species retain a core set of immune signaling and recognition genes, but exhibit substantial variation in effector gene content and inducibility. In particular, Scaptodrosophila deflexa lacks orthologs of multiple antimicrobial peptides (AMPs) known from D. melanogaster, including DptA, AttA, and AttC, and shows…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7- —https://doi.org/10.13039/501100000268Biotechnology and Biological Sciences Research Council
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsInvertebrate Immune Response Mechanisms · Neurobiology and Insect Physiology Research · Insect symbiosis and bacterial influences
Background
In all organisms, the innate immune system forms the first line of defense against pathogens and parasites [1–3]. By rapidly recognizing and responding to infections, it reduces pathogen survival and replication—decreasing pathogen fitness to the benefit of the host. These antagonistic interactions can lead to coevolutionary dynamics, where immune-related genes—particularly those involved in pathogen recognition and effector functions—evolve at significantly higher rates compared to the genome-wide background (e.g., [4–8]). The fruit fly Drosophila melanogaster has long served as a model for dissecting innate immunity in insects and helped in understanding key pathways such as Toll, Imd, JAK/STAT, and RNA interference (RNAi), which orchestrate defense responses against bacteria, fungi, and viruses (e.g., [9, 10]).
As in vertebrates, D. melanogaster mounts both cellular and humoral innate immune responses to combat pathogen infection. In Drosophila, cellular immunity primarily involves phagocytosis by plasmatocytes and encapsulation by lamellocytes, targeting invading microbes and larger parasites, respectively [11, 12]. In contrast, the humoral response mainly leads to the rapid production and systemic release of antimicrobial peptides (AMPs), which are secreted into the hemolymph to directly kill pathogens. Two major signaling pathways, Toll and Imd, regulate the majority of immune genes in Drosophila, including the production of AMPs. The Toll pathway is predominantly activated by lysine (Lys)-type peptidoglycan found in Gram-positive bacteria, as well as fungal β-glucans and host-derived microbial proteases. The Imd pathway is activated through the detection of diaminopimelic acid (DAP)-type peptidoglycan from Gram-negative and certain Gram-positive bacteria [10]. In addition to Toll and Imd, the JAK/STAT pathway plays a modulatory role in immune defense. While its primary role is in development, stress response, and stem cell maintenance, it also contributes to immunity by regulating the expression of genes such as those encoding thioester-containing proteins (TEPs) and Turandot stress proteins [13–15].
The completion of 12 Drosophila genomes in 2007 opened the door to evolutionary analyses across multiple species, enabling researchers to investigate gene copy number variation, patterns of positive selection, and lineage-specific immune responses [16]. One striking pattern that emerged was an apparent dichotomy in the evolutionary dynamics of Drosophila immune genes; while core signaling components of immune pathways are often deeply conserved, the evolution of recognition and effector genes can be highly dynamic [8, 17, 18]. Upstream signaling molecules such as Relish and Dif (Dorsal-related immunity factor) family members typically persist as 1:1 orthologs, even in distantly related insects [19], and have detectable sequence homology in mammals (e.g., NF-κB), highlighting the deep conservation of immune signaling genes [20]. In contrast, AMP gene families frequently undergo duplication, pseudogenization, and loss, and exhibit high copy number variation between species [8, 18, 21, 22]. In some species, AMPs such as drosocin, drosomycin, turandot, and metchnikowin are either completely absent or show such a high sequence divergence that they are difficult to identify through standard homology searches [22–24].
This contrast raises an important but hard to answer question: what drives the dynamic evolutionary patterns of antimicrobial peptides (AMPs)—including gene duplications, losses, and lineage-specific expression? This question is difficult to answer, in part, because our experimental understanding of the drosophilid antibacterial response comes largely from a few of the more experimentally tractable species, predominantly D. melanogaster and its close relatives within the subgenus Sophophora [25]—with only handful of studies outside of this subgenus (see [17, 22–24, 26]). Nevertheless, comparative studies have revealed striking lineage-specific variation in effector gene repertoires.
The antimicrobial peptide diptericin offers a particularly well-characterized example of this variation. Diptericin genes are key effectors of the Imd pathway and show differential inducibility by Gram-negative bacteria across Drosophila species [22]. A population genetics study found that a serine/arginine polymorphism at the 69th residue of DptA significantly alters susceptibility to Providencia rettgeri bacterial infection in D. melanogaster and D. simulans—an association interpreted as a signature of balancing selection [27]. More recent work has shown that the selective landscape acting on diptericin may be highly context-dependent: interactions between host genotype, sex, environmental stress (e.g., starvation), and pathogen exposure can shape the evolutionary trajectories of AMP alleles, potentially maintaining diversity over time [28].
At a broader phylogenetic scale, AMP families exhibit even more extreme evolutionary dynamics. For example, while DptA has been lost or pseudogenized in some non-melanogaster species, others possess divergent paralogs (e.g., DptC in the subgenus Drosophila) that are syntenic but highly diverged at the sequence level [17]. Consistent with this, AMP duplicates can undergo neofunctionalization: in D. virilis, a defensin paralog has evolved from a role in bacterial killing to one in toxin neutralization [29]. Together, these findings suggest that while the basic architecture of innate immunity is ancient and broadly conserved, the downstream effector repertoire can be evolutionarily labile and shaped by selection from species-specific microbial exposure, life-history, and ecological pressure.
Now, with the recent availability of nearly 400 drosophilid genomes [30]—over 300 of which have been annotated [31]—there is an opportunity to explore the extent to which canonical immune responses are conserved across the drosophilid phylogeny. For example, we can ask if deeply diverged lineages harbor novel immune effectors, or if more distantly diverged non-model species—with their unique ecological niches and evolutionary histories—possess a distinct immune repertoire. However, one major challenge remains; most functional studies have focused on species amenable to long-term laboratory culture, but these represent only a small fraction of drosophilid diversity [30]. Understanding gene function in species that are not easily cultured in large numbers remains challenging.
In this study, we demonstrate the feasibility of characterizing immune responses to bacterial challenge in non-model, less easily cultured, species of Drosophilidae by performing comparative transcriptomic analysis on individual first-generation wild-derived flies. We selected three common European species for analysis—Hirtodrosophila cameraria, Hirtodrosophila confusa, and *Scaptodrosophila deflexa—*each of which is highly divergent from D. melanogaster and other well-studied taxa (> 45 Mya; Fig. 1; [32]). These species are not only genetically distant, but also ecologically distinct. Hirtodrosophila confusa is a relatively large drosophilid and a fungal specialist that thrives in cool temperate environments and is frequently found in association with large fungal fruiting bodies such as dryad’s saddle (Polyporus squamosus). Hirtodrosophila cameraria is also a specialist fungus breeder, moderately abundant in the UK on basidiomycete fungi such as Phallus impudicus and Lactarius quietus [33, 34]. Scaptodrosophila deflexa, in contrast, is thought to lay its eggs in yeast-rich sap fluxes. However, despite their broad distribution and ecological interest, these species remain very poorly studied. For example, the genera Hirtodrosophila and Scaptodrosophila have both been shown to be polyphyletic (Hirtodrosophila partly within the paraphyletic Drosophila; [30, 35]). Despite the availability of high-quality genome sequences [30, 36], they are yet to receive any attention in functional or comparative genomics. To our knowledge, only a single report addresses egg viability in H. confusa [37], and no studies to date have explored their immunogenomics or transcriptomic responses to infection. By analyzing these lineages, we aim to expand our understanding of immune system diversity and evolution within Drosophilidae, moving beyond the traditional model organisms and exploring a broader phylogenetic landscape.Fig. 1. The phylogenetic position of the study species within Drosophilidae. An approximately time-calibrated phylogeny showing the relationships of major lineages of Drosophilidae, including the placement of the three non-model species used in this study—Hirtodrosophila cameraria, H. confusa, and Scaptodrosophila deflexa. The Hirtodrosophila species belong to a diverse and understudied cluster within the subgenus Drosophila, while S. deflexa branches deep within the subfamily Drosophilinae, representing one of the earliest branching lineages relative to D. melanogaster (most recent common ancestor >50 millions of years ago). Estimated divergence times are shown in MYA along the x-axis. Taxonomic groups are color-coded by genus/subgenus or species group, and collapsed clades indicate major drosophilid lineages for clarity. Images to the right show adult male (♂) and female (♀) flies of each of the three focal species, plus D. melanogaster at the same scale for comparison. The tree figure was generated as described in Dhakad et al. [31], based on 285 single-copy BUSCO orthologs
Here we compare the transcriptomes of unchallenged flies with those of flies we experimentally challenged with the Gram-negative bacterial pathogen Providencia rettgeri. Our goals are to (i) evaluate whether pathogen-challenged transcriptomes improve the annotation of immune genes in species lacking reference-quality genomes; (ii) assess the feasibility of differential expression analyses in these non-model species; and (iii) identify conserved and novel immune-related genes, including potential AMPs. Our findings suggest that despite their deep evolutionary divergence, bacterial infection induces measurable and broadly similar transcriptional responses in H. cameraria and H. confusa species but not in the more distant S. deflexa—a pattern that could potentially be shaped by differences in microbiota, immune strategy, or symbiont-mediated protection.
Results
Pathogen challenge does not substantially improve annotation
To assess the impact of the availability of pathogen-challenged RNA-seq on gene predictions, we generated three separate genome annotations for each species using BRAKER3 [38]. The three annotations were based on (i) RNA-seq from pathogen-challenged flies, (ii) RNA-seq from unchallenged flies, and (iii) combined data. All three annotations recovered the same set of core genes in each species, and most gene models were shared between the challenged and unchallenged annotations, with only a small number unique to one set or the other (Table 1; Fig. 2). The combined RNA-seq data produced slightly more gene models compared to either pathogen-challenged or unchallenged datasets alone for H. cameraria and S. deflexa, whereas the pathogen-challenged dataset yielded slightly more genes for H. confusa (Table 1). To assess the global similarity between genome annotations generated from pathogen-challenged and unchallenged RNA-seq datasets, we counted the shared (i.e., overlapping coordinates) and unique gene models, recording how many were assignable to an orthogroup (Fig. 2). As expected, most genes were shared between pathogen-challenged and unchallenged annotations (category C in Fig. 2), suggesting consistent annotation of core gene sets across datasets. Only a small number of genes were uniquely recovered in either pathogen-challenged or unchallenged annotations (categories A, B, D, and E in Fig. 2), and many of these lacked orthogroup assignment, implying that they may be annotation errors, or potentially novel genes. To examine whether pathogen-challenged RNA-seq improved the recovery of homologs of 638 well-characterized D. melanogaster immunity-related genes (see Methods), we compared immune gene orthogroup (“Hierarchical Orthologous Group”; HOG; [39]) representation between pathogen-challenged and unchallenged annotations for each species. Small differences were observed, but most immune genes were either recovered in both annotations or missing from both, with few genes uniquely recovered by pathogen-challenged RNA-seq (Table 1). To complement this analysis, we performed Gene Ontology (GO; [40]) enrichment analysis on genes uniquely annotated from pathogen-challenged or unchallenged RNA-seq datasets. However, no significant enrichment for immune-related terms was detected among genes uniquely recovered in pathogen-challenged annotations (Additional file 1). Overall, these results indicate that pathogen-challenged RNA-seq did not substantially improve overall immune gene discovery or annotation completeness compared to unchallenged RNA-seq. Table 1. Gene annotation statistics and immune gene recovery from pathogen-challenged, unchallenged, and combined RNA-seq data for each speciesSpeciesGenesMean CDS length (Kbp)Immune OGsImmune genesImmune CDS length (Kbp)D. melanogaster13,9861.545686381.74H. cameraria (unchallenged)14,4071.465155451.88H. cameraria (challenged)14,2031.475205511.86H. cameraria (combined)14,5031.475395731.87H. confusa (unchallenged)14,2001.394905191.72H. confusa (challenged)14,4691.395005301.78H. confusa (combined)14,3001.415345661.76S. deflexa (unchallenged)13,0171.504905191.95S. deflexa (challenged)12,9911.484855141.93S. deflexa (combined)13,1481.505225531.95Fig. 2Comparison of gene sets annotated using RNA-seq from pathogen-challenged and unchallenged samples. Venn diagrams illustrating the overlap of gene models annotated using RNA-seq from pathogen-challenged (green) and unchallenged (orange) samples of Hirtodrosophila cameraria, H. confusa, and Scaptodrosophila deflexa. Gene sets were compared based on (i) genomic coordinate (1:1 overlap) and (ii) orthogroup assignment using OrthoFinder. Numbers inside Venn diagrams (A to F) represent gene model counts. The total number of predicted genes per condition is shown in parentheses above each label
The detectable immune repertoire differs between species
To further assess how immune gene recovery varied across the three divergent drosophilid lineages, we examined the presence/absence of genes in H. cameraria, H. confusa, and S. deflexa that have homology with a curated set of 638 D. melanogaster immune-related genes (see Methods). These 638 immune genes were clustered into 568 HOGs. Of these, H. cameraria recovered genes in 539 HOGs, H. confusa in 534, and S. deflexa in 522 (Table 1; Additional file 2). In total, 497 HOGs had immune genes recovered across all three species, while 11 HOGs contained only D. melanogaster genes. These uniquely missing HOGs included several AMPs such as drosocin (Dro), all members of the turandot family, and drosomycins (Drs and Drsl1–6; Additional file 2). These apparent losses support previous observations that the drosomycin and turandot AMP families are largely restricted to D. melanogaster and closely related species in the subgenus Sophophora [17, 24]. However, we found homologs of a drosocin-like gene in all three species, and in H. confusa and H. cameraria this gene encodes multiple tandem repeats of the peptide domain—as previously reported in D. neotestacea and other species in the subgenus Drosophila [17]. In total, only two recognition proteins (PGRP-SB2 and CG12780) and six signaling genes (serpin 42Dd, MstProx/Toll-3, Toll-4, sphinx1, sphinx2, and amnesiac) were missing from all three species. Interestingly, S. deflexa lacked orthologs of diptericin A (DptA), attacin C (AttC), and attacin D (AttD), suggesting lineage-specific loss of these AMPs. When compared with other previously annotated Scaptodrosophila species [31], S. deflexa and S. lebanonensis shared 517 of the 638 D. melanogaster immune genes, while 8 were uniquely present in S. deflexa and 61 were unique to S. lebanonensis. Scaptodrosophila as currently defined is paraphyletic, and the subgenus itself traces back to a most recent common ancestor estimated to have diverged over 50 million years ago—earlier than the split between the Drosophila and Sophophora subgenera. Thus, the observed differences in immune gene repertoires between S. deflexa and S. lebanonensis likely reflect long-term divergence accumulated within this ancient lineage, rather than recent species-specific gene turnover.
To test whether the species and immune-gene functional categories differed in the probability of each “gene” (i.e., orthogroup) being recovered, we fitted a Bayesian binomial linear mixed-effects model using MCMCglmm. Estimates from posterior revealed a significant species effect, with H. cameraria (posterior mean = 99.88%; 95% credible interval: 99.61–99.98) and H. confusa (99.79%; CI: 99.35–99.97) showing similar probabilities of immune gene recovery, while S. deflexa recovered significantly fewer immune genes (99.58%; CI: 98.80–99.93; p = 0.001). This may partly reflect differences in genome and annotation quality, but also likely reflects genuine differences in immune gene conservation and loss. We also detected a marginal effect of gene category, such that canonical “signaling” genes (99.97%; CI: 99.91–99.99; p = 0.06) were more likely to be recovered than effector genes (99.46%; CI: 98.23–99.94), although “receptor” and “unknown” categories were not significantly different to effectors. Among interaction terms, only the combination of S. deflexa and signaling genes showed a significant positive deviation from expectation (99.97%, CI: 99.89–99.99; p = 0.03), indicating that signaling genes were relatively well retained even in S. deflexa. Other species and category interactions terms were not significantly different from additive expectations.
To illustrate the evolutionary lability of effector gene families, we chose to examine the diptericin and attacin genes in more detail. Phylogenetic analysis of diptericins (Fig. 3A) confirmed three distinct clades: DptA, DptB, and DptC, the latter being restricted to the subgenus Drosophila [17]. Scaptodrosophila deflexa encoded only a single DptB-like gene, clustering with orthologs from H. confusa, H. cameraria, and D. melanogaster, but lacked any detectable DptA and DptC homologs. In contrast, multiple DptA and DptC paralogs were recovered in both Hirtodrosophila species, suggesting lineage-specific expansions. Similarly, the attacin gene tree showed that H. cameraria and H. confusa have multiple copies of AttC and AttA/B, whereas S. deflexa possesses only a single AttA/B-like gene with no detectable orthologs of AttC and AttD (Fig. 3C; Additional file 3). The clear pair of AttA and AttB genes in most species might superficially suggest that each species has experienced a recent duplication independently (Additional file 3). However, it is more likely that these patterns reflect concerted evolution in diptericin and attacin gene families [18, 41], highlighting the challenges of inferring orthologs of AMPs across divergent genomes. Our results parallel recent findings by Hanson et al. [26], who showed that the presence and expression of DptA and DptC correlate with fly ecology and susceptibility to Providencia rettgeri. The loss of DptA and DptC in S. deflexa—a species with minimal response to Providencia exposure—may therefore reflect its distinct ecological niche, whereas the retention of DptB across lineages, including the mushroom-feeding Hirtodrosophila, contrasts with the losses observed in other clades [26]. Taken together, these results show the rapid and idiosyncratic evolution of effector gene repertoires across drosophilid species, and illustrate how their annotation can be particularly sensitive to both sequence divergence and genome assembly quality.Fig. 3. Divergence and lineage-specific organization of diptericin and attacin gene families in drosophilids. A Maximum-likelihood gene tree of diptericin genes from 51 drosophilid species, including Hirtodrosophila cameraria, H. confusa, and Scaptodrosophila deflexa, generated using IQ-TREE2 based on aligned amino acid sequences. The gene tree highlights three distinct clades corresponding to DptA (blue), DptB (green), and DptC (red). DptC is restricted to the subgenus Drosophila, including the Hirtodrosophila species, and is absent from both D. melanogaster and S. deflexa. B Synteny of diptericin gene clusters in D. melanogaster, H. cameraria, H. confusa, and S. deflexa. In D. melanogaster, DptA is upstream of DptB. In contrast, the Hirtodrosophila species contain multiple tandem repeats of DptC genes upstream of DptB inplace of DptA. Scatopdrosophila deflexa has only a single DptB-like gene and lacks both DptA and DptC, possible indicating lineage-specific gene loss or pseudogenization. C Synteny of attacin gene clusters in the same four species. In D. melanogaster, attacin genes are on two different chromosomes, AttD on 3R and AttC, AttA, and AttB on 2R. Hirtodrosophila cameraria exhibits all four attacin genes on same contig, with two AttC duplicates positioned downstream of AttA and AttB. Hirtodrosophila confusa also retains AttD and AttC (3 copies), although on different contigs. In contrast, S. deflexa has only a single AttA/B–like gene, with no identifiable orthologs of AttC or AttD. Note: Gene cluster diagrams in B and C are schematic and not drawn to scale; gene sizes and intergenic distances are illustrative only.
Immune challenge triggers a conserved immune response in Hirtodrosophila but not in S. deflexa
To identify genes that are transcribed in response to bacterial infection, we performed differential expression analysis between pathogen-challenged and unchallenged individuals using DESeq2 [42]. We identified 363 significantly upregulated genes in H. cameraria (adj. p < 0.05, log_2_ fold change ≥ 1), 149 genes in H. confusa, and only 33 genes in S. deflexa after bacterial challenge (Additional file 4). In addition, we identified 230 significantly downregulated genes (adj. p < 0.05, log_2_ fold change ≤ −1) in H. cameraria, 82 genes in H. confusa, and only 7 genes in S*. deflexa* (Additional file 4).
To visually assess the consistency of the response across individuals, we performed principal component analysis (PCA) of normalized expression data. In both H. cameraria and H. confusa, PCA analysis revealed clear separation between pathogen-challenged and unchallenged samples along PC1 (explaining 63% and 56% variance, respectively), confirming a strong and coordinated response to infection (Fig. 4A). In contrast, samples from S. deflexa showed no clear separation by treatment, indicating limited transcriptional response (Fig. 4A). The well-fitted dispersion estimates further suggest that this pattern reflects a weak response rather than elevated inter-individual variability (Additional file 5). Heatmaps of significantly differentially expressed genes corroborated these patterns, further highlighting the strong transcriptional induction in the two Hirtodrosophila species and the lack of a detectable response in S. deflexa (Fig. 4B).Fig. 4. Pathogen challenge with Providencia rettgeri induces immune responses in Hirtodrosophila species but not in S. deflexa. A Principal component analysis (PCA) of normalized gene expression data shows clear separation between pathogen-challenged and unchallenged individuals in H. cameraria and H. confusa, but not in S. deflexa, indicating weaker transcriptional response in the latter. B Heatmaps of significantly differentially expressed genes (adj. p < 0.05, |log_2_FC|≥ 1). Values represent rlog-transformed expression levels that were row-centered (mean-centered per gene) to emphasize relative expression differences across samples. Genes and samples were hierarchically clustered based on expression similarity. C Volcano plots displaying log_2_ fold change versus −log_10_ adjusted p values for all expressed genes, highlighting significantly upregulated genes (red) and significantly downregulated genes (blue). D Heatmap showing log_2_ fold change of selected top genes (based on log_2_ fold change and adj. p value) following pathogen challenge across all three species. Canonical Imd-pathway target AMPs such as diptericin, attacin, and cecropin are strongly induced in H. cameraria and H. confusa, but not in S. deflexa. Known immune effectors are highlighted in green and novel immune effectors predicted in this study are highlighted in blue
As expected from Gram-negative bacterial challenge, the most strongly upregulated genes in H. cameraria and H. confusa were homologs of canonical AMPs—such as diptericins, attacins, and cecropins—that are downstream targets of the Imd pathway in Drosophila melanogaster (Fig. 4C and D). Interestingly, bomanins—targets of Toll pathway—were also upregulated, although upregulation was not to the level of Gram-negative specific AMPs. We also observed significant upregulation of peptidoglycan receptors, such as homologs of PGRP-SB1, PGRP-LB, PGRP-SD, PGRP-SA, PGRP-LF, PGRP-LC, and PGRP-SC2 and as well as serine proteases (e.g., sp7), which have established roles in immune activation and microbial killing (Fig. 4D; Additional file 4). Apart from immune effectors, signaling genes such as Relish and pirk that regulate the Imd pathway were also upregulated. In contrast, S. deflexa only showed detectable induction of three PGRP receptors among known immune genes (PGRP-LB, PGRP-SD, and PGRP-SC2), consistent with a weak or absent transcriptional immune response to Providencia rettgeri (Fig. 4D; Additional file 4).
Despite the overall similarity of the transcriptional response to infection in H. confusa and H. cameraria, notable differences may nevertheless exist. For example, defensin (Def) and IM33 were only detectably induced in H. confusa, and transferrin 1 and listericin only detectably induced in H. cameraria (Fig. 4D; Additional file 4). Additionally, in all species, we identified several strongly induced genes without clear homologs in D. melanogaster, suggesting the induction of novel or species-specific immune genes, many of which might be novel AMPs (below).
We carried out Gene Ontology analyses of the differentially expressed genes to help identify functional classes of genes whose expression is induced or repressed upon pathogen challenge. In both H. cameraria and H. confusa, upregulated genes were significantly enriched for terms such as “defense response to Gram-negative/Gram-positive bacteria,” “antibacterial humoral response,” and “response to fungus” (Additional file 6). These enrichments confirm that infection induced a coordinated immune response. In contrast, only a few immune-related GO terms (“defense response to Gram-negative bacteria” and “antibacterial humoral response”) were significantly enriched among the few upregulated genes in S. deflexa, consistent with its subdued transcriptional response. Downregulated genes across all species were enriched for terms related to metabolism and structural processes.
Microbiome variation could underlie immune response heterogeneity
All flies were first-generation lab-reared individuals derived from wild-caught parents and were reared on non-standard media. This increases the likelihood that the flies carry diverse (and divergent) microbial communities. Such microbiota can influence baseline immune status, alter pathogen susceptibility, or modulate the host’s immune response to bacterial challenge [43–46]. Thus, to explore whether microbiome composition varied, and might therefore have contributed to the apparent variation in immune responses, particularly the muted transcriptional induction observed in S. deflexa, we profiled microbial taxonomic abundance using the unmapped RNA-seq reads. Because these estimates are RNA-based, they reflect relative transcriptional activity rather than absolute microbial cell abundance, as expression levels can vary among taxa and conditions [47, 48].
We first constructed a de novo assembly of unmapped reads for each library, followed by “diamond blastp” [49] searches against the NCBI “nr” protein database for taxonomic assignment. The assembly of unmapped reads produced between 15,439 and 45,168 contigs per sample, and taxonomic assignment revealed that bacterial and viral taxa dominated the microbial profiles, with occasional hits to fungal and other lineages (Fig. 5A and B; Additional file 7). Bacterial composition included common Drosophila gut-associated taxa such as Citrobacter, Fructilactobacillus, and Pseudomonas. In addition, we found reads assigned to genera such as Klebsiella, Escherichia, Streptococcus, and Salmonella, which are rarely reported in Drosophila natural microbiome (Fig. 5A and B; [50, 51]). Their presence here may reflect transient acquisition from rearing media, or lineage-specific associations unique to wild flies. Overall, these broad taxonomic patterns were generally similar between pathogen-challenged and unchallenged individuals within each species. However, more fine-scale differences emerged when we quantified within-sample microbial diversity (Fig. 5C). Notably, H. cameraria exhibited a significant increase in alpha diversity following pathogen challenge (Wilcoxon rank-sum test, p = 0.01), suggesting that infection alters the richness or evenness of microbial communities in this species (Fig. 5C). In contrast, microbial diversity remained stable across treatments in H. confusa and S. deflexa (Fig. 5C). This increase in microbial diversity in H. cameraria may be associated with susceptibility to Providencia infection. Indeed, Providencia reads were more abundant in pathogen-challenged H. cameraria individuals, suggesting successful bacterial replication. In H. confusa, Providencia was detected at lower levels and in fewer individuals, and S. deflexa showed little evidence of Providencia infection, with only two challenged individuals containing detectable levels (Fig. 5A). A few unchallenged individuals also showed low Providencia abundance, potentially due to environmental exposure or read contamination.Fig. 5. Microbiome composition and alpha diversity in pathogen-challenged and unchallenged samples across three drosophilid species. A Stacked bar plot showing the relative abundance of the top 23 bacterial genera (measured as reads per million host-mapped reads) across all samples. The 23 genera represent the union of the top 10 most abundant genera from each sample. Samples are ordered by species (H. cameraria, H. confusa, S. deflexa) and are color-coded by treatment status: pathogen-challenged (red) and unchallenged (blue). B Boxplot summarizing the relative abundance of each bacterial genus across all samples. Each box represents the distribution of abundance for a given genus, with individual data points indicating values from individual samples. Genera are ordered by median abundance. Spiroplasma shows the highest median abundance overall, and it is only found in S. deflexa. C Boxplots of alpha diversity (Shannon index) comparing pathogen-challenged and unchallenged samples within each species. A significant increase in diversity is observed in H. cameraria upon infection (Wilcoxon rank-sum test, p value = 0.01), suggesting infection-induced shifts in microbial richness and/or evenness. No significant difference in alpha diversity was observed in H. confusa or S. deflexa
Interestingly, we detected high levels of Spiroplasma in all S. deflexa individuals (Fig. 5A and B). This vertically transmitted bacterial endosymbiont is known to protect some species of Drosophila against parasitoids, nematodes, and bacterial pathogens [52–55]. Such protection, including that reported in D. melanogaster against Providencia alcalifaciens, can occur through host iron sequestration [55]. This defense mechanisms operate independently of canonical Toll and Imd pathway AMP induction, and thus may not produce a strong host transcriptional signature of infection (as seen in [56, 57]). Phylogenetic analysis of Spiroplasma open reading frames (ORFs) assembled from unmapped reads, together with publicly available Spiroplasma genomes, placed the S. deflexa strain within the Spiroplasma ixodetis group (Fig. 6), closely related to the symbiont previously described from Drosophila atripex [58, 59]. The proportion of reads mapping to Spiroplasma contigs (normalized to host-mapped reads) did not differ significantly between Providencia-challenged and unchallenged S. deflexa individuals (Wilcoxon rank-sum test, p = 0.83), consistent with a stably maintained and transcriptionally active Spiroplasma infection across treatments. The most highly expressed Spiroplasma contigs corresponded to genes involved in energy metabolism and ribosomal function in all individuals. In addition, S. deflexa showed very high expression of transferrin 1 (tsf1; mean expression = 98,575; ranked among the top 50 most highly expressed genes). However, the difference between treatments was not significant after multiple-testing correction, suggesting constitutively high levels of tsf1. Wolbachia reads were also detected at moderate levels in S. deflexa, though the most likely biological consequences are less clear cut and more context-dependent [60–62].Fig. 6. Phylogenetic placement of the Spiroplasma symbiont from Scaptodrosophila deflexa. Maximum-likelihood tree inferred using 137 publicly available Spiroplasma genomes, the Spiroplasma contigs recovered from ten S. deflexa individuals and an outgroup species (Mycoplasma genitalium). Bootstrap support values ≥ 60% are shown at nodes; the scale bar indicates amino acid substitutions per site
Divergent species encode novel candidate AMP-like proteins
Despite the broadly conserved immune repertoire and similar transcriptional responses observed in H. cameraria and H. confusa, our analyses also revealed considerable species-specific differences—particularly in the apparently muted immune response of S. deflexa. These findings highlight the possibility that deeply diverged drosophilid species can encode lineage-specific immune effecters that are highly divergent in sequence and thus undetectable through simple homology searches against D. melanogaster. We hypothesized that such genes may include novel antimicrobial peptides, which are often short, secreted, cationic proteins with low levels of sequence conservation that makes them hard to detect [63, 64]. To identify potential novel AMP-like candidates, we focused on genes differentially expressed in response to pathogen challenge (adj. p < 0.05, log_2_FC ≥ 1) that lacked detectable orthologs in D. melanogaster, and encoded short peptides of the length expected for known AMPs (i.e., ≤ 200 amino acids). Across the three species, this approach yielded a total of 41 candidates, 22 from H. cameraria, 13 from H. confusa, and 6 from S. deflexa (Additional file 8). Fourteen of these candidates were found in multispecies hierarchical orthogroups (HOGs), suggesting that a subset may represent conserved but previously unannotated drosophilid AMP families not found in D. melanogaster. The remaining candidates appeared to be species-specific or present in low-copy, potentially orphan gene families.
To evaluate their potential to encode AMPs, we screened all 41 candidates using two AMP prediction tools, AMP Scanner v2 [65] and amPEPpy [66], as well as SignalP 6.0 [67] for signal peptide prediction. Thirty-three of the 41 candidates were predicted as AMPs by at least one method, and 20 of these also had predicted N-terminal signal peptides (Additional file 8), indicating likely secretion. Based on physicochemical criteria, we further categorized the predicted AMPs into two groups: (i) strong candidates (14) that encode positively charged (net charge +1 to +10), secreted peptides often enriched in glycine, proline, or cationic residues, and (ii) likely candidates (6) that encode secreted peptides with weakly positive or negative net charge (− 2 to + 1), but are strongly induced by infection (Additional file 8). Among the strong candidates, we identified five in H. cameraria, seven in H. confusa, and two in S. deflexa. Expression levels of these genes were generally high (ranging from twofold change to 300-fold change), and most were exclusively induced in pathogen-challenged individuals, supporting a role in infection response. Structural predictions using AlphaFold [68] revealed that many of these strong candidates adopt conformations typical of known AMPs, including amphipathic α-helices and αβ-sheet-rich peptides (Fig. 7).Fig. 7. Predicted structures of novel candidate AMPs. AlphaFold3 predicted protein structures. The N- and C-termini are labeled, and black arrows denote the predicted signal peptide cleavage sites based on SignalP6.0. The structures are colored by pLDDT confidence score (blue = high accuracy, light blue = backbone expected to model well, yellow/orange = lower confidence, red = very low confidence)
Several candidates exhibited striking similarity to well-characterized AMP classes. For example, the gene Hconf/g6966 (~20% proline) resembles the proline-rich apidaecins of bees (~ 30% proline; [69]), featuring four tandem repeats of potential mature peptides flanked by Furin cleavage motifs (RXRR). The gene Hcam/g510 adopts a compact α-helical structure reminiscent of IM18 (Paillotin; [70]) from D. melanogaster, while Sdef/g5630 (~30% glycine) shows structural similarity to the glycine-rich AMP holotricin from Aedes aegypti (~ 49% glycine; [71]). Additional candidates included Hconf/g6967, a paralog of g6966, also similar to apidaecins; Hcam/g5361, with ~64% identity to D. grimshawi CecC; and Hcam/g2563, which shares ~46% identity with D. grimshawi lysozyme P. Notably, both Hcam/g4126 and Hconf/g9391 have the potential to form lysozyme-like structures (Fig. 7). An especially intriguing example is Hcam/g11853, which was not predicted as an AMP by either amPEPpy [66] or AMP Scanner v2 [65], yet was among the most strongly induced genes following infection (400-fold change). Its homolog Hconf/g10777 was similarly upregulated (70-fold change) and clustered in the same HOG (Additional file 8). While blastp searches with this gene returned only uncharacterized Drosophila proteins, weak hits to the “elevated during infection” (edin; [72]) gene family suggest that these genes may represent highly divergent edin-like effectors that escape homology detection—underscoring the evolutionary lability of this gene family in Drosophilidae. Predicted 3-D structures for all strong and likely AMP candidates are presented in Fig. 7. Taken together, their predicted secretion, AMP-like physicochemical properties, infection-induced expression, and structural similarity to known AMPs suggest that these genes likely encode novel immune effectors.
Discussion
In this study, we provide a comparative analysis of transcriptomic responses following bacterial challenge with Gram-negative pathogen Providencia rettgeri in three wild-derived, non-model drosophilid species—H. cameraria, H. confusa, and S. deflexa. These species, which diverged from D. melanogaster over 45 million years ago, represent largely uncharacterized and ecologically diverse branches of the drosophilid phylogeny. Our goals were to demonstrate the feasibility of differential expression analyses in near-wild flies, to assess whether canonical immune responses are conserved across these lineages, and to identify novel candidate immune effectors in taxa previously inaccessible to functional genomic studies. Through immune gene annotation, differential gene expression, microbiome profiling, and AMP prediction, our results reveal both conserved and lineage-specific features of insect immunity and highlight the value of including non-model species in comparative immune transcriptomic research.
Despite the intuitive appeal of using pathogen-stimulated transcriptomes to improve immune gene prediction in non-model species, we found that pathogen-challenged RNA-seq did not substantially improve annotation completeness or immune gene recovery. BRAKER3 based annotations derived from pathogen-challenged, unchallenged, and combined RNA-seq datasets yielded similar numbers of genes, with most orthologous genes consistently annotated across all datasets. While a few genes were unique to a single dataset, these rarely encoded immune-related functions and were often unassigned to orthogroups—suggesting lower confidence or assembly artifacts. These observations suggest that most immune genes have sufficient constitutive baseline expression to provide informative hints for de novo gene prediction tools. Although RNA-seq also indirectly informs gene annotation by revealing new transcript isoforms and splicing events, and increases confidence, it is thus not essential for the identification of novel genes.
By identifying homologs of a curated set of 638 immune-related D. melanogaster genes, we confirmed that all three species shared a conserved core of immune signaling and recognition genes. However, as in previous studies, we observed extensive lineage-specific variation among effector genes, particularly antimicrobial peptides, which were highly variable in presence, copy number, and inducibility [8, 17, 24]. Notably, Scaptodrosophila deflexa exhibited the lowest recovery of immune genes overall and lacked orthologs of several canonical AMPs, including DptA, AttC, and AttD. This pattern echoes previous finding in Scaptodrosophila lebanonensis, where a gene syntenic to DptA was identified but exhibited very little sequence similarity to any diptericin and in subgenus Drosophila it is replaced by DptC [17]. It is therefore plausible that DptA has either been lost or has diverged beyond recognition in S. deflexa, potentially reflecting strong diversifying selection or relaxed constraint. In contrast, H. cameraria and H. confusa both harbored multiple paralogs of DptC and AttC, suggesting lineage-specific expansions. These patterns of gain and loss were supported quantitatively by our Bayesian mixed-effects model, which showed that both species identity and gene functional category significantly predicted immune gene recovery. Effector genes were recovered with slightly, but significantly, lower probabilities than signaling genes, reinforcing the view that AMP gene families are particularly prone to evolutionary turnover. Among the species, S. deflexa showed the lowest predicted probability of immune gene recovery, again driven in part by the absence of some canonical AMP orthologs. These findings highlight the evolutionary lability of AMP gene families in Drosophilidae, characterized by frequent duplication, pseudogenization, and loss—hallmarks of strong, dynamic selective pressures likely imposed by pathogen diversity and host ecology. They also underscore the challenge of identifying fast-evolving immune effectors in non-model organisms, where high divergence can obscure orthology relationships and thereby our inability to functionally annotate these genes.
Interestingly, while both H. cameraria and H. confusa mounted robust transcriptional responses to Providencia rettgeri challenge—upregulating canonical AMPs (such as diptericins, attacins, and cecropins), PGRP receptors, serine proteases, and transcriptional regulators (such as Relish and pirk)—S. deflexa appeared to exhibit minimal transcriptional changes, with only two PGRPs and no known AMPs upregulated. Assuming this was not simply lower power in our experiment, there are two likely explanations. First, S. deflexa individuals showed lower Providencia load, suggesting reduced infection burden or greater resistance. Second, all individuals harbored Spiroplasma, an endosymbiont reported to protect D. melanogaster from pathogenic bacteria (including Providencia) via mechanisms such as iron sequestration and melanization—defense strategies that bypass transcriptional AMP induction [55]. The concurrent presence of Wolbachia might further perturb the host immune responses, potentially by masking or dampening canonical transcriptional responses. In mosquitoes, experimental infection with Wolbachia provides broad-spectrum anti-microbe and parasite protection by upregulating immune effector molecules and those involved in antimicrobial pathway [73–75]. Together, these findings suggest that both reduced Providencia infection success and the presence of defensive endosymbionts—especially Spiroplasma—may explain the weak immune transcriptional response observed in S. deflexa. More broadly, our results underscore the importance of profiling the microbiome when interpreting host transcriptional responses in non-model, wild-derived insects, where natural symbionts and background microbial variation may obscure or reshape immune phenotypes.
Finally, our identification of 41 novel AMP-like genes—many of which are not widely conserved across Drosophilidae, and are short, secreted, and structurally similar to known AMPs—suggests that AMP evolution may be even more dynamic than previously appreciated. Notably, many candidates shared structural features with known AMP families such as apidaecins, holotricins, cecropins, and lysozymes, but lacked clear sequence homology to any annotated D. melanogaster genes. Other candidates were not predicted to be AMPs, but were highly induced—including Hcam/g11853 and its homolog Hconf/g10777, which could represent highly divergent homologs of edin. These results build on previous reports that lineage-specific AMPs are common in Drosophilidae [17, 24, 26] and emphasize that reliance on model species alone likely underestimates the diversity of immune effectors in nature.
While this study provides important insights into the immune responses of non-model drosophilids, several limitations should be considered when interpreting our results. First, the absence of wounding (i.e., sham injection) controls prevents us from separating transcriptional responses to bacterial infection from those triggered by mechanical injury. Second, because of our focus on hard-to-culture species, we could not usefully perform analyses requiring larger sample sizes, such as survival assays or measures of pathogen load. Such experiments could have provided a more direct link between transcriptional responses and infection outcomes. Third, the use of hard-to-culture species also meant that most of our sample flies were siblings, and in the case of S. deflexa, were the offspring of a single wild-caught female. This likely explains why all of the S. deflexa individuals carried Spiroplasma, preventing any analysis of the role of Spiroplasma—even though the experiment is well-replicated in many other ways (e.g., the siblings will be less closely related than flies from a single isofemale or inbred line). Fourth, the flies were reared on non-standard, species-specific diets and carried natural microbial communities, which, while ecologically realistic, may have reduced power by introducing additional variation in baseline immune status and infection susceptibility. Finally, to go further, we would require experimental validation for the novel AMP-like genes identified here. Together, these caveats underscore both the challenges and the value of expanding immune-genomic research into non-model, wild-derived drosophilid lineages.
Conclusions
This study expands the scope of functional immune genomics further into non-model drosophilids and highlights both the conserved and lineage-specific components of insect immunity, including extreme divergence and novel AMPs in distant lineages of Drosophilidae. Our findings advocate for a broader phylogenetic sampling in immunological research and demonstrate the feasibility of high-resolution transcriptomics in species that are not readily amenable to laboratory domestication. In the future, the functional validation of novel immune effectors, through antimicrobial assays or in vivo perturbations, will be critical to understand their roles in host defense. Additionally, as genome and transcriptome data continue to accumulate across Drosophilidae, studies such as this will be essential for understanding how immune systems evolve, diversify, and interact with microbial environments across evolutionary time.
Methods
Fly collection and infection
We collected wild-mated female Hirtodrosophila confusa (n = 2), H. cameraria (n = 3 females), and Scaptodrosophila deflexa (n = 1) from the Hermitage of Braid (Edinburgh, UK, all within 500 m of 55.92 N, 3.20 W) on 4th of August 2023. Wild-caught female flies were housed separately under a 12:12-h light-dark cycle and allowed to lay eggs. Wild-caught H. confusa and H. cameraria were maintained on whole mushrooms (Agaricus bisporus), and S. deflexa on fermentative substrate intended to mimic sap flux conditions: Drosophila medium supplemented with tiny slices of ripe banana and cotton plug heads moistened with cider and maple syrup. Emerging first-generation female flies were collected and housed in same-sex vials containing modified Lewis medium [76] for 3-5 days prior to bacterial challenge. As a result, all flies were between 3 and 6 days old at the time of infection. For bacterial challenge, we used the “Dmel” strain of Providencia rettgeri originally isolated from wild Drosophila melanogaster by Brian Lazzaro [77], and provided to us by Pedro Vale (University of Edinburgh). Providencia rettgeri cultures were initiated from a single colony and grown overnight in 10 ml LB broth at 37 °C with shaking. The bacterial culture was centrifuged at 5000 rpm for 5 min at 4 °C and the supernatant was discarded. Bacteria were diluted to an optical density of OD_600_ = 0.1 in sterile phosphate-buffered saline (PBS) prior to infection [78]. Flies were anesthetized on CO_2_, and bacterial challenge was administered by puncturing the thorax with a 0.14-mm diameter stainless steel pin dipped in the bacterial suspension. Five females per species (four females and one male for S. deflexa) were challenged in this way. Control flies were anesthetized but otherwise left unmanipulated, and thus this experiment did not control for wounding effects. Both challenged and unchallenged flies were maintained at room temperature for 16 h post-infection, then flash-frozen in Trizol reagent (Invitrogen) and stored at −80 °C until RNA extraction. RNA extraction and sequencing were carried out by Novogene (www.novogene.com) using a standard TRIzol-based extraction protocol. Sequencing libraries were prepared using an rRNA depletion approach (without poly-A selection) with a TruSeq stranded total RNA library kit and unique dual-indexing, prior to being sequenced on the Illumina NovaSeq platform to generate 150-bp paired-end reads.
Quality control and mapping
On average, samples contained 58.35 million raw reads per fly. We preprocessed raw reads for quality control using fastp v0.24.0, with the “-c” option enabled for overlap base correction and “-y -Y 20” for low-complexity filtering [79]. All other parameters were kept at their default settings. Reads were mapped to their respective genomes using STAR RNA-seq aligner v2.7.10b, generating sorted BAM files as output [80]. On average, 83.98%, 66.55%, and 82.02% of reads per library mapped uniquely to the H. confusa (assembly accession: GCA_035043065.1), H. cameraria (GCA_949708635.1), and S. deflexa genomes, respectively (Additional file 9). The Scaptodrosophila deflexa genome was sequenced by Bernard Kim (Princeton) and Dmitri Petrov (Stanford), using ONT R10.4.1 sequencing technology and assembled with hifiasm [81], and was generously made available to us in advance of publication. BUSCO completeness and N50 for all the genomes used in this study can be found in Additional file 9.
Gene annotation
To assess the impact of pathogen-challenged RNA-seq on our ability to recover immune genes, as compared with unchallenged RNA-seq, we generated 3 independent genome annotations for each species using: (1) RNA-seq data from pathogen-challenged individuals, (2) RNA-seq data from unchallenged (naïve) individuals, and (3) combined RNA-seq data. Genome annotations were generated using BRAKER3 in ETP mode, with extrinsic protein hints provided from D. melanogaster RefSeq proteins [38]. To assess gene orthology and recover immune-related orthogroups, we extracted the longest isoform per gene from each annotation set and ran OrthoFinder v2.5.5 using the predicted proteomes of all species and annotation sets, along with the D. melanogaster proteome [82]. To generate hierarchical orthogroups (HOGs), gene trees were inferred by OrthoFinder using alignments generated with MAFFT [83] and maximum-likelihood inference performed with IQ-TREE2 [84]. A curated list of D. melanogaster immune-related genes compiled from the literature was then used to identify corresponding immunity-related HOGs (following [85]), enabling direct comparison of immune gene recovery across the species analyzed in this study [25, 86–91].
Differential gene expression analysis and functional annotation
Read counts per gene were quantified using “featureCounts” [92]. Genes with fewer than 10 total counts across all samples were excluded from downstream analysis. Differential expression analysis was conducted using the DESeq2 package in R [42]. We performed principal component analysis (PCA) using the “plotPCA()”* function on regularized log-transformed (rlog) counts, with ~treatment* specified as the design formula. The top 500 most variable genes were used to calculate principal components. Genes were considered significantly differentially expressed if they had an adjusted p value <0.05 and a |log_2_ fold change| ≥1. Heatmaps of expression patterns were generated using the pheatmap package [93].
Functional annotation of genes was performed using eggNOG-mapper v2.1.12 with Diptera HMM database using HMMER searches (-m hmmer) and additional filters for hits with e-value ≤0.05, bit score ≥60, and percent identity ≥40 [94, 95]. GO terms were restricted to non-electronic evidence codes, and PFAM domains were realigned (–pfam_realign realign). Gene Ontology (GO) enrichment analysis was performed using “topGO” package in R by applying Fisher’s exact test and the “weight01” algorithm, which accounts for the hierarchical structure of GO terms [96]. The p values from GO analyses were corrected using the Benjamini and Hochberg procedure with the FDR threshold set to 0.05 [97].
Metagenomic analysis of unmapped reads
To quantify microbial abundance in flies and to assess whether pathogen-induced expression differences could be affected by microbial load, we performed de novo metatranscriptomic analysis on unmapped reads from each sample. Paired-end unmapped reads were extracted from STAR-mapped BAM files using “samtools” v1.13 with the flags -f 12 -F 256, to retain only unmapped read pairs [98]. These reads were assembled de novo using “rnaSPAdes” v4.1.0 [99], with default parameters. Open reading frames (ORFs) were predicted from each transcriptome assembly using EMBOSS “getorf,” retaining protein length ≥ 200 amino acids. The resulting proteins were queried against the NCBI non-redundant (nr) protein database using “diamond blastp” v2.1.10 with ‘–evalue 1e-20’ [49] and taxonomic lineages were assigned to diamond hits using “TaxonKit” [100]. To estimate relative microbial load, total reads mapped to each organism were normalized to host-mapped read counts for each sample to control for sequencing depth. Alpha diversity (Shannon index) was calculated from genus-level profiles using the vegan and phyloseq packages in R [101, 102]. Contigs annotated as Spiroplasma by “diamond blastp” were identified for each sample and counts of reads mapping to these contigs were summed per sample. Counts were normalized to host-mapped reads to yield a metric of Spiroplasma RNA (combining titer and expression level) relative to total fly expression. Statistical comparisons between pathogen-challenged and unchallenged groups were performed using the Wilcoxon rank-sum test.
To build a Spiroplasma phylogeny, we identified orthologs of eight single-copy Spiroplasma genes in each Spiroplasma genome (following [103]). For S. deflexa, open reading frames (ORFs) matching these loci were obtained from assemblies of unmapped RNA-seq reads and confirmed by blastp against the NCBI non-redundant database. Individual protein alignments were generated with MAFFT and concatenated into a supermatrix alignment [83]. The final alignment was used to infer a maximum-likelihood phylogeny using IQ-TREE2 under the LG + F + I + R5 model, with 1000 ultrafast bootstrap replicates to assess node support [84].
Prediction of novel AMPs
To identify potential novel immune effectors, we screened all significantly upregulated genes that had no detectable homologs in D. melanogaster and a predicted peptide length ≤ 200 amino acids. These candidates were assessed using two AMP prediction tools: (1) AMP Scanner v2, a deep neural network-based classifier [65], and (2) amPEPpy, a Python implementation of the amPEP random forest classifier [66]. The random forest model was trained using 712 known AMPs from the APD3 database and 712 matched non-AMP sequences (see original publication for dataset details: [65]). All candidate peptides were also analyzed for the presence of N-terminal signal peptides using SignalP 6.0 webserver [67]. Physiochemical properties of candidates were obtained using AMP predictor tool of APD3 [104] and ExPASy “ProtParam” [105]. Finally, predicted 3-D structures of AMP candidates were generated using the AlphaFold 3 server [68].
Supplementary Information
Additional file 1: Dot plot of GO enriched terms of genes uniquely annotated from pathogen-challenged or unchallenged RNA-seq datasets.Additional file 2: Presence-absence of immune genes in Hirtodrosophila cameraria, H. confusa, and *Scaptodrosophila deflexa.*Additional file 3: Maximum-likelihood gene tree of attacin genes from 51 drosophilid species, including Hirtodrosophila cameraria, H. confusa, and Scaptodrosophila deflexa, generated using IQ-TREE2 based on aligned amino acid sequences. The gene tree highlights three distinct clades corresponding to AttA/B (orange), AttC (magenta), and AttD (sky blue). AttD was missing from S. deflexa genome.Additional file 4: List of differentially expressed of genes in *Hirtodrosophila cameraria *(Sheet 1), *H. confusa *(Sheet 2), and *Scaptodrosophila deflexa *(Sheet 3) after bacterial challenge.Additional file 5: Dispersion estimates plotted against mean expression levels. (A) Hirtodrosophila cameraria, (B) H. confusa,and (C) *Scaptodrosophila deflexa.*Additional file 6: Dot plot of GO enriched terms of genes significantly upregulated in *Hirtodrosophila cameraria *(Sheet 1), *H. confusa *(Sheet 2), and *Scaptodrosophila deflexa *(Sheet 3) after bacterial challenge.Additional file 7: Microbiome composition in pathogen-challenged and unchallenged samples across three drosophilid species. (A) Stacked bar plot showing the relative abundance of the top 14 viruses (measured as reads per million host-mapped reads) across all samples. The 14 genera represent the union of the top 10 most abundant genera from each sample. Samples are ordered by species (H. cameraria, H. confusa, S. deflexa) and are color-coded by treatment status: pathogen-challenged (red) and unchallenged (blue). (B) Boxplot summarizing the relative abundance of each viral genus across all samples. Each box represents the distribution of abundance for a given genus, with individual data points indicating values from individual samples. Genera are ordered by median abundance.Additional file 8: List of 41 candidate AMPs and their properties.Additional file 9: Read mapping summaries (Sheet 1) and genome assembly statistics (Sheet 2) for H. cameraria, H. confusa, and S. deflexa.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Drosophila 12 Genomes C, Clark AG, Eisen MB, Smith DR, Bergman CM, Oliver B, Markow TA, Kaufman TC, Kellis M, Gelbart W et al: Evolution of genes and genomes on the Drosophila phylogeny. Nature 2007, 450(7167):203–218.10.1038/nature 0634117994087 · doi ↗ · pubmed ↗
- 2Suvorov A, Kim BY, Wang J, Armstrong EE, Peede D, D’Agostino ERR et al. Widespread introgression across a phylogeny of 155 Drosophila genomes. Curr Biol. 2022;32(1):111–23 e 115.10.1016/j.cub.2021.10.052PMC 875246934788634 · doi ↗ · pubmed ↗
- 3Finet C, Kassner VA, Carvalho AB, Chung H, Day JP, Day S, et al. Droso Phyla: resources for drosophilid phylogeny and systematics. Genome Biol Evol. 2021. 10.1093/gbe/evab 179.10.1093/gbe/evab 179PMC 838268134343293 · doi ↗ · pubmed ↗
- 4Shakya M, Lo C-C, Chain PSG. Advances and challenges in metatranscriptomic analysis. Frontiers in Genetics 2019. 10.3389/fgene.2019.00904.10.3389/fgene.2019.00904 PMC 677426931608125 · doi ↗ · pubmed ↗
- 5Mc Mullen JG, Bueno E, Blow F, Douglas AE. Genome-inferred correspondence between phylogeny and metabolic traits in the wild Drosophila gut microbiome. Genome Biol Evol. 2021. 10.1093/gbe/evab 127.10.1093/gbe/evab 127PMC 835822334081101 · doi ↗ · pubmed ↗
- 6Tian Y, Yue X, Jiao R, Hanson MA, Lemaitre B: Functional characterization of paillotin: an immune peptide regulated by the Imd pathway with pathogen-specific roles in Drosophila immunity. In.: Cold Spring Harbor Laboratory; 2025.10.1098/rspb.2025.1835 PMC 1241990140925565 · doi ↗ · pubmed ↗
- 7Bloomington Drosophila Stock Center. Indiana University Bloomington. https://bdsc.indiana.edu/information/recipes/bloomfood.html. Accessed 20 Jan 2024.
- 8Kolde R. Pheatmap: pretty heatmaps. R package version 1.0.13. 2025. https://github.com/raivokolde/pheatmap.
