Evolution of Highly Repetitive Silk Genes in the Luna Moth, Actias luna
Bert Foquet, Lauren E Eccles, Amanda Markee, Deborah A Triant, Paul B Frandsen, Whitney L Stoppel, Akito Y Kawahara

TL;DR
This paper studies the evolution of silk genes in the Luna moth, showing how gene duplications lead to diverse silk proteins and adaptive traits.
Contribution
The study provides the first detailed molecular characterization of sericin genes in the Luna moth and reveals convergent subfunctionalization in silk gene evolution.
Findings
Eight sericin genes were identified in the Luna moth with two clusters of closely related paralogs.
Sericin genes show variation in repeat number, amino acid composition, and life stage-specific expression.
Comparisons with other moths reveal convergent subfunctionalization in sericin gene evolution.
Abstract
Gene duplications are a major driver of molecular diversification and phenotypic evolution. Arthropod silk genes provide an excellent model for studying these processes due to their extensive internal repeats and rapid evolutionary rates. In Lepidoptera, the Fibroin heavy chain (fibH) gene encodes the primary structural protein for silk fibers, contributing largely to their mechanical strength. This inner fibroin core is surrounded by an outer coating composed primarily of sericins. Sericins are a group of highly repetitive, serine-rich proteins that modulate silk fiber properties. Although sericins in the domestic silkworm (Bombyx mori) have been associated with life stage-specific variation in silk characteristics, their evolution and function remain poorly understood. Here, we provide a detailed molecular characterization of sericin genes in the Luna moth (Actias luna) known for…
Table 1

| | SerA | Ser1 | SerB | SerC | SerD | SerE | SerF | SerG |
|---|---|---|---|---|---|---|---|---|
| Location method | SS, SM | SS, SM | GP | SS, SM | GP | SS, SM | SS, SM | SS, SM |
| Contig | ptg000025l | ptg000038l | ptg000081l | ptg000081l | ptg000081l | ptg000081l | ptg000081l | ptg000081l |
| Number of Exons | 8 | 8 | 7 | 4 | 19 | 3 | 3 | 3 |
| Protein length (aa) | 2031 | 3039 | 966 | 260 | 2458 | 1457 | 2765 | 2099 |
| Protein MW (kDa) | 192.80 | 299.89 | 95.31 | 25.86 | 250.02 | 142.59 | 266.63 | 203.62 |
| Protein pI | 3.01 | 6.08 | 3.87 | 3.04 | 3.27 | 5.32 | 7.15 | 5.24 |
| Repeat number | R1: 19 | R1: 13 | R1: 9 | 89 | 21 | 37 | 65 | 47 |
| Repeat length (aa) | R1: 64–76 R2: 41–45 | R1: 105–114 R2: 51 | R1: 13–15 R2: 15 | 21 | 8 | 38 | 38 | 38 |
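| Silk gland expression | Yes | Yes | Yes | Yes | Yes | Unknown | Unknown | Unknown |

Location method: SS, sequence similarity (BLAST); SM, shared sequence motifs (nhmmer); GP, genomic proximity to other sericins.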
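The MW and pI rows in the table above are values predicted from the translated sequences. As a hedged illustration of how such predictions are typically made, the sketch below sums average residue masses and finds the pH of zero net charge by bisection. The residue masses, the EMBOSS-style pKa set, and all function names are assumptions; the source does not state which tool or parameters were used, so this is not a reconstruction of the authors' calculation.

```python
"""Sketch: predicted molecular weight and isoelectric point of a protein.

Assumptions (not taken from the source): average residue masses as commonly
tabulated (e.g. by ExPASy) and an EMBOSS-style pKa set.
"""

# Average residue masses in Da (free amino acid mass minus one water).
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the free N- and C-termini

# Assumed pKa values; other tools use other sets, so pI values differ.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_weight_kda(seq: str) -> float:
    """Average mass of an unmodified chain in kDa (KeyError on non-standard letters)."""
    return (sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER) / 1000.0


def net_charge(seq: str, ph: float) -> float:
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisection on pH: the net charge decreases monotonically as pH rises."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid  # still net positive, so the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


# Toy peptide (hypothetical, not an A. luna sericin).
toy = "SSTDSDTSGS"
print(f"{molecular_weight_kda(toy):.3f} kDa, pI {isoelectric_point(toy):.2f}")
```

Because a predicted pI depends on the pKa set, values from different tools can differ noticeably; the qualitative pattern (acid-rich proteins giving a low pI) is the robust part.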
Funding
- Integrative Biology Award from the Molecular and Cellular Biology Division of the National Science Foundation
- University of Florida (10.13039/100007698)
- National Institutes of Health National Institute of General Medical Sciences Maximizing Investigators’ Research Award
- National Science Foundation (10.13039/501100008982)
- National Institutes of Health (10.13039/100000002)
Introduction
Genomic change is a fundamental engine of biological diversity, providing the raw material for evolutionary innovation across the Tree of Life. Among the most influential of these processes is gene duplication, which is widely recognized as a primary driver of phenotypic plasticity. Gene duplications generate redundant gene copies that can evolve without compromising the original gene's role (Ding et al. 2012; Birchler and Yang 2022; Kuzmin et al. 2022). Although duplicated genes are often lost, others are retained and may acquire novel functions (neofunctionalization), partition ancestral roles (subfunctionalization), increase gene dosage, or act as compensatory backups (Birchler and Yang 2022; Kuzmin et al. 2022). These mechanisms have contributed significantly to major evolutionary innovations, from the diversification of animal body plans (Wagner et al. 2003) and variation in human traits (Conrad and Antonarakis 2007) to the emergence of silk fiber diversity in spiders (Garb et al. 2010). While gene duplications can arise from nonhomologous recombination repair (Koszul et al. 2004) and transposition events (Hughes et al. 2003; Cerbin and Jiang 2018), most occur in regions rich in repetitive DNA (Babcock et al. 2003; Dennis et al. 2017; Delihas 2020). In these regions, nonallelic homologous recombination (Taylor et al. 1957) and unequal crossing over during meiosis (Smithies 1964) are the predominant molecular mechanisms that give rise to duplications. These processes can also act within individual repeat-rich genes, accelerating their evolution and leading to the development of novel traits (Cheng and Chen 1999; Delihas 2011; King 2024). Despite their evolutionary importance, such genes have historically been difficult to study because of their complex structure. However, advances in long-read sequencing technologies now make it possible to resolve these regions at the genomic level with high accuracy (Krsticevic et al. 2015; Hotaling et al. 2021; Kawahara et al. 2022).
Arthropod silk genes, characterized by exceptionally high internal repeat content, have emerged as powerful models for investigating the evolutionary dynamics of repetitive, protein-coding DNA. The silk fibers produced by these genes have captivated human societies for millennia due to their remarkable mechanical properties, which uniquely combine high tensile strength with extensibility. These properties arise from the semicrystalline organization of silk fibers and the underlying long, repetitive amino acid sequences of the silk proteins, providing a direct link between gene structure, protein architecture, and material performance (Sehnal and Sutherland 2008; Aikman et al. 2025). Major silk proteins, including fibroins in butterflies, moths, and caddisflies (Amphiesmenoptera) and spidroins in spiders (Araneae), have convergently evolved highly repetitive sequences rich in alanine, glycine, and serine (Gatesy et al. 2001; Sutherland et al. 2010; Walker et al. 2012; McKim and Turner 2024). The spidroin gene family has undergone extensive gene duplication, shifts in expression, and diversification of its repetitive domains, collectively enabling the unparalleled mechanical diversity of spider silk (Starrett et al. 2012; Clarke et al. 2017; Kono et al. 2019). In contrast, duplications of the Fibroin heavy chain (fibH) gene—the primary silk gene in Lepidoptera and Trichoptera (Yonemura et al. 2009; Heckenhauer et al. 2023; Zhang et al. 2024)—are rare. This suggests that in Lepidoptera, other genetic mechanisms might contribute to previously reported functional and structural silk diversity (Peng et al. 2019; Guo et al. 2022; Eccles et al. 2025).
Lepidopteran silk fibers consist of an inner core, composed primarily of the fibH protein, and an outer coating. This outer coating makes up 20% to 50% of the silk fiber and functions as an adhesive and protective layer that influences fiber structure, stability, adhesion, and mechanics (Gheysens et al. 2011; Dong et al. 2013; Malay et al. 2016; Takasu et al. 2017; Peng et al. 2019; Guo et al. 2022; Wu et al. 2022). Sericins, a group of highly variable, serine-rich, repetitive proteins, make up the largest portion of this outer coating (Dong et al. 2013; Guo et al. 2022, 2025; Rouhová et al. 2024). In the domestic silkworm, Bombyx mori, six different sericin genes have been described. Only three of these encode proteins present in cocoon silk, while the proteins encoded by the other three are found in silks produced during early larval stages (Dong et al. 2013, 2019; Peng et al. 2019; Guo et al. 2022; Wu et al. 2024). These stage-specific expression patterns are correlated with variation in silk mechanics and properties across developmental stages (Kludkiewicz et al. 2009; Takasu et al. 2010; Guo et al. 2022), suggesting that the sericin-rich outer coating, rather than the fibroin core, determines the functional diversity of silk. Comparative studies of sericins, both between and within species, are complicated by their high sequence divergence and are rare outside B. mori (Tsubota et al. 2021; Wu et al. 2022; Kmet et al. 2023), leaving the evolutionary history of this gene family poorly resolved. As a result, it remains unclear whether other lepidopteran species share the sericin expression patterns observed in B. mori.
The moth family Saturniidae is thought to have diverged from Bombycidae (which includes B. mori) ∼70 million years ago (Kawahara et al. 2019). Saturniids have long attracted interest as alternatives to B. mori due to their large body size and robust cocoons, which are the source of several commercially and culturally important wild silks, including tussah, muga, and eri. While the silk properties of saturniids are relatively well characterized (Holland et al. 2012; Reddy and Yang 2012; Malay et al. 2016; Schmidt et al. 2023) and silk genes have been identified in several species (Tsubota et al. 2016; Žurovec et al. 2016; Rouhová et al. 2024), little is known about the genomic organization, variability, and evolutionary history of sericins in saturniids and their relatives.
The Luna moth (Actias luna), a large and iconic North American saturniid, produces silk throughout multiple larval stages, including anchoring silk, molting pads, and a strong silk cocoon for protection (Chen et al. 2012a; Reddy and Yang 2012; Eccles et al. 2025). A recently published, high-quality genome assembly (Markee et al. 2024), along with a detailed analysis of its silk structure (Eccles et al. 2025), provides the framework needed to characterize the structure, expression, and evolution of sericin genes in a nonmodel lineage. In this study, we identified and extracted eight sericin genes in A. luna, examined their genomic locations, and compared their encoded protein sequences with previously identified sericins from eight other Bombycoidea: Actias selene, Antheraea assamensis, Antheraea pernyi, Antheraea yamamai, B. mori, Hyalophora cecropia, Rhodinia newara, and Samia ricini. We also examined changes in sericin gene expression across developmental stages and assessed variation in predicted protein composition among the A. luna sericins.
Results
Identification of Sericins in the A. luna Genome
We applied three different methods to locate potential sericin genes in the A. luna genome: sequence similarity with known sericins, shared sequence motifs with known sericins, and genomic proximity to other A. luna sericins. First, we generated a sericin dataset containing 22 previously characterized sericins from eight species of the superfamily Bombycoidea: seven species of Saturniidae (Actias selene, Antheraea assamensis, A. pernyi, A. yamamai, H. cecropia, R. newara, and S. ricini) and B. mori, a member of the closely related Bombycidae (Takasu et al. 2007; Dong et al. 2015, 2019; Tsubota et al. 2016; Žurovec et al. 2016; Guo et al. 2022; Rouhová et al. 2024) (Table S1). This dataset was used to query the A. luna genome for sericins with a Basic Local Alignment Search Tool (BLAST) search (Altschul et al. 1990; Camacho et al. 2009). Second, the 22 sericins in the sericin dataset were aligned to generate a set of conserved sequence motifs, which were used with nhmmer (Wheeler and Eddy 2013) to search for additional Luna moth sericin genes. Both methods yielded the same set of six sericins (Table 1). Two more putative sericins were identified by a targeted search for sericin-like genes in genomic proximity to these six sericins (Table 1). Genes were subsequently annotated manually using available short-read RNA-sequencing data, ISOseq RNA-sequencing data, and a pre-existing automated genome annotation generated with BRAKER3 (Gabriel et al. 2024; Markee et al. 2024). Manual curation improved the automated annotation for seven of the eight sericins; only serD had previously been annotated correctly. The eight putative A. luna sericin genes share a set of structural characteristics previously described for sericins (Garel et al. 1997; Takasu et al. 2007; Kludkiewicz et al. 2009; Žurovec et al. 2016; Dong et al. 2019; Guo et al. 2022; Wu et al. 2024): they have at least two small exons (<50 bp) at the start of the gene that together code for a signal peptide, one large exon near the end of the gene that makes up more than 70% of the coding sequence, and large numbers of serine-rich repeats (Table 1 and Fig. 1). To avoid any assumption of homology with B. mori sericins 1-5, the A. luna sericins were given letter codes based on their order in the genome (sericin A-G; serA-G), except for A. luna sericin 1 (ser1), which we found to be an ortholog of B. mori sericin 1 (Table 1). Silk gland-specific expression was confirmed for five of the eight A. luna sericins (ser1, serA-D), based on a long-read transcriptome generated from silk glands extracted across different life stages (Table 1).
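The three structural criteria listed above can be phrased as a simple screen over an annotated gene model. The following is a minimal sketch under stated assumptions, not the authors' annotation pipeline: the `GeneModel` class is hypothetical, "near the end of the gene" is taken to mean the last or penultimate exon, and a 15% serine threshold is an arbitrary stand-in for "large numbers of serine-rich repeats".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GeneModel:
    """Hypothetical minimal gene model: coding length of each exon, 5' to 3'."""
    name: str
    cds_exon_lengths: List[int]
    protein: str


def looks_like_sericin(
    gene: GeneModel,
    small_exon_bp: int = 50,            # "small exons (<50 bp)" from the text
    large_exon_fraction: float = 0.70,  # "more than 70% of the coding sequence"
    min_serine_fraction: float = 0.15,  # hypothetical stand-in for "serine-rich"
) -> bool:
    exons = gene.cds_exon_lengths
    if len(exons) < 3 or not gene.protein:
        return False  # need two small 5' exons plus one large exon
    # 1) At least two small exons at the start (the signal-peptide exons).
    if not all(length < small_exon_bp for length in exons[:2]):
        return False
    # 2) One exon carrying >70% of the CDS, located near the 3' end
    #    (assumption: the last or penultimate exon counts as "near the end").
    largest = max(exons)
    if largest / sum(exons) <= large_exon_fraction:
        return False
    if exons.index(largest) < len(exons) - 2:
        return False
    # 3) Serine-rich protein, a crude proxy for serine-rich repeats.
    return gene.protein.count("S") / len(gene.protein) >= min_serine_fraction


# Toy gene model with hypothetical exon lengths and protein.
print(looks_like_sericin(GeneModel("toy", [40, 30, 1500, 100], "S" * 30 + "A" * 70)))
```

A screen like this only flags candidates; in the text, candidates are additionally supported by sequence similarity, motifs, genomic proximity, and transcript evidence.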
Comparison of sericin gene locations among four moth species. (a and c) BUSCO-derived ChromSyn synteny plots for B. mori chromosomes 8 (a) and 11 (c). Light blue and light red lines connecting chromosomes represent synteny blocks of BUSCO genes, with blue indicating the same strand orientation and red indicating inversions. Contig names were retained from the respective NCBI assemblies; an “R” denotes contigs reversed in orientation. Filled black circles indicate predicted telomeres. Filled diamonds mark sericin gene locations, color-coded based on their genomic location. Names of A. luna sericins are listed in (b) and (d), while sericin names for other species are listed on the plot. (b and d) Schematic diagram of A. luna sericin genes on ptg000025l (b) or ptg000038l and ptg000081l (d), retaining the original chromosome coordinates. Wide black and blue boxes represent coding sequences; introns are shown as narrow light gray boxes. Alternative exons (if present) are shown as wide dark gray boxes and untranslated terminal regions are marked in red. Primary and secondary repeat regions are shown in dark blue and light blue, respectively. Arrows in introns depict the direction of transcription. Chr, chromosome; srp, serine-rich protein.
Comparison of Genomic Locations of Sericin Genes in Bombycoidea
To assess how sericins are related to each other, both within and across species in the superfamily Bombycoidea (which contains Bombycidae and Saturniidae), we first evaluated whether sericin gene locations are conserved across species. A genome-wide synteny map of four genomes (Fig. S1), including A. luna, two saturniid model species (A. yamamai and S. ricini), and the domestic silkworm (B. mori), was generated using BUSCO genes, a set of highly conserved, single-copy genes (Edwards et al. 2022). We then mapped the A. luna sericins identified in the current study, and previously identified sericins for the remaining three species (Takasu et al. 2007; Dong et al. 2015, 2019; Tsubota et al. 2016; Žurovec et al. 2016; Guo et al. 2022) (Table S1), to their respective genomes. Actias luna sericin genes were located on three different contigs (Table 1 and Fig. 1a and c), but two of these (ptg000038l and ptg000081l) share synteny blocks of conserved genes with A. yamamai chromosome (Chr) 29, S. ricini Chr 11, and B. mori Chr 11 (Figs. 1c and S1). Although chromosome rearrangements are frequent among the four species included in our synteny plot (A. luna, A. yamamai, B. mori, and S. ricini; Fig. S1), chromosome fission is rare in Lepidoptera (Wright et al. 2024). We thus hypothesize that A. luna contigs ptg000081l and ptg000038l derive from the same chromosome, which would then contain seven of the eight A. luna sericin genes.
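The argument that ptg000038l and ptg000081l derive from one chromosome rests on both contigs sharing BUSCO synteny blocks with the same chromosome in the other species. The sketch below illustrates that reasoning with a simple majority vote over shared single-copy BUSCO genes. It is a simplification that ignores gene order, which synteny-block methods take into account, and the `min_shared` threshold, BUSCO IDs, and contig names are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def assign_contigs(
    query_busco: Dict[str, str],  # BUSCO ID -> contig in the query assembly
    ref_busco: Dict[str, str],    # BUSCO ID -> chromosome in the reference
    min_shared: int = 3,          # hypothetical minimum support
) -> Dict[str, str]:
    """Assign each query contig to the reference chromosome sharing most BUSCOs."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    for busco_id, contig in query_busco.items():
        chrom = ref_busco.get(busco_id)
        if chrom is not None:  # only genes found in both assemblies vote
            votes[contig][chrom] += 1
    assignment = {}
    for contig, tally in votes.items():
        chrom, n = tally.most_common(1)[0]
        if n >= min_shared:
            assignment[contig] = chrom
    return assignment


def contigs_per_chromosome(assignment: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert the assignment: which query contigs map to each reference chromosome."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for contig, chrom in sorted(assignment.items()):
        grouped[chrom].append(contig)
    return dict(grouped)


# Toy data (hypothetical): ctgA and ctgB both share BUSCOs with Chr11.
query = {f"b{i}": "ctgA" for i in range(1, 5)}
query.update({f"b{i}": "ctgB" for i in range(5, 9)})
query.update({f"b{i}": "ctgC" for i in range(9, 12)})
ref = {f"b{i}": "Chr11" for i in range(1, 9)}
ref.update({f"b{i}": "Chr8" for i in range(9, 12)})
print(contigs_per_chromosome(assign_contigs(query, ref)))
```

With the toy data, two query contigs are assigned to the same reference chromosome, the pattern the text reports for the two A. luna contigs; treating them as one chromosome additionally relies on chromosome fission being rare in Lepidoptera.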
Ptg000025l contains a single sericin gene, which we named sericin A (serA). SerA consists of seven short exons followed by one large exon that contains two different repeat sequences (Fig. 1b and Table 1). No sericins had previously been characterized on the chromosome orthologous to B. mori Chr 8 in A. yamamai, S. ricini, or B. mori, and a BLAST search (Altschul et al. 1990; Camacho et al. 2009) in the current study did not reveal any (Fig. 1a). Sericin 1 (ser1), located on ptg000038l, is the largest A. luna sericin and contains eight exons, two of which are alternative exons that are not always present (Table 1 and Fig. 1d). Its longest exon contains two different sets of repeats (Table 1 and Fig. 1d). Its genomic location is highly conserved in all four species included in the synteny plot (Fig. 1c), and possibly across the superfamily Bombycoidea.
The other six sericins (serB-G) are found in a 1.5 Mb region on ptg000081l, in two distinct gene clusters (Fig. 1). The first gene cluster, spanning almost 105 kb, contains serB-D (Fig. 1). These three sericins vary considerably in the number of exons (4 to 19), repeat number (21 to 89), and total length (241 to 1439 amino acids) (Table 1 and Fig. 1d). The same genomic region contained previously identified sericins in S. ricini (Srp1, Unigene1738) and B. mori (ser5), but lacked known sericins in A. yamamai (Fig. 1c). The second gene cluster, consisting of serE-G, covers an even smaller genomic region of under 55 kb. SerE-G all have three exons and a 38-amino acid repeat in their longest and final exon (Table 1 and Fig. 1d). The same genomic region contained previously identified sericins for the two other saturniid species included, A. yamamai (Src2-5) and S. ricini (srp4,5), but not for B. mori (Fig. 1c). The remaining B. mori sericins, ser2-4, were located in a more distant region on the same chromosome, which lacked sericins in the three saturniid species (Fig. 1c).
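The two gene clusters described above are defined physically, by genes lying close together on one contig. A minimal way to express such a grouping is single-linkage clustering along the contig with a maximum intergenic gap. The 50 kb gap and the coordinates in the example are hypothetical, and physical proximity alone does not establish tandem duplication, which also requires sequence similarity.

```python
from typing import List, Tuple

Gene = Tuple[str, int, int]  # (name, start, end) on one contig, in bp


def physical_clusters(genes: List[Gene], max_gap: int) -> List[List[str]]:
    """Group genes whose distance to the preceding cluster is at most max_gap bp."""
    clusters: List[List[str]] = []
    current: List[str] = []
    current_end = 0
    for name, start, end in sorted(genes, key=lambda g: g[1]):
        if current and start - current_end <= max_gap:
            current.append(name)
            current_end = max(current_end, end)  # robust to nested/overlapping genes
        else:
            if current:
                clusters.append(current)
            current, current_end = [name], end
    if current:
        clusters.append(current)
    return clusters


def cluster_span(genes: List[Gene], members: List[str]) -> int:
    """Genomic span (bp) covered by the named cluster members."""
    coords = [(s, e) for n, s, e in genes if n in members]
    return max(e for _, e in coords) - min(s for s, _ in coords)


# Hypothetical coordinates, not the real A. luna gene positions.
genes = [("g1", 0, 20_000), ("g2", 40_000, 60_000), ("g3", 80_000, 100_000),
         ("g4", 900_000, 915_000), ("g5", 920_000, 935_000), ("g6", 940_000, 950_000)]
for members in physical_clusters(genes, max_gap=50_000):
    print(members, cluster_span(genes, members), "bp")
```

The result is sensitive to `max_gap`, so any such threshold would need to be justified against gene density in the region.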
Grouping of Sericins based on Protein Similarities
To further characterize relationships between saturniid sericins, a phylogenetic network was generated based on pairwise protein identities, including all translated A. luna sericins identified in the current study and the translated sericin sequences of previously identified sericins from Bombycoidea described under “Identification of sericins in the Actias luna genome” (Table S1). Based on protein-level sequence similarity and synteny across nine Bombycoidea (A. luna, A. selene, A. assamensis, A. pernyi, A. yamamai, B. mori, H. cecropia, R. newara, and S. ricini), we identify four distinct sericin groups within Saturniidae (Fig. 2).
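A network like the one described here starts from a matrix of pairwise distances. The sketch below assumes the sericins have already been aligned and defines identity over columns where neither sequence has a gap, with distance = 1 - identity; both choices are assumptions, since the text does not specify how identities were computed. The resulting square matrix is the kind of input that distance-based network methods in SplitsTree accept.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def fraction_identity(a: str, b: str) -> float:
    """Identity over alignment columns where neither sequence has a gap (assumed definition)."""
    if len(a) != len(b):
        raise ValueError("sequences must be rows of the same alignment")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return sum(x == y for x, y in columns) / len(columns)


def distance_matrix(aligned: Dict[str, str]) -> Tuple[List[str], Dict[Tuple[str, str], float]]:
    """Symmetric p-distance matrix (1 - identity) for all sequence pairs."""
    names = list(aligned)
    dist = {(n, n): 0.0 for n in names}
    for p, q in combinations(names, 2):
        d = 1.0 - fraction_identity(aligned[p], aligned[q])
        dist[(p, q)] = dist[(q, p)] = d
    return names, dist


def to_phylip(names: List[str], dist: Dict[Tuple[str, str], float]) -> str:
    """Square PHYLIP-style matrix text, one row per sequence."""
    rows = [str(len(names))]
    for p in names:
        rows.append(p + " " + " ".join(f"{dist[(p, q)]:.4f}" for q in names))
    return "\n".join(rows)


# Toy alignment (hypothetical sequences, not real sericins).
toy = {"seqA": "SSTS-DGS", "seqB": "SSTSSDGA", "seqC": "ATTS-EGA"}
names, dist = distance_matrix(toy)
print(to_phylip(names, dist))
```

For highly repetitive, divergent proteins like sericins, the alignment step dominates the outcome, so distances computed this way are only as reliable as the alignment behind them.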
Sericin phylogenetic network. A phylogenetic network based on pairwise distances between sericin protein sequences was generated with SplitsTree. Only splits with a bootstrap value over 10 are shown. Sericins are color-matched with colors used in Fig. 1 where applicable. Sericin groups are indicated by dotted lines. Aas, Antheraea assamensis; Alu, Actias luna; Ase, Actias selene; Ape, Antheraea pernyi; Aya, Antheraea yamamai; Hce, Hyalophora cecropia; Bmo, Bombyx mori; Rne, Rhodinia newara; Sri, Samia ricini; srp, serine-rich protein; ug, unigene.
The first group contained six sericins, each from a different species; the saturniid sequences in this group clustered together with high bootstrap support (Fig. 2). This group included B. mori ser1 and the three sericin proteins encoded by saturniid genes in the same genomic location as B. mori ser1 (Alu_ser1, Aya_src1, and Sri_unigene3618; Fig. 1). Each sericin protein within this cluster contained a conserved CXCX motif at the N-terminus (Fig. S2), characteristic of sericin 1 sequences across Lepidoptera (Wu et al. 2022). We therefore named this group the Sericin 1 Group (Fig. 2). The second group, again with high bootstrap support, contained three B. mori sericins (ser2, ser4, and ser5) and one S. ricini sericin (Fig. 2). Interestingly, the B. mori sequences did not cluster according to their genomic location: ser2 clustered closer to ser5 than to ser4 (Figs. 1 and 2). Because no single B. mori sericin could be unambiguously assigned as the representative of this clade, we refer to this lineage simply as Group 2.
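The Sericin 1 Group is characterized by a conserved CXCX motif (Cys, any residue, Cys, any residue) at the N-terminus. A quick screen for this pattern can be written as a regular expression. The 100-residue window is a hypothetical choice, and a motif hit on its own is weak evidence, which is why the grouping above also relies on sequence similarity and synteny.

```python
import re
from typing import List

# Lookahead so that overlapping motifs (e.g. in "CACAC") are all reported.
CXCX = re.compile(r"(?=C.C.)")


def nterm_cxcx_positions(protein: str, window: int = 100) -> List[int]:
    """0-based start positions of Cys-X-Cys-X motifs lying entirely within the
    first `window` residues (the window size is a hypothetical choice)."""
    region = protein[:window]
    return [m.start() for m in CXCX.finditer(region)]


# Toy N-terminal fragment (hypothetical sequence).
print(nterm_cxcx_positions("MKCACACGG"))
```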
The third group was well represented in almost every saturniid taxon included in our study and had high bootstrap support, but did not contain B. mori sericin sequences (Fig. 2). We therefore refer to this group as Group 3; its members additionally share a conserved 38-amino acid repeat motif (Table 1) (Žurovec et al. 2016). The A. luna sericins in this group (SerE-G) were highly similar at the protein level (Fig. 2 and Table S2), despite considerable variation in the number of repeats (Table 1). The A. yamamai genome also showed an expansion of Group 3, with at least four distinct members (Src2-5) (Žurovec et al. 2016) that were less conserved, as indicated by their dispersed clustering (Fig. 2 and Table S2). Although S. ricini serine-rich proteins (srp) 4 and 5 both shared their genomic location with other Group 3 sericins (Fig. 1), only srp5 exhibited the 38-amino acid motif, while srp4 was only distantly related to other Group 3 sequences (Fig. 2). Interestingly, S. ricini srp4 clustered closest to B. mori ser3 (Fig. 2), even though the gene coding for B. mori ser3 lies in a different genomic region from S. ricini srp4 (Fig. 1). The evolutionary relationship between S. ricini srp4, B. mori ser3, and saturniid Group 3 sericins remains unresolved.
The remaining sericins were only loosely grouped, and their splits had low bootstrap support (Fig. 2), even though most were found in the same genomic region as B. mori ser5 and S. ricini unigene1738 (Fig. 1). Based on their shared genomic location, we classified A. luna serB-D into Group 4. We also included A. luna serA and A. selene unigene3219 and unigene3639 in Group 4 because of their protein-level similarity to A. luna serB (Fig. 2 and Table S2). Interestingly, serD was more closely related to serA than to serB or serC, even though it shared its genomic location with the latter two (Figs. 1 and 2). Despite being located on different contigs, orthologs of serA and serD are also present in A. selene, but they have not been reported from any species outside the genus Actias. Moreover, a targeted search of the regions corresponding to the A. luna serA locus revealed no sericin-like genes in S. ricini, A. yamamai, or B. mori.
Differential Expression of A. luna Sericins Across Life Stages
Previously generated RNA-sequencing data (Markee et al. 2024) were used to examine whether sericin gene expression varies between life stages. We compared first, fourth, and last (fifth) instar caterpillars. Although the data are not silk gland-specific, sericin gene expression outside the silk glands is very low (Dong et al. 2019; Wu et al. 2024) or undetectable (Žurovec et al. 2016; Guo et al. 2022; Wu et al. 2022). We observed two distinct expression patterns among A. luna sericin genes. Group 4 sericins (Fig. 2; serA-D) exhibited high expression levels in at least one of the two earlier life stages, and their expression in the last instar (L5) dropped to close to zero. In contrast, ser1 and the Group 3 sericins (Fig. 2; serE-G) had low expression levels across life stages and showed a trend toward higher expression in the last instar (L5). Specifically, each Group 4 sericin exhibited significantly higher expression in the first instar (L1) than in the last instar (L5) (Fig. 3). For serB and serD, expression declined markedly from the first to the fourth instar, whereas serA peaked in the fourth instar and serC showed a similar upward trend (Fig. 3). Expression of serF was negligible in every sample, but ser1, serE, and serG reached their highest expression levels in one L5 caterpillar that also showed high FibH expression (Fig. 3). Of these, serE was the only gene with significantly higher expression in L5 than in L1, although this result appeared to be driven largely by a single individual.
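Comparing raw read counts across samples requires normalizing for sequencing depth. The sketch below shows counts per million (CPM) and a log2 fold change between stages for one gene. It is illustrative only: the source does not state how counts were normalized or which test produced the significance levels in Fig. 3, and a formal analysis would typically use a count-based model such as those in DESeq2 or edgeR. The toy libraries are hypothetical and list only two genes, whereas real CPM totals include every gene in a library. Note also that L1 samples are whole-body and L4 and L5 samples are abdomens, so tissue composition differs between stages.

```python
import math
from statistics import mean
from typing import Dict, List

Library = Dict[str, int]  # gene -> raw read count for one RNA-seq sample


def cpm(library: Library) -> Dict[str, float]:
    """Counts per million, normalizing for sequencing depth."""
    total = sum(library.values())
    return {gene: count * 1e6 / total for gene, count in library.items()}


def log2_fold_change(group_a: List[Library], group_b: List[Library], gene: str,
                     pseudocount: float = 1.0) -> float:
    """log2 of (mean CPM in group_b + pseudocount) / (mean CPM in group_a + pseudocount)."""
    mean_a = mean(cpm(lib)[gene] for lib in group_a)
    mean_b = mean(cpm(lib)[gene] for lib in group_b)
    return math.log2((mean_b + pseudocount) / (mean_a + pseudocount))


# Hypothetical toy libraries for two stages (not the A. luna data).
stage_l1 = [{"serX": 10, "other": 990}]
stage_l5 = [{"serX": 500, "other": 500}]
print(round(log2_fold_change(stage_l1, stage_l5, "serX"), 2))
```

The pseudocount keeps the ratio finite for genes with zero counts, at the cost of shrinking fold changes for lowly expressed genes.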
Sericin and FibH gene expression during Luna moth development. RNA-sequencing read counts were obtained from either the whole body (first instar, L1) or the abdomen (fourth instar, L4; fifth instar, L5). Expression levels of serF were negligible and are not shown. Sericin names are color-matched with Fig. 1, and group names are adapted from Fig. 2. The FibH gene was previously identified by Markee et al. (2024). Significance levels are denoted by *, **, and *** (P < 0.05, P < 0.01, and P < 0.001, respectively).
Protein Composition and Repeat Motifs Across A. luna Sericins
To explore potential functional differences among sericins, we compared the amino acid composition and repeat sequences of the eight predicted A. luna sericin proteins. Most repeats were rich in serine and threonine (Fig. 4), which comprised 22% to 31.2% and 14.9% to 34.6% of total residues, respectively. In serA, by contrast, serine accounted for 59% of total residues owing to extended serine stretches, but a corresponding decrease in threonine kept its combined serine and threonine content comparable to that of the other A. luna sericins. Based on a combination of amino acid proportions and expression levels across life stages, the A. luna sericins fall into two putative functional classes. Group 4 sericins (serA-D; Fig. 2), which exhibit high expression levels at earlier instars (Fig. 3), contain high levels of acidic residues (aspartic acid: 5.8% to 22%; glutamic acid: 2.1% to 5.6%) but low levels of basic amino acids (combined: 1.6% to 3.8%). Their repeats contain long stretches of serine and threonine residues, interspersed with aspartic acid, glutamic acid, alanine, and, to a lesser extent, valine residues (Fig. 4a). SerB additionally exhibited elevated levels of proline. In contrast, the remaining sericins (ser1 and the three Group 3 sericins, serE-G) reach their highest expression levels in the final instars and contain high levels of basic residues (>6.5%), glycine (13.2% to 20.5%), and uncharged polar residues (9.0% to 17.3%), but low levels of acidic residues (5.9% to 7.4%). Actias luna ser1 is defined by two sets of repeats, both of which are proline-rich and contain a recurring motif of a glycine followed by five serine or threonine residues. The first repeat contains two stretches rich in histidine, glycine, and proline, while the second repeat is rich in tyrosine, aspartic acid, and arginine (Fig. 4a).
Finally, the three Group 3 sericins (serE-G) contain high levels of tyrosine, particularly within a recurring S/TYTS motif (Fig. 4a). The differences in basic and acidic residues among A. luna sericins are also reflected in their pI, which is the lowest for Group 4 sericins (Table 1).
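The composition figures quoted above are percentages of residues in predefined classes. The sketch below computes such percentages and counts the S/TYTS motif. Grouping K, R, and H as basic is an assumed convention, reading "S/TYTS" as the regular expression `[ST]YTS` is an interpretation of the notation, and the toy sequence is hypothetical.

```python
import re
from typing import Dict

# Residue classes; treating H as basic follows a common convention (assumption).
CLASSES = {"serine": "S", "threonine": "T", "glycine": "G",
           "tyrosine": "Y", "acidic": "DE", "basic": "KRH"}


def class_percentages(protein: str) -> Dict[str, float]:
    """Percentage of residues falling into each class."""
    n = len(protein)
    return {label: 100 * sum(protein.count(aa) for aa in members) / n
            for label, members in CLASSES.items()}


def count_st_yts(protein: str) -> int:
    """Count S/TYTS motifs, read here as [ST]YTS (an assumed interpretation);
    the lookahead also counts overlapping occurrences."""
    return sum(1 for _ in re.finditer(r"(?=[ST]YTS)", protein))


# Hypothetical toy repeat, not a real A. luna sericin sequence.
toy = "GSSYTSAYTSGTSYTSKR"
print(class_percentages(toy), count_st_yts(toy))
```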
Protein composition of A. luna sericins. a) Repeat sequence motifs for each A. luna sericin, separated per repeat where applicable. Amino acids are colored based on their polarity (blue: basic, red: acidic, green: polar, yellow: nonpolar). Sericins are ordered based on when they reached the highest expression level (Fig. 3), their names were color-matched with Fig. 1, and groupings based on Fig. 2 were marked in the legend. The number of repeats is listed below the name and repeat number. b) Amino acid composition of A. luna sericins. Only abundant amino acids or amino acids that strongly varied between sericins are shown. Sericins were ordered and colored as in (a).
Discussion
In this study, we investigated the evolutionary history and molecular diversification of sericin genes in the Luna moth, A. luna, with the goal of clarifying how gene duplication and repeat expansion contribute broadly to functional variation in lepidopteran silk. By integrating comparative genomics and expression analyses, we identified a diverse repertoire of sericin paralogs in A. luna that differ markedly in sequence architecture, repeat composition, and developmental regulation. In the discussion that follows, we focus on three central findings: the unexpectedly large repertoire of at least eight putative sericin genes in the A. luna genome, the role of gene duplication in giving rise to four distinct sericin groups, and evidence for functional specialization among A. luna sericins as reflected in their sequence composition and life stage-specific expression patterns.
The A. luna Genome Contains at Least Eight Putative Sericins
Our study identifies A. luna as one of the species with the highest number of characterized sericins to date, surpassed only by Galleria mellonella (Pyralidae), for which at least 12 sericin-like proteins have been identified (Wu et al. 2022). Additionally, two of the A. luna sericins (ser1 and serB) code for multiple isoforms. Sericin isoforms have similarly been reported for B. mori ser1 and ser2; although these isoforms may differ in temporal and spatial expression, it is unclear whether they have different functions (Couble et al. 1983; Michaille et al. 1990; Garel et al. 1997). Each of the eight sericins identified here shares the structural characteristics that appear to define this gene family (Takasu et al. 2007; Kludkiewicz et al. 2009; Žurovec et al. 2016; Dong et al. 2019; Guo et al. 2022; Wu et al. 2024): two short initial exons that together encode the signal peptide and one long exon that contains a large number of serine-rich repeats (Fig. 1). Additionally, each sericin is either expressed in our long-read, silk gland-specific transcriptome (ser1, serA-D; Table 1) or orthologous to genes that other studies have found to encode major components of cocoon silk: serE-G are orthologs of A. yamamai Src2-5 and H. cecropia src2 (Figs. 1 and 2) (Žurovec et al. 2016; Rouhová et al. 2024). The lack of serE-G transcripts in our ISOseq-based silk gland transcriptome is likely due to sampling constraints (eg absence of prepupal caterpillars) and conservative read filtering (Table S3), rather than genuine absence. The much lower number of reported sericins in some of the seven other saturniid moth species is likely due to incomplete gene characterization rather than true absence of these genes. For instance, most studies identifying sericins focus primarily on cocoon silk and/or late larval instars, potentially missing sericins expressed during earlier larval instars (Dong et al. 2015; Tsubota et al. 2016; Žurovec et al. 2016; Rouhová et al. 2024).
Gene Duplications Led to the Origin of Four Different Sericin Groups
Due to their high repeat content (Fig. 1), the resulting fast evolutionary rates (Cheng and Chen 1999; Babcock et al. 2003; Delihas 2011; King 2024), and their highly biased amino acid composition (Fig. 4), evolutionary relationships among sericins remain difficult to assess (Tsubota et al. 2021; Wu et al. 2022; Kmet et al. 2023). Based on genomic locations and pairwise sequence similarities at the protein level, we categorized the saturniid sericins into four groups (Figs. 1 and 2). Although genomic locations (Fig. 1) and protein similarities (Fig. 2) are generally congruent among the sericins included in our study, there are a few discrepancies. For instance, A. luna serB-D of Group 4 are found in one chromosomal gene cluster (Fig. 1c), but the relationships among their protein sequences are poorly resolved, as indicated by low bootstrap values (Fig. 2). Additionally, these genes are found in a similar genomic region as B. mori ser5 and S. ricini unigene1738, which are both placed in Group 2 based on protein-level similarities (Fig. 2). Similarly, the placement of B. mori ser3 remains unclear based on the protein alignment (Fig. 2), although this gene is located in close proximity to B. mori ser2 and ser4 (Fig. 1c). Increased sampling of sericins across species is required to better understand the relationships among the four sericin groups described in the current study. Of these four groups, the Sericin1 Group is the only one shared with high bootstrap support between A. luna and B. mori, while the others show no or minimal overlap between Saturniidae and Bombycidae (Figs. 1 and 2). As silk production in B. mori has been the subject of an exceptionally large number of gene expression, proteomic, and transcriptomic studies (eg Dong et al. 2013; Zhang et al. 2015; Peng et al. 2019; Guo et al. 2023; Masuoka et al. 2024), we consider it unlikely that the B. mori genome contains additional unidentified, functional sericins (but see Wu et al. 2024 for a recently identified sericin-like gene). Similarly, we were unable to identify any A. luna sericins closely related to B. mori ser2-5 or S. ricini unigene1738. Our data thus suggest that the sericins in B. mori and A. luna followed distinct evolutionary trajectories, with an expansion of Group 2 in the B. mori lineage and expansions of Groups 3 and 4 in Saturniidae.
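Pairwise protein similarity of the kind used here to group sericins is often summarized as percent identity over an existing alignment. The sketch below is a minimal illustration under stated assumptions, not necessarily the metric behind Fig. 2 or Table S2: it takes two already-aligned sequences with gaps written as "-" and ignores columns in which either sequence is gapped. The function name `percent_identity` and the example fragments are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences.

    Columns in which either sequence has a gap are skipped; this is one of
    several common conventions and is an assumption of this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped column: not counted
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Made-up aligned fragments, not real sericin sequences
print(round(percent_identity("SSGS-TSR", "SSGSKTAR"), 1))  # 85.7
```

Other conventions (for example, dividing by the full alignment length) give lower values when gaps are common, so the convention should be reported alongside any identity matrix.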
Sericin Groups 3 and 4 are represented in A. luna by three and four sericins, respectively. While serA of Group 4 is isolated on ptg000025l (the A. luna ortholog of B. mori Chr8), serB-D of Group 4 and serE-G of Group 3 are located in two gene clusters in relative genomic proximity on ptg000081l, which is orthologous to B. mori Chr11 (Fig. 1). The close genomic proximity and high sequence similarity within each cluster suggest that both clusters arose through tandem duplications (Figs. 1 and 2). In particular, serE-G are nearly identical in overall gene structure (Fig. 1b), sequence similarity (Fig. 2 and Table S2), gene expression (or rather its absence; Fig. 3), repeat motifs (Fig. 4a), and amino acid content (Fig. 4b); they clearly differ only in repeat number (Table 1 and Fig. 1b) and intron sequence (Fig. S3). Near-identical gene copies are often thought to represent very recent duplications and are expected to either diverge or disappear (Lynch and Conery 2000). They often experience high levels of positive selection and undergo rapid functional divergence, either at the sequence level (Lynch and Conery 2000; Qiao et al. 2019) or at the expression level (Huerta-Cepas et al. 2011; Brasó-Vives et al. 2022; Cai and Des Marais 2024). In rare instances, sequence similarity is retained when increased gene dosage is beneficial to the organism (Perry et al. 2007; Sackton et al. 2007; Hahn 2009) or through hypofunctionalization (Birchler and Yang 2022; Brasó-Vives et al. 2022), a process in which the expression of each copy decreases so that combined expression remains at the preduplication level. Alternatively, gene conversion, a process in which homologous recombination results in the unidirectional transfer of genetic material from a donor gene to an acceptor gene, can also produce near-identical gene copies (Chen et al. 2007).
Our Group 3 sericins exhibit a high level of divergence in their introns and, to a lesser extent, in the nonrepetitive C-terminus when serE is compared with serF and serG (Fig. S3). This suggests that their repeat sequences are kept similar through selective pressure or gene conversion (Fig. 4a). The Group 3 sericins in other Saturniidae are similarly organized in a single gene cluster (eg A. yamamai Srn2-5, S. ricini srp4,5; Fig. 1c), but they have diverged more than in A. luna (Fig. 2). Although the lineage-specific expansions of Group 3 sericins could contribute to increased dosage, the higher diversity of Group 3 sericins in other saturniid lineages leads us to hypothesize that the high similarity among A. luna serE-G reflects their recent origin (serF, serG), gene conversion (serE), or a combination of both, and that they will eventually diverge functionally.
The other A. luna sericin gene cluster (Group 4 sericins serB-D) is more divergent at the protein level, suggesting that it results from an older gene duplication. Surprisingly, A. luna serC appears more closely related to A. luna serA, located on a different chromosome, than to the two sericins in its own gene cluster (serB, serD) (Figs. 1 and 2). SerA might therefore represent a relatively recent duplication of serC onto another chromosome. Orthologs of both serA and serC are also present in A. selene but appear to be absent in other Saturniidae, suggesting that such a duplication occurred in a common ancestor of A. luna and A. selene (Fig. 2). Actias luna serA is only the second confirmed case of a sericin found on its own on a chromosome, whereas most sericins are found on the chromosome orthologous to B. mori Chr11 (Žurovec et al. 2016; Guo et al. 2022; Wu et al. 2022). The only other sericin located outside this chromosome is sericin P150 (Kludkiewicz et al. 2019; Wu et al. 2024), which is unrelated to serA: it differs in chromosomal location (Fig. S1) and carries a conserved CXCXCX motif in its N-terminus that is absent in A. luna serA (Kludkiewicz et al. 2019; Wu et al. 2024).
The evolution of sericins might be driven by two main factors. First, the high repeat content of sericins and their close genomic proximity (Žurovec et al. 2016; Dong et al. 2019; Wu et al. 2022) allow for increased gene duplication rates, generating redundant gene copies that are free to diverge. Notably, recent expansions of tandemly duplicated genes have repeatedly been shown to play an important role during species differentiation and adaptation (Brown et al. 1998; Newcomb et al. 2005; Perry et al. 2007; Jugulam et al. 2014; Clifton et al. 2017; Delihas 2020). Second, evolutionary rates can be further accelerated by the high intragenic repeat content of sericins. Intragenic, repeat-mediated duplications can alter gene structure and sequence (Cheng and Chen 1999; Babcock et al. 2003; Delihas 2011; King 2024). Additionally, intragenic nonreciprocal recombination (gene conversion) or unequal crossing over among intragenic repeats can rapidly homogenize the repeat region and quickly spread new mutations across a whole gene (Hayashi and Lewis 2000; Garb et al. 2007). The formation of gene clusters and the resulting lineage-specific expansions and deletions of sericins appear to be a hallmark of sericin evolution, as this phenomenon has also been reported in the lepidopteran family Pyralidae (Žurovec et al. 2016).
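The two intragenic mechanisms named above can be illustrated with a toy model in which a gene's repeat region is represented as a list of repeat units. This is a conceptual sketch only; the representation, the made-up repeat strings, and the function names `unequal_crossover` and `gene_conversion` are hypothetical. Misaligned pairing followed by reciprocal exchange yields one longer and one shorter array, whereas gene conversion copies a tract from a donor onto an acceptor and leaves the donor unchanged.

```python
from typing import List, Tuple

Repeats = List[str]  # one string per repeat unit (toy representation)


def unequal_crossover(a: Repeats, b: Repeats, i: int, j: int) -> Tuple[Repeats, Repeats]:
    """Reciprocal exchange after unit i of array a pairs with unit j of array b.

    When i != j the pairing is misaligned, so one product gains repeat units
    and the other loses the same number.
    """
    product1 = a[: i + 1] + b[j + 1 :]
    product2 = b[: j + 1] + a[i + 1 :]
    return product1, product2


def gene_conversion(donor: Repeats, acceptor: Repeats, start: int, end: int) -> Repeats:
    """Nonreciprocal transfer: donor units start..end-1 overwrite the acceptor's
    corresponding units. Assumes both arrays are aligned unit by unit; the
    donor and the input acceptor list are not modified."""
    converted = list(acceptor)
    converted[start:end] = donor[start:end]
    return converted


array = ["GSSR", "GSSR", "GTSR", "GSSR"]  # made-up repeat units
longer, shorter = unequal_crossover(array, array, 2, 0)
print(len(longer), len(shorter))  # 6 2

variant = ["GSSR", "GSAR", "GSAR", "GSSR"]
print(gene_conversion(array, variant, 1, 3))  # ['GSSR', 'GSSR', 'GTSR', 'GSSR']
```

Repeated over many generations, events like these are one way repeat number can change quickly and a new variant can spread along a repeat array.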
Luna Moth Sericins Exhibit Functional Specialization
Gene duplications can facilitate neofunctionalization or subfunctionalization of redundant gene copies, as duplicated genes are free to accumulate mutations without disrupting the ancestral gene function (Birchler and Yang 2022; Kuzmin et al. 2022). Although ancestral sericin expression patterns are unknown, the essential role of sericins (Takasu et al. 2017) suggests that the ancestral sericin was likely expressed broadly across development. The distinct expression patterns we observed among A. luna sericins support the hypothesis that they have undergone subfunctionalization, with different paralogs performing similar functions at different life stages.
The four A. luna Group 4 sericins (serA-D; Fig. 2) are expressed in the first and/or fourth instar but show low to undetectable levels in the final instar (Fig. 3); we therefore refer to them as larval sericins (Fig. 5). The remaining A. luna sericins (ser1 and the Group 3 sericins serE, serF, and serG; Fig. 2) show low overall expression in our study, but appear to reach their highest levels in a last-instar individual that also showed increased FibH expression (Fig. 3). Although larval behavioral data were not recorded, these results suggest that this caterpillar was likely actively spinning silk for its cocoon at the time of dissection, supporting the hypothesis that A. luna ser1 and the A. luna Group 3 sericins (serE-G) function as cocoon-associated sericins (Fig. 5). This interpretation is consistent with evidence from related species, as the orthologous proteins in A. yamamai (Srn1-5), H. cecropia (src1, src2), and B. mori (ser1) (Figs. 1 and 2) are all known to be major components of cocoon silk (Takasu et al. 2002; Žurovec et al. 2016; Peng et al. 2019; Rouhová et al. 2024; Wu et al. 2024). Notably, we observed pronounced differences in amino acid composition between the putative cocoon sericins in A. luna (ser1 and Group 3 sericins serE-G) and the larval sericins (Group 4; serA-D), with larval sericins exhibiting higher proportions of charged residues and lower tyrosine content. Further support for life stage-specific differentiation (Fig. 5) comes from compositional analyses of A. luna silk fibers, which show marked differences in the sericin-rich outer coating between larval and cocoon silks, while the inner fibroin core remains largely conserved across life stages (Eccles et al. 2025).
Fig. 5. Proposed hypothesis for A. luna silk sericin composition across different life stages. This scenario is based on our current findings and previously published work (see “Luna Moth Sericins Exhibit Functional Specialization”).
These observations are reminiscent of the well-studied B. mori, in which differential expression of sericins was shown to drive functional divergence between larval and cocoon silks, each adapted to the caterpillar's ecological needs (Kludkiewicz et al. 2009; Takasu et al. 2010; Guo et al. 2022, 2025). As in A. luna, B. mori larval sericins (ser2, ser4, ser5, and serP150) contain higher levels of charged amino acids (>30% of total amino acids) than cocoon sericins (ser1 and ser3; <20%) (Takasu et al. 2002, 2007, 2010; Kludkiewicz et al. 2009; Dong et al. 2013, 2019; Zhang et al. 2015; Peng et al. 2019; Guo et al. 2022, 2023; Wu et al. 2024), and the cocoon sericin B. mori ser1 has elevated tyrosine levels similar to those of A. luna cocoon sericins. Additionally, B. mori larval sericins have lower serine levels (<16%) than cocoon sericins (>34%) (Takasu et al. 2010; Dong et al. 2019; Guo et al. 2022), a trend not observed in A. luna. The highly charged, low-serine larval sericins give B. mori silk strong adhesion to the host plant and other substrates (Kludkiewicz et al. 2009; Takasu et al. 2010; Dong et al. 2019; Guo et al. 2022), as well as high rigidity and strength owing to an increased β-sheet proportion in early life stages (Peng et al. 2019; Guo et al. 2022). The function of tyrosine residues in cocoon sericins is currently unknown, but the strong hydrogen bonds formed by tyrosine residues in B. mori silk fibroin aid in self-assembly (Partlow et al. 2016) and might similarly assist sericin assembly or interactions with other proteins.
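Composition contrasts of this kind (charged residues, serine, tyrosine, proline) can be computed directly from a protein sequence. The sketch below assumes that "charged" means Asp, Glu, Lys, and Arg, with histidine excluded; the studies cited above may define it differently, and the function names and example fragment are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumption: "charged" = Asp (D), Glu (E), Lys (K), Arg (R); His is excluded.
CHARGED = frozenset("DEKR")


def percent_of(sequence: str, residues: Iterable[str]) -> float:
    """Percentage of positions in a protein sequence occupied by `residues`."""
    seq = sequence.upper().replace("*", "")  # ignore a stop symbol, if present
    if not seq:
        return 0.0
    counts = Counter(seq)
    return 100.0 * sum(counts[r] for r in set(residues)) / len(seq)


def composition_summary(sequence: str) -> Dict[str, float]:
    """Charged, serine, tyrosine and proline content as % of residues."""
    return {
        "charged": percent_of(sequence, CHARGED),
        "serine": percent_of(sequence, "S"),
        "tyrosine": percent_of(sequence, "Y"),
        "proline": percent_of(sequence, "P"),
    }


# Made-up 10-residue fragment, not a real sericin
print(composition_summary("SSDKSYGSER"))
# {'charged': 40.0, 'serine': 40.0, 'tyrosine': 10.0, 'proline': 0.0}
```

Whether signal peptides and nonrepetitive termini are included changes these percentages, so comparisons across studies are most meaningful when the same sequence region is used.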
This suggests that A. luna sericins could similarly determine the properties of the silk fiber's outer coating, even though serine levels in A. luna sericins are consistently high (>20%), with no larval versus cocoon difference of the kind seen in B. mori, and the differences in charged residues are less extreme. We hypothesize that the compositional differences between the two species reflect differences in their respective ecology, for instance in cocoon structure (Chen et al. 2012b). The high serine content of A. luna sericins, also observed across Saturniidae (Žurovec et al. 2016; Rouhová et al. 2024), contrasts with other lepidopteran species, in which serine content can drop as low as 12% (Takasu et al. 2010; Dong et al. 2019; Guo et al. 2022; Wu et al. 2022). Additionally, A. luna ser1 and, to a lesser extent, serB stand out among A. luna sericins for their elevated proline content. In spidroins, the major silk proteins of spiders, high proline levels are linked to increased silk elasticity (Savage and Gosline 2008), so the high proline content of A. luna ser1 might similarly increase silk elasticity. Life stage-specific sericin expression has been observed in a variety of lepidopteran taxa (Takasu et al. 2002; Kludkiewicz et al. 2009; Žurovec et al. 2013; Peng et al. 2019; Masuoka et al. 2024; Wu et al. 2024), and similar compositional trends might thus be found in sericins across Lepidoptera.
Sericin genes are regulated spatially as well as by life stage. Of the two main cocoon sericins in B. mori, ser1 is expressed exclusively in the posterior end of the middle silk gland and the encoded protein is deposited as an inner sericin layer of B. mori cocoon silk, while ser3 is expressed throughout the middle silk gland and its encoded protein forms an outer sericin layer (Takasu et al. 2010; Dong et al. 2013; Guo et al. 2023). Ser1 appears to be essential for successful silk assembly (Takasu et al. 2017), and its sequence and expression pattern are conserved across Lepidoptera, including Saturniidae (Takasu et al. 2007; Kludkiewicz et al. 2019; Rouhová et al. 2021; Guo et al. 2025). Indeed, ser1 orthologs in two saturniid taxa, A. yamamai Srn1 and H. cecropia Src1 (Figs. 1, 2, and S2), are expressed only in the posterior part of the middle silk gland (Žurovec et al. 2016; Rouhová et al. 2024). Even though the evolutionary relationship between B. mori ser3 and Group 3 sericins is not entirely clear (Figs. 1 and 2), the Group 3 sericins of A. yamamai and H. cecropia (Srn2-5 and Src2, respectively; Figs. 1 and 2) are expressed throughout the middle silk gland of prepupal caterpillars (Žurovec et al. 2016; Rouhová et al. 2024), similar to B. mori ser3 (Takasu et al. 2010; Dong et al. 2013; Guo et al. 2023). Thus, the two-layered structure of the sericin outer coating might be retained in Saturniidae (Fig. 5). This hypothesis is further supported by the elevated glutamine levels found in both A. luna Group 3 sericins and B. mori ser3, which might play a role in increasing sericin adhesion (Dong et al. 2019). While B. mori larval silks exhibit a similar multilayered sericin organization (Dong et al. 2019), silk gland region-specific data for larval sericins are currently lacking for Saturniidae, so it is unclear whether saturniid larval silks have a similar multilayered sericin composition.
The similar patterns of sericin temporal expression, spatial expression, and composition between Saturniidae and B. mori are particularly surprising considering that many of their respective sericins derive from separate gene expansions (Figs. 1 and 2). In B. mori, Group 2 sericins are larval sericins with similar expression patterns and amino acid compositions. In A. luna and other Saturniidae, Group 4 sericins have convergently specialized into larval sericins, even though they are located in a different genomic region. Although B. mori ser3 and saturniid Group 3 sericins share the same expression pattern, their clustering based on protein similarity (Fig. 2) seems to be at odds with their genomic locations (Fig. 1). Further studies are needed to assess whether these genes are orthologs or whether convergent evolution of more distantly related sericins led to similar protein sequences. Sericin 1, the only sericin for which we found no duplications, appears to be conserved across the included lineages and has likely preserved its function. Further investigations outside Bombycidae and Saturniidae are needed to establish which sericin repertoire represents the ancestral state, and whether other lineages of Lepidoptera convergently evolved multiple sericins specialized for particular life stages.
Concluding Remarks
Our study advances our understanding of how highly repetitive silk genes evolve within a lineage and demonstrates that sericin diversification represents a major, yet underappreciated, component of silk evolution in Lepidoptera. By providing a detailed molecular characterization of sericin genes in the Luna moth (A. luna), we reveal a diverse repertoire of eight sericin paralogs that differ markedly in sequence architecture, repeat composition, and developmental expression. These findings highlight that sericin evolution parallels—and potentially complements—the well-studied diversification of fibroin genes, extending current models of silk evolution beyond the fibroin core to include the biologically and mechanically important outer silk coating.
Historically, most genotype–phenotype analyses of lepidopteran silk have focused on the fibroin heavy chain (fibH), the primary structural component of the silk fiber core. In contrast, sericins constitute the bulk of the outer coating and play critical roles in modulating silk viscosity, adhesion, self-assembly, and overall fiber performance. Our results demonstrate that duplications of sericin genes, characterized by extreme repeat content, frequent clustering, and pronounced variation in repeat number and amino acid composition, enable dynamic shifts in silk composition across life stages and among species. Comparative analyses across Saturniidae and Bombycidae further indicate that sericin gene families have undergone extensive lineage-specific expansions and losses, facilitating both subfunctionalization and convergent evolutionary trajectories.
Together, these findings suggest that the rapid evolution of the sericin layer may provide a flexible mechanism for fine-tuning silk properties in response to ecological, developmental, and functional demands. While fibroin proteins themselves exhibit notable evolutionary variability, including allelic divergence within individuals, the repeated duplication and differential regulation of sericin genes may allow the silk coating to evolve even more rapidly. As such, sericins represent a compelling model for studying the evolutionary consequences of gene duplication in highly repetitive protein-coding genes and underscore the importance of considering the full molecular architecture of silk when investigating the evolution of this key adaptive trait.
Methods
Caterpillar Rearing
All A. luna caterpillars descended from wild-caught females collected in the southeastern United States in 2021. Eggs were collected in a paper bag and transferred to a US Department of Agriculture (USDA) containment lab at the University of Florida's McGuire Center for Lepidoptera and Biodiversity. After hatching, caterpillars were kept together in lidded 16-oz clear plastic cups and fed fresh American sweetgum (Liquidambar styraciflua).
Short-Read RNA Sequencing and Read Alignment
Samples for Illumina short-read RNA sequencing were part of a larger study described previously (Table S4) (Markee et al. 2024). Reads from 28 samples, originating from a variety of life stages and body parts, were quality-trimmed with trimmomatic v0.39 (Bolger et al. 2014), removing bases with a quality score under 25 from either end and applying a sliding window of 3 bases with a minimum average quality of 20. Trimmomatic was also used to remove adapter sequences, using the “Truseq3-PE-2.fa” file provided with trimmomatic v0.39. Pairs in which both reads had a final length of over 50 bp were retained for further analysis. Sequencing errors were corrected using Rcorrector v1.0.4 (Song and Florea 2015), and unfixable reads were removed using a publicly available python script (rm_rcorrector_unfixable.sh, https://github.com/harvardinformatics/TranscriptomeAssemblyTools). Filtered reads were aligned to the A. luna genome (GenBank accession number GCA_039707435.1) (Markee et al. 2024) using Subread v2.0.6 (Liao et al. 2013), allowing multimapping reads and using a genome feature file (GFF) originally generated by Markee et al. (2024) and adapted for use with Subread and to include the sericins characterized in the current study. The number of reads mapping to each gene was counted with featureCounts (Liao et al. 2014), assigning fractional counts to multimapping reads. Raw read counts were subsequently imported into R v4.2.2 (R Core Team 2025) and normalized with the counts per million (CPM) method using edgeR v3.40.2 (Robinson et al. 2010) for plotting purposes. For differential expression analysis in edgeR, raw read counts were normalized using the TMM method (Robinson et al. 2010), and a quasi-likelihood F (QLF) test was used to assess differences between selected groups (whole-body first instar, abdomen fourth instar, abdomen fifth instar; n = 3 per group), with the Benjamini–Hochberg method used to correct for multiple testing.
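The CPM scaling and the Benjamini–Hochberg correction mentioned above are simple enough to write out. The plain-Python sketch below is illustrative only and is not edgeR's implementation: `cpm` uses raw column totals as library sizes (edgeR's `cpm()` can additionally fold in TMM normalization factors, which are omitted here), and `bh_adjust` follows the same step-up procedure as R's `p.adjust(method = "BH")`. TMM normalization and the quasi-likelihood F test are not reproduced; the function names, gene names, and counts are hypothetical.

```python
from typing import Dict, List


def cpm(counts: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Counts per million for a gene -> per-sample counts table.

    Library size is the column total over all genes in the table; fractional
    counts (e.g. from multimapping reads) are allowed.
    """
    genes = list(counts)
    n_samples = len(counts[genes[0]])
    lib_sizes = [sum(counts[g][s] for g in genes) for s in range(n_samples)]
    return {
        g: [counts[g][s] * 1e6 / lib_sizes[s] for s in range(n_samples)]
        for g in genes
    }


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum of p*m/rank
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


table = {"geneA": [10, 0], "geneB": [90, 50], "geneC": [900, 950]}
print(cpm(table)["geneA"])  # [10000.0, 0.0]
print([round(p, 4) for p in bh_adjust([0.01, 0.04, 0.03, 0.20])])
# [0.04, 0.0533, 0.0533, 0.2]
```

CPM only rescales each sample by its sequencing depth, which is why it suits plotting; TMM additionally corrects for composition differences between samples before testing.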
Long-Read Silk Gland Transcriptome
Silk glands were carefully extracted from a single individual at each representative life stage (instars 1-5, L1-L5) by decapitating live caterpillars over ice while submerged in Qiagen RNAprotect Tissue Reagent (Qiagen, Cat #76104). Glands were immediately flash frozen in liquid nitrogen in 1.8 mL microcentrifuge tubes and stored at −80 °C. RNA extractions, library preparation, sequencing, and initial subread processing were performed at the University of Florida's Interdisciplinary Center for Biotechnology Research (RRID:SCR_019152). RNA was isolated using a Qiagen RNeasy micro extraction kit (Qiagen, Cat #74004) following the manufacturer's protocol and assessed for concentration and quality using a Qubit fluorometer and the Agilent 2100 Bioanalyzer (Agilent Technologies, Inc.). SMRTbell Iso-Seq libraries were prepared for the PacBio Sequel IIe according to the manufacturer's protocol (Pacific Biosciences, Cat #PN 101-763-800) with minor modifications. RNA preps were cleaned and concentrated using the ZYMO Research RNA Clean and Concentrator kit (ZYMO Research, Cat #R1015). Total RNA from each larval instar silk gland was diluted to ∼100 ng, after which full-length cDNA was synthesized and amplified as described in the protocol for Low Input RNA:cDNA Synthesis and Amplification (New England Biolabs, Cat #E6421). Further sample preparation was performed with the PacBio SMRTbell Express Template Prep Kit 2.0 (PacBio, PN 100-938-900) and Barcoded Overhang Adapter Kits A and B (Pacific Biosciences, Cat #PN 100-628-400 and #PN 100-628-500), with 1 µg of amplified cDNA per sample as input. The final library contained one barcoded cDNA sample per larval instar, for a total of five samples. The barcoded library was sequenced on the PacBio Sequel IIe platform. Consensus sequences were generated from subreads using ccs. All further processing was performed with the IsoSeq 3 pipeline (v4.0.0, https://github.com/PacificBiosciences/IsoSeq).
Lima v2.7.1 was used to filter reads and remove primers and barcodes for each sample separately. Because few reads passed initial quality control, reads from different life stages were pooled for further analysis (Tables S3 and S5). Poly(A) tails and artificial concatemers were removed using the IsoSeq refine function, after which reads were clustered using IsoSeq cluster. Clustered reads were mapped to the A. luna genome (GenBank accession number GCA_039707435.1) using pbmm2 v1.13.1, a SMRT C++ wrapper for minimap2 (Li 2018). Mapped reads were collapsed into unique isoforms using IsoSeq collapse and subsequently classified and filtered using pigeon v1.2.0.
Sericin Identification
A sericin dataset was compiled from previously identified sericins of A. selene (Dong et al. 2015), A. assamensis (Dong et al. 2015), A. pernyi (Dong et al. 2015), A. yamamai (Žurovec et al. 2016), B. mori (Takasu et al. 2007; Dong et al. 2019; Guo et al. 2022), H. cecropia (Rouhová et al. 2024), R. newara (Dong et al. 2015), and S. ricini (S. cynthia ricini) (Dong et al. 2015; Tsubota et al. 2016). Each of these sequences was downloaded from the NCBI nucleotide database (Sayers et al. 2025), or from the original article when unavailable there. To increase the reliability of this sericin dataset, each putative sericin was manually inspected and compared with the NCBI nr database, and sequences with high similarity to nonsericin sequences, such as titin, were excluded. Whenever a genome was available, duplicates were removed by retaining only the longest transcript matching a particular genomic region. In the case of S. ricini srp4, the S. ricini genome (GCA_014132275.2) was used to resolve ambiguous bases and extend the C-terminus. Using the final sericin dataset (Table S1), putative sericins were identified in the A. luna genome (GCA_039707435.1) with the command-line version of the Basic Local Alignment Search Tool (BLAST v2.14.1) (Altschul et al. 1990; Camacho et al. 2009), using both blastn and blastp with an Expect value threshold of 1e−50. Additionally, nucleotide sequences of saturniid sericins were aligned using muscle v5.1 (Edgar 2022), and matching patterns in the A. luna genome were identified with nhmmer (hmmer v3.4) (Wheeler and Eddy 2013). For each hit that contained a high number of serine-containing repeats, the surrounding 5,000 bp were extracted and imported into Geneious v11.1.5 (https://www.geneious.com). Extracted genomic regions were manually annotated using the BRAKER3 automatic genome annotation (Gabriel et al. 2024; Markee et al. 2024), the long-read silk gland-specific transcriptome (see “Long-read Silk Gland Transcriptome”), and whole-body short-read RNA sequencing (see “Short-read RNA Sequencing and Read Alignment”) to identify the number of exons and intron–exon boundaries. Exons were considered alternative exons if they were not present in all long-read transcripts or if short reads bridging the two neighboring exons were found. Subsequently, genomic regions around each sericin were scanned for recent gene duplications by extracting an additional 50,000 bp on either side of each gene and searching for annotated genes with similarity to sericins. The presence of a signal peptide was confirmed using the SignalP 6.0 web server (Teufel et al. 2022). Identification was confirmed with an online BLAST search against the NCBI nr database (Camacho et al. 2009; Sayers et al. 2025).
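The E-value cut-off and the flanking windows (5,000 bp for extraction, 50,000 bp for the duplication scan) described above can be expressed as a small filtering step. This sketch assumes BLAST+ was run with tabular output (`-outfmt 6`, default 12 columns), which the text does not state; the function names, query and contig names, and the example hits are hypothetical, and the actual extraction was done in Geneious.

```python
import csv
import io
from typing import Dict, Iterator, Tuple

# Default column order of BLAST+ tabular output (-outfmt 6)
OUTFMT6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def read_hits(handle, max_evalue: float = 1e-50) -> Iterator[Dict[str, str]]:
    """Yield outfmt-6 hits whose E-value is at or below `max_evalue`."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(OUTFMT6, row))
        if float(hit["evalue"]) <= max_evalue:
            yield hit


def flank_window(start: int, end: int, flank: int, contig_length: int) -> Tuple[int, int]:
    """1-based inclusive window extended by `flank` bp on each side and
    clamped to the contig; minus-strand hits (start > end) are handled."""
    lo, hi = min(start, end), max(start, end)
    return max(1, lo - flank), min(contig_length, hi + flank)


# Hypothetical tabular hits
blast_tsv = io.StringIO(
    "query_1\tcontig_1\t92.1\t300\t20\t1\t1\t300\t15000\t14701\t1e-80\t450\n"
    "query_1\tcontig_2\t40.0\t120\t70\t3\t10\t130\t500\t620\t1e-5\t60\n"
)
hits = list(read_hits(blast_tsv))
print([h["sseqid"] for h in hits])  # ['contig_1']
h = hits[0]
print(flank_window(int(h["sstart"]), int(h["send"]), 5000, 2_000_000))  # (9701, 20000)
```

The same `flank_window` call with `flank=50_000` gives the wider region used to look for nearby duplicated copies.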
Chromosome Synteny Maps
Genomic locations of sericins in the genomes of A. luna, A. yamamai, B. mori, and S. ricini (NCBI accession numbers, respectively: GCA_039707435.1; GCA_036509395.1; GCF_030269925.1; GCA_014132275.2) were compared using ChromSyn (Edwards et al. 2022). A set of 5,286 highly conserved single-copy orthologs was identified using BUSCO v5.3.0 (Simão et al. 2015) with a Lepidoptera-specific dataset. Because one of the putative sericins (serA) was located in a region without BUSCO genes, two B. mori genes (LOC110385059 and LOC101746754) and their orthologs in the three other species were added to the BUSCO dataset to extend synteny in this region. Telomere locations were identified and contig lengths were extracted for each genome with telociraptor v0.11.0 (https://github.com/slimsuite/telociraptor). Synteny maps were generated with ChromSyn in R v4.4 (Edwards et al. 2022; R Core Team 2025) with “minregion” set to 10 to show all synteny blocks larger than 10 kb.
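Synteny blocks anchored on shared BUSCO genes can be illustrated with a simplified grouping step. The sketch below is not ChromSyn's algorithm: it only orders the shared single-copy orthologs along genome A, groups consecutive orthologs that sit on the same chromosome pair, and keeps runs spanning at least 10 kb, mirroring the cut-off described above. Orientation, gene order in genome B, and gap handling are ignored; all identifiers and coordinates are hypothetical.

```python
from itertools import groupby
from typing import Dict, List, Tuple

Location = Tuple[str, int]  # (chromosome or contig name, position in bp)


def synteny_blocks(genome_a: Dict[str, Location],
                   genome_b: Dict[str, Location],
                   min_block_bp: int = 10_000) -> List[Tuple[str, str, int, int, int]]:
    """Toy ortholog-anchored synteny: runs of shared orthologs that are
    consecutive along genome A and lie on the same (chrom_a, chrom_b) pair.

    Returns (chrom_a, chrom_b, start_a, end_a, n_orthologs) for runs spanning
    at least `min_block_bp` in genome A.
    """
    # Shared ortholog IDs, ordered by (chromosome, position) in genome A
    shared = sorted(set(genome_a) & set(genome_b), key=lambda gid: genome_a[gid])
    blocks = []
    for (chrom_a, chrom_b), run in groupby(
            shared, key=lambda gid: (genome_a[gid][0], genome_b[gid][0])):
        ids = list(run)
        start, end = genome_a[ids[0]][1], genome_a[ids[-1]][1]
        if end - start >= min_block_bp:  # drop short runs
            blocks.append((chrom_a, chrom_b, start, end, len(ids)))
    return blocks


# Hypothetical ortholog IDs and coordinates
a = {"b1": ("chrA", 1_000), "b2": ("chrA", 8_000), "b3": ("chrA", 20_000),
     "b4": ("chrA", 40_000), "b5": ("chrA", 45_000)}
b = {"b1": ("chrX", 5_000), "b2": ("chrX", 9_000), "b3": ("chrX", 30_000),
     "b4": ("chrY", 2_000), "b5": ("chrY", 4_000)}
print(synteny_blocks(a, b))  # [('chrA', 'chrX', 1000, 20000, 3)]
```

The toy also shows why extra anchor genes were added near serA: a region without shared orthologs cannot fall inside any block.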
Gene Phylogenetic Network
Thirty-four sericin sequences from B. mori and representative saturniid taxa (Table S1; see the “Sericin Identification” section) were aligned using MAFFT v7.520 with local pairwise alignment and 1,000 iterations (Katoh and Standley 2013). The alignment was manually inspected and minor corrections were made where necessary (the alignment is available on FigShare; see the Data Availability section). To select an optimal amino acid substitution model and gamma shape parameter, we ran the ModelFinder function of IQ-TREE v3.0.1 (Nguyen et al. 2015; Kalyaanamoorthy et al. 2017) on the corrected alignment. All further analyses, including generating and visualizing the phylogenetic network, were run in the SplitsTree app 6.0.0 (Huson and Bryant 2024). A pairwise distance matrix was generated using the protein ML distance method (Swofford et al. 2009) with the JTT model and the gamma shape parameter set to 4.733. Gene phylogenetic networks were generated using the Neighbor Net method (Bryant and Moulton 2004; Bryant and Huson 2023) with default options, yielding 94 splits. Bootstrap support was assessed using 1,000 bootstrap replicates with the Bootstrap Splits method (Felsenstein 1985). Only the 83 splits with bootstrap support over 10 are shown.
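Two generic ingredients of this workflow can be sketched in a few lines: a pairwise distance matrix and nonparametric bootstrap resampling of alignment columns (Felsenstein 1985). As a simplifying assumption, the sketch uses uncorrected p-distances instead of the JTT + Gamma maximum-likelihood distances computed in SplitsTree, and it does not build the Neighbor Net itself; the function names and the toy alignment are hypothetical.

```python
import random
from typing import Dict


def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Uncorrected proportion of differing residues over gap-free columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return float("nan")  # no comparable columns
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aln: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """All-against-all p-distances for an alignment given as name -> sequence."""
    return {i: {j: p_distance(aln[i], aln[j]) for j in aln} for i in aln}


def bootstrap_alignment(aln: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """One bootstrap replicate: alignment columns resampled with replacement."""
    length = len(next(iter(aln.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in aln.items()}


# Toy alignment (hypothetical sequences)
aln = {"s1": "SSGSKTSR", "s2": "SSGSKTAR", "s3": "SAGTRTAR"}
rng = random.Random(1)  # fixed seed for reproducibility
replicates = [distance_matrix(bootstrap_alignment(aln, rng)) for _ in range(100)]
print(distance_matrix(aln)["s1"]["s2"])  # 0.125
```

Each replicate matrix would then be passed to the network method, and the support for a split taken as the percentage of replicates in which it reappears.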
Repeat Sequence Motifs
Sericin sequences were imported into Geneious v11.1.5 (https://www.geneious.com). Repeats were manually extracted and aligned using muscle v3.8.425. The resulting alignment was used to generate a sequence logo in Geneious, showing the consensus sequence as well as the relative amino acid frequency and information content (in bits) at each position (Schneider and Stephens 1990).
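Per-position information content in a protein sequence logo is the maximum entropy, log2(20) ≈ 4.32 bits, minus the Shannon entropy of the residue frequencies at that position. The sketch below omits the small-sample correction described by Schneider and Stephens (1990), and Geneious may compute logos slightly differently; the function names and repeat units are hypothetical.

```python
import math
from collections import Counter
from typing import List


def information_content(column: str, alphabet_size: int = 20, gap: str = "-") -> float:
    """Bits at one alignment column: log2(alphabet_size) minus the Shannon
    entropy of residue frequencies. Gaps are ignored; no small-sample
    correction is applied."""
    residues = [r for r in column.upper() if r != gap]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


def column_ic(aligned_repeats: List[str]) -> List[float]:
    """Per-position information content across equal-length aligned repeats."""
    return [information_content("".join(col)) for col in zip(*aligned_repeats)]


repeats = ["GSSR", "GSTR", "GASR", "GSSK"]  # made-up repeat units
print([round(x, 2) for x in column_ic(repeats)])  # [4.32, 3.51, 3.51, 3.51]
```

A fully conserved position reaches the 4.32-bit maximum, while a position with a 3:1 split between two residues loses about 0.81 bits.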
Supplementary Material
evag036_Supplementary_Data
References
- 1. Aikman EL, Eccles LE, Stoppel WL. Native silk fibers: protein sequence and structure influences on thermal and mechanical properties. Biomacromolecules. 2025:26:2043–2059. doi:10.1021/acs.biomac.4c01781. PMID: 40052735; PMCID: PMC12155892.
- 2. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990:215:403–410. doi:10.1016/S0022-2836(05)80360-2. PMID: 2231712.
- 3. Babcock M, et al. Shuffling of genes within low-copy repeats on 22q11 (LCR22) by Alu-mediated recombination events during evolution. Genome Res. 2003:13:2519–2532. doi:10.1101/gr.1549503. PMID: 14656960; PMCID: PMC403794.
- 4. Birchler JA, Yang H. The multiple fates of gene duplications: deletion, hypofunctionalization, subfunctionalization, neofunctionalization, dosage balance constraints, and neutral variation. Plant Cell. 2022:34:2466–2474. doi:10.1093/plcell/koac076. PMID: 35253876; PMCID: PMC9252495.
- 5. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014:30:2114–2120. doi:10.1093/bioinformatics/btu170. PMID: 24695404; PMCID: PMC4103590.
- 6. Brasó-Vives M, et al. Parallel evolution of amphioxus and vertebrate small-scale gene duplications. Genome Biol. 2022:23:243. doi:10.1186/s13059-022-02808-6. PMID: 36401278; PMCID: PMC9673378.
- 7. Brown CJ, Todd KM, Rosenzweig RF. Multiple duplications of yeast hexose transport genes in response to selection in a glucose-limited environment. Mol Biol Evol. 1998:15:931–942. doi:10.1093/oxfordjournals.molbev.a026009. PMID: 9718721.
- 8. Bryant D, Huson DH. Neighbor Net: improved algorithms and implementation. Front Bioinforma. 2023:3:1178600. doi:10.3389/fbinf.2023.1178600. PMID: 37799982; PMCID: PMC10548196.
