Imagin: An Integrase-Like Gene Conserved Across Malacostracan Crustaceans Derived From a Ginger1 DNA Transposon
Liyuan Hao, Satoshi Kawato, Reiko Nozaki, Miho Furukawa, Hidehiro Kondo, Ikuo Hirono

TL;DR
This paper describes a gene in malacostracan crustaceans that evolved from a Ginger1 DNA transposon and appears to have been co-opted into reproductive processes.
Contribution
Imagin is a newly identified gene family derived from a transposon in malacostracan crustaceans.
Findings
Imagin is a single-copy gene in Penaeus japonicus, located within the MMUT gene.
Imagin orthologs have lost the catalytic DDE triad, indicating a noncatalytic function.
Imagin shows divergent expression in reproductive tissues across crustacean lineages.
Abstract
Domestication of transposable elements has been extensively documented in vertebrates, but few examples have been reported in nonmodel organisms, particularly crustaceans. Here, we present Imagin (Integrase-like gene in MAlacostracans derived from GINger1), a gene family derived from a Ginger1 DNA transposon domesticated in the common ancestor of malacostracan crustaceans over 400 million years ago. We discovered Imagin in the kuruma shrimp Penaeus japonicus as a single-copy, multiexon gene residing within a conserved intron of the methylmalonyl-CoA mutase (MMUT) gene. Comprehensive phylogenetic and structural analyses demonstrate that while Imagin orthologs are under strong purifying selection and retain the conserved H2C2 zinc-finger domain and integrase core, they have ubiquitously lost the catalytic DDE triad essential for endonuclease activity. These structural features indicate…
| Order | Suborder | Infraorder | Family | Species | Genome | Transcript | Protein | Reference |
|---|---|---|---|---|---|---|---|---|
| Decapoda | Dendrobranchiata | … | Penaeidae | | | | | |
| … | … | … | … | | | | | |
| … | … | … | … | | | | | |
| … | … | … | … | | | | | |
| … | … | … | … | | | | | |
| … | … | … | … | | – | GGTT01000830.1 | – | |
| … | … | … | … | | – | GGTU01001009.1 | – | |
| … | … | … | … | | – | GGTV01001257.1 | – | |
| … | … | … | … | | – | GEUA01008623.1 | – | |
| … | Pleocyemata | Caridea | Palaemonidae | | | | | |
| … | … | … | … | | | | | |
| … | … | … | … | | – | – | | |
| … | … | … | … | | – | GHDW01049199.1 | … | |
| … | … | … | … | | – | GHDQ01045443.1 | – | |
| … | … | … | Pandalidae | | JAVIII010019169.1 | – | – | |
| … | … | … | Crangonidae | | – | GKAF01001613.1 | – | |
| … | … | … | Atyidae | | – | – | PRJNA1037494 | |
| … | … | … | … | | – | GHBI01031222.1 | – | |
| … | … | … | … | | – | – | | |
| … | … | … | … | | BDMR012730770.1 | – | – | |
| … | … | Achelata | Palinuridae | | | | | |
| … | … | … | … | | JAUUEG010005451.1 | – | – | |
| … | … | … | … | | JAUUDZ010013417.1 | – | – | |
| … | … | … | … | | JAUUDY010602921.1 | – | – | |
| … | … | … | … | | JAUTXX010152419.1 | – | – | |
| … | … | … | … | | JAUUEF010021749.1 | – | – | |
| … | … | … | … | | – | GHUJ01013554.1 | – | |
| … | … | … | … | | – | GGHM01141313.1, GGHM01179921.1 | – | |
| … | … | Astacidea | Homaridae | | | | | |
| … | … | … | … | | – | … | | |
| … | … | … | Parastacidae | | | | | |
| … | … | … | … | | WNWK01008214.1 | – | – | |
| … | … | … | Cambaridae | | | | | |
| … | … | … | … | | – | GJEC01212686.1 | – | |
| … | … | … | … | | – | GIVA01013313.1 | – | |
| … | … | Anomura | Porcellanidae | | – | | | |
| … | … | … | … | | – | | | |
| … | … | … | Coenobitidae | | – | – | | |
| … | … | … | Lithodinae | | – | – | | |
| … | … | … | … | | – | GHJC01074738.1 | – | |
| … | … | Brachyura | Varunidae | | | | | |
| … | … | … | … | | GKPI01496440.1 | – | – | PRJNA1018655 |
| … | … | … | Oregoniidae | | – | – | PRJNA602365 | |
| … | … | … | Cancridae | | – | – | | |
| … | … | … | … | | – | – | PRJNA902360 | |
| … | … | … | Carcinidae | | – | GFYV01193636.1 | – | |
| … | … | … | Portunidae | | JAOPJN010000262.1 | – | – | PRJNA773940 |
| … | … | … | … | | | | | |
| … | … | … | … | | | | | |
| … | … | … | … | | – | GKPJ01278730.1 | – | PRJNA1018655 |
| Euphausiacea | … | … | Euphausiidae | | JAPMSX010984927.1 | – | – | PRJNA867116 |
| … | … | … | … | | – | – | | |
| Stomatopoda | … | … | Squillidae | | | | | |
| Isopoda | Oniscidea | … | Armadillidiidae | | – | – | | |
| … | … | … | … | | – | – | | |
| … | … | … | Trichoniscidae | | – | GKUF01008836.1 | – | |
| … | … | … | … | | – | GKTX01069465.1 | – | |
| … | Cymothoida | … | Cirolanidae | | – | – | | |
| Amphipoda | Talitrida | … | Hyalellidae | | | | | |
| … | … | … | Hyalidae | | – | GFVL01019626.1 | – | |
| … | Talitroidea | … | Talitridae | | – | GDUJ01036696.1 | – | |
| … | … | … | … | | – | | | |
| … | … | … | … | | – | – | | |
| … | Gammaridea | … | Hirondelleidae | | – | GEZX01077614.1 | – | |

| Taxon | Sequences | Codons | Total tree length (subs/site) | Log(L) | AIC-c | Estimated parameters | dN/dS |
|---|---|---|---|---|---|---|---|
| | 9 | 586 | 0.185 | −3880.06 | 7820.48 | 30 | 0.0574 (0.0420 to 0.0761) |
| | 4 | 587 | 0.08 | −3054.74 | 6149.84 | 20 | 0.1332 (0.0960 to 0.1789) |
| Astacidea | 5 | 611 | 0.194 | −3615.76 | 7275.85 | 22 | 0.1245 (0.1006 to 0.1520) |
| | 7 | 611 | 0.057 | −2975.32 | 6000.94 | 25 | 0.0603 (0.0354 to 0.0949) |
| Portunidae | 4 | 524 | 0.174 | −3252.48 | 6545.37 | 20 | 0.1255 (0.0972 to 0.1585) |

| Primer | Sequence (5′ to 3′) |
|---|---|
| PjEF1a_F | ATGGTTGTCAACTTTGCCCC |
| PjEF1a_R | TTGACCTCCTTGATCACACC |
| PjEF1a_qPCR_F | ATTGCCACACCGCTCACA |
| PjEF1a_qPCR_R | TCGATCTTGGTCAGCAGTTCA |
| PjImagin_F | GCAGTGTGGACTAGATGTTC |
| PjImagin_R | CTCCTGCTCATCAGAGTAAG |
| PjImagin_qPCR_F | TCATCCACCCCAACCAACTC |
| PjImagin_qPCR_R | CAGATTGGAGGTTTGAGCCG |
| PjImagin_5RACE | ATTCCTCCTTCCGACTCTTCCTGAGATGTG |
| PjImagin_5RACE2 | TATCCTTGACTCCCTTCACG |
| PjImagin_3RACE | CCAACTCTTCCCAGCATCAATCCAGTCAAG |
| PjImagin_3RACE2 | CCATTCAATCCAACAGCTCG |
| PjImagin_3RACE3 | ATCTGCTCTCAGTGATGATG |
| NdeI_PjImagin_antigen_F | catatgGGAACACAACATCCAGACAG |
| PjImagin_antigen_6His_R | tcaatgatgatgatgatgatgTGAGTCTCCGTACG |
Introduction
Transposable elements (TEs) are genetic elements that move within host genomes (Bourque et al. 2018). Although TEs are often regarded as “parasites” threatening genome integrity, they sometimes provide raw materials for molecular novelties in the host (Brandt et al. 2005). TE-originated host genes represent examples of molecular exaptation, or co-option, whereby proteins acquire biological roles distinct from their original functions following domestication by a new host.
TE domestication is especially well documented in vertebrates. One of the best-known examples is syncytin, a cell–cell fusion protein that plays a crucial role in the development of placenta in humans and closely related primates (Blond et al. 1999; Mi et al. 2000). Syncytin is derived from the env gene of the human endogenous retrovirus HERV-W. Another retrotransposon-derived gene, Peg10, is more widely distributed among mammals and is also essential for placental development (Ono et al. 2006, 2001). RAG1 and RAG2 recombinases, essential for V(D)J recombination, are derived from a Transib DNA transposon rather than a retrotransposon. They generate immune diversity by rearranging gene segments in immunoglobulin and T-cell receptor genes (Gellert 2002; Kapitonov and Jurka 2005; Zhang et al. 2019b). Some domesticated elements, like Lyosin (Kitao et al. 2025) and NYNRIN (NYN domain and Retroviral Integrase Containing; also known as CGIN1), are fused to cellular genes.
During the annotation of a draft genome assembly of the kuruma shrimp Penaeus japonicus (Kawato et al. 2021), we discovered a multiexon gene encoding an integrase-like protein, which intrigued us because it possesses introns despite its apparent TE origin. Preliminary searches against publicly available decapod genomes and transcriptomes indicated that this gene is highly conserved among decapods, suggesting an ancient origin and potential functional importance. The combination of integrase-like domains and stable conservation pointed to possible molecular domestication, motivating us to investigate its evolutionary history and functional role.
Results
Discovery and Structure of an Integrase-like Gene (PjImagin)
During the genome annotation of the kuruma shrimp P. japonicus, we identified a multiexon gene encoding a protein with similarity to transposases. We named this gene PjImagin (P. japonicus Imagin; LOC122254663). The full-length transcript of PjImagin (DDBJ Accession no. LC877020.1) was 2,623 nucleotides (nt) in length, encoding a predicted protein of 585 amino acids (aa) with an estimated molecular mass of 64.9 kDa and an isoelectric point (pI) of 7.60. The gene comprises 3 exons, with the first exon consisting entirely of 5′-untranslated region (UTR) sequence, and lies between exons 14 and 15 of the methylmalonyl-CoA mutase, mitochondrial-like gene (MMUT; LOC122254656) (Fig. 1b; Table S1).
Fig. 1. Imagin, a multiexon integrase-like gene in malacostracan crustaceans. a) Structure of the PjImagin gene. The transcript is shown in light blue and the CDS in white. The YPYY motif is colored yellow, the H2C2 zinc finger sky blue, the defunct DDE catalytic core pink, and the disordered region gray. Gray vertical lines denote the positions of key residues denoted in panels c and d, with the last 2 lines shaded to indicate the positions corresponding to the lost D and E residues (panel e). b) The genomic context of PjImagin and its orthologs. PjImagin is flanked by exons 14 and 15 of the methylmalonyl-CoA mutase, mitochondrial-like (MMUT) gene, which runs on the opposite strand. Note that the Imagin gene of the swimming crab (Portunus trituberculatus) lies at an unrelated position. MMUT CDS is colored green, and MMUT exons corresponding to exons 14 and 15 of P. japonicus MMUT are connected with light green ribbons. CDS of other genes are shaded gray. c) Domain architectures of malacostracan Imagin proteins. The coloring scheme follows that of panel (a). d) Conservation of the YPYY motif and the H2C2 (HHCC) zinc-finger domain in integrase proteins. Species names are listed on the left, with protein accession numbers provided on the right. Entries Ginger1-1_HM, Ginger1-2_HM, and Ginger1-3_HM were adopted from Bao et al. (2010). The host species for these 3 entries was described as Hydra magnipapillata in Bao et al. (2010), but we opted for the species name H. vulgaris, deferring to NCBI Taxonomy (NCBI:txid6087). e) Loss of the DDE catalytic residues in crustacean Imagin proteins. Species names are listed on the left, with protein accession numbers provided on the right. Note that some residues may be missing from this alignment because regions containing long gaps were automatically suppressed by DALI. Each residue is shaded by background color according to the secondary structure assignments by DSSP: light green, loop; dark green, helix; blue, strand. Conserved aspartic acid (D) and glutamic acid (E) residues comprising the DDE motif are highlighted in black. PFV, prototype foamy virus; RSV, Rous sarcoma virus; HTLV-1, human T-lymphotropic virus 1; SIV, simian immunodeficiency virus; Ty3, Ty3 retrotransposon.
A domain search using HMMER3 revealed an H2C2 zinc-finger domain (PF17921) and a transposase-like catalytic core (IPR001584), both of which are characteristic of DDE transposases and integrases seen in retroelements and certain DNA transposons (Esposito and Craigie 1999) (Fig. 1a). However, structural alignment with representative integrases indicates that PjImagin and its orthologs in other crustaceans lack the conserved aspartic acid (D) and glutamic acid (E) residues that constitute the canonical DDE catalytic triad (Fig. 1e; Supplementary Data S1) (Nesmelova and Hackett 2010). This suggests that the Imagin protein has lost its enzymatic activity, at least with respect to the canonical DDE-dependent mechanism. Upstream of the zinc-finger domain lies the “YPYY motif”, which is found in a subset of retroelements and DNA transposons (Bao et al. 2010).
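The check for an intact catalytic triad described above can be illustrated with a minimal sketch. It assumes that the three catalytic positions have already been mapped to alignment columns. The two toy sequences, their names, and the column indices are hypothetical placeholders, not data from this study or from Supplementary Data S1.

```python
# Minimal sketch: test whether aligned sequences retain a D-D-E triad.
# All sequences and column indices below are hypothetical illustrations.

CATALYTIC_COLUMNS = (2, 7, 12)   # assumed 0-based alignment columns of D, D, E
EXPECTED_TRIAD = ("D", "D", "E")


def catalytic_residues(aligned_seq, columns=CATALYTIC_COLUMNS):
    """Return the residues found at the assumed catalytic columns."""
    return tuple(aligned_seq[c] for c in columns)


def has_intact_dde(aligned_seq, columns=CATALYTIC_COLUMNS):
    """True only if all three catalytic positions carry D, D, and E."""
    return catalytic_residues(aligned_seq, columns) == EXPECTED_TRIAD


toy_alignment = {
    "active_integrase_toy": "GADFTHLDGKRYEVL",  # D, D, E at columns 2, 7, 12
    "imagin_like_toy":      "GADFTHLNGKR-QVL",  # second D and E replaced
}

for name, seq in toy_alignment.items():
    print(name, catalytic_residues(seq), has_intact_dde(seq))
```

In practice the columns would come from a structure-guided alignment such as the DALI output mentioned in the Fig. 1 legend, because the catalytic residues are widely spaced and poorly anchored in sequence-only alignments.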
PjImagin orthologs and their genomic contexts are highly conserved across decapod crustaceans (Table 1; Fig. S1; Supplementary Data S2). The dN/dS ratios of Imagin orthologs in decapods ranged from 0.05 to 0.13, indicating that Imagin orthologs are under purifying selection (Table 2; Supplementary Data S3; Table S10). Conservation of the Imagin coding region and UTRs, particularly in Penaeus spp., suggests that these noncoding elements have critical regulatory roles (Good 1995). The extensive conservation of regulatory regions may reflect species-specific requirements for gene expression control, such as sex-biased expression patterns (Ellegren and Parsch 2007).
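The reading of the dN/dS estimates can be sketched as follows: a ratio whose interval lies entirely below 1 is taken as evidence of purifying selection. The two example rows use values from Table 2. Treating the bracketed ranges in Table 2 as confidence intervals is an assumption of this sketch, and the function name is illustrative.

```python
# Minimal sketch: classify a dN/dS (omega) estimate by where its interval
# sits relative to the neutral expectation of 1.

def classify_selection(omega, ci_low, ci_high):
    """Label an omega estimate from its interval bounds."""
    if not ci_low <= omega <= ci_high:
        raise ValueError("point estimate must lie inside its interval")
    if ci_high < 1.0:
        return "purifying"
    if ci_low > 1.0:
        return "positive"
    return "not distinguishable from neutral"


# (omega, lower bound, upper bound) taken from Table 2
table2_estimates = {
    "Astacidea": (0.1245, 0.1006, 0.1520),
    "Portunidae": (0.1255, 0.0972, 0.1585),
}

for taxon, (omega, low, high) in table2_estimates.items():
    print(taxon, omega, classify_selection(omega, low, high))
```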
Brachyuran Imagin orthologs are absent from the canonical location and reside in a different genomic region (Fig. 1b, Fig. S1), but their orthology is supported by the facts that: (i) they are single-copy genes (Table S2), and (ii) the phylogenetic tree drawn using the Imagin orthologs conforms to the species phylogeny of decapods (Fig. 2; Supplementary Data S4) (Wolfe et al. 2019), although brachyuran crabs formed an exceptionally long branch relative to the other taxa. This could be due to accelerated evolution of the Imagin gene in brachyuran crabs, as suggested by the divergent genomic location of the Imagin gene in these species.
Fig. 2. Phylogenetic analysis of decapod Imagin proteins. A total of 463 sites were used for the maximum-likelihood phylogenetic analysis using IQ-TREE v3.0.1 (substitution model: Q.MAMMAL + F + I + G4). The slashed values beside the nodes indicate the support values for the ultrafast bootstrap test, followed by the SH-like approximate likelihood ratio test (1,000 trials each). The bar indicates amino acid substitutions per site. Euphausiacea and Stomatopoda were used as the outgroup. Imagin proteins from brachyuran crabs formed a long branch relative to the other taxa, suggesting accelerated evolution following the translocation event in this lineage.
The remarkable conservation in decapods prompted us to explore Imagin orthologs across malacostracan crustaceans. TBLASTN searches querying the PjImagin protein sequence against publicly available malacostracan genomes readily identified Imagin orthologs from krill (Euphausia superba and Meganyctiphanes norvegica; order Euphausiacea) (Unneberg et al. 2024) and the mantis crab (Oratosquilla oratoria; order Stomatopoda) (Table 1; Table S3) (Zhang et al. 2025a), all lying between exons 14 and 15 of the MMUT gene (Fig. 1b; Fig. S1; Table S1). Krill and mantis crab Imagin proteins closely resemble decapod orthologs in domain architecture and length.
Imagin orthologs were also present in isopods and amphipods (peracarids) between exons 14 and 15 of the MMUT gene, but they were truncated relative to those of decapods (Fig. 1b and c; Fig. S1; Table S4 to S8; Supplementary Data S2 and S5). In amphipods, the YPYY motif and the H2C2 zinc finger are degenerate and the DDE integrase domain has been completely lost, with the C-terminal region replaced by an intrinsically disordered region.
We could not examine the presence of Imagin orthologs in other malacostracan lineages, such as Leptostraca, Mysida, and Tanaidacea, but this was due to the lack of high-quality genome assemblies and does not exclude the possibility that Imagin orthologs exist in these lineages.
Collectively, these findings demonstrate that Imagin was domesticated in the intron between exons 14 and 15 of the MMUT gene before the divergence of the major malacostracan clades, which likely dates back to the Cambrian to Ordovician (490 to 440 million years ago) (Bernot et al. 2023).
Imagin Originated From a Ginger1 DNA Transposon: Naming and Etymology
To investigate the phylogenetic origins of Imagin, we built a maximum-likelihood phylogenetic tree of DDE integrases from LTR retrotransposons and retroviruses (Kojima 2019). The tree resolved major clades of retroviruses and transposons and placed Imagin orthologs within the branch made up of Ginger1 (“Gypsy INteGrasE Related 1”)-like elements, a family of multiexon DNA transposons phylogenetically related to retroelement integrases (Bao et al. 2010; Marín 2010) (Fig. 3; Supplementary Data S6; Table S9). This tree placed human GIN1 (Lloréns and Marín 2001) in a clade distinct from Imagin, confirming their independent evolutionary histories. The conservation of the YPYY motif among Imagin orthologs (Fig. 1d) is also consistent with these phylogenetic relationships.
Fig. 3. Imagin is a domesticated Ginger1 DNA transposon. a) Maximum-likelihood phylogenetic tree of 121 DDE integrases and derivatives (136 sites; model: VT + R6). b) Subtree of a) showing the phylogenetic relationships of Ginger1-like elements. The bar beside the tree indicates amino acid substitutions per site. The slashed values beside the nodes indicate the support values for the ultrafast bootstrap test, followed by the SH-like approximate likelihood ratio test (1,000 trials each).
Based on these findings, we formally named this gene family “Imagin” (Integrase-like gene in MAlacostracans derived from GINger1). The omission of the final “e” from “imagine” alludes to the loss of the conserved glutamate (E) residue of the DDE catalytic triad described above. In hindsight, the presence of introns in Imagin is not surprising, given that the gene originated from a multiexon DNA transposon. The Imagin genes have defunct DDE motifs and lack inverted repeats, both of which are essential for mobility. The loss of these features was likely essential for the domestication event.
PjImagin Protein Accumulates in the Cytosol of Developing Oocytes
We used P. japonicus as a model to analyze the expression and function of Imagin. qPCR analysis revealed that PjImagin expression was markedly elevated in the ovary, suggesting a role in female reproductive development (Fig. 4).
Fig. 4. Expression of PjImagin in different tissues. Relative expression levels of PjImagin in different tissues of kuruma shrimp (n = 3). Expression values are visualized using barplots overlaid with beeswarm plots to show both distribution and individual data points.
We next performed immunohistochemistry to examine the cellular localization of PjImagin protein in kuruma shrimp gonads. No signals were detected in the testis, suggesting that PjImagin protein is not expressed in the male gonad (Fig. 5a). In the immature ovary (body weight [BW]: 10 g; Fig. 5b), brown positive signals were detected exclusively in oocytes, whereas oogonia were negative, suggesting that PjImagin accumulates in female germline cells undergoing meiosis. Notably, positive signals were present in the cytosol and not in the nucleus of these oocytes. PjImagin accumulation was more pronounced in more developed oocytes and mature ova (BW: 25 g; Fig. 5c and d). In mature ova, positive signals were detected in the yolk and nucleoplasm, whereas the nucleoli and nuclear envelope remained negative (Fig. 5d).
Fig. 5. PjImagin accumulates in developing oocytes. Immunohistochemical staining was performed on gonadal sections to examine the distribution of PjImagin. a) A testis section from a male shrimp showed no detectable DAB-positive signal, while ovarian sections from female shrimp showed clear localization of PjImagin. b) The ovary from a 10 g female shrimp was observed under 20× magnification, and c and d) the ovary from a 25 g female shrimp was observed under both 20× and 40× magnifications. Brown DAB staining indicates PjImagin localization. Cell stages with positive signals are labeled in red, while negative ones are labeled in black. All sections were counterstained with hematoxylin. Abbreviations: OG, oogonium; OC, oocyte; OV, mature ovum; YB, yolk body; N, nucleus; NE, nuclear envelope; Nu, nucleolus; NP, nucleoplasm; SG, spermatogonium; SC, spermatocyte; ST, spermatid.
Contrasting Expression of Imagin Orthologs Between Two Decapod Suborders
To explore the tissue distribution and functions of Imagin orthologs in other decapods, we turned to public RNA-seq data and reference genome assemblies (Table S11). In penaeid shrimps, Imagin expression was detected predominantly in the ovary (Fig. 6a), in agreement with our qPCR data in P. japonicus. In contrast, in other decapod crustaceans, such as crabs and lobsters, Imagin was highly expressed in the testis rather than the ovary (Fig. 6b and c). The marked difference in tissue specificity between penaeid shrimp and other decapods implies a functional divergence of Imagin in reproductive processes across crustacean lineages. Collectively, these observations suggest that, although Imagin orthologs appear to be involved in reproductive biology across decapods, their precise roles may vary among lineages.
Fig. 6. TPM expression levels of Imagin across decapod species and tissues. Expression values are visualized using barplots overlaid with beeswarm plots to show both distribution and individual data points. Imagin expression is highest in the ovary of penaeid shrimp (a) but enriched in the testes of prawns and crayfish (b) as well as crabs (c), indicating tissue-specific roles across species. See Table S11 for the details of the datasets used.
Discussion
TEs are ubiquitous in cellular genomes, and crustaceans are no exception. In fact, crustacean genomes are notoriously rich in repetitive sequences, which poses challenges for bioinformaticians working on these organisms. Despite the abundance of TEs, their biological significance, including the contribution of domesticated elements, has been poorly understood in crustaceans. From a reverse genetics perspective, this could be due to the difficulty in distinguishing domesticated elements from “parasitic” or “junk” elements in genome sequences. This challenge is further compounded by the difficulty of applying forward genetics approaches to these nonmodel organisms.
The discovery of Imagin was a serendipitous byproduct of a previous shrimp genome sequencing project (Kawato et al. 2021). The bioinformatic analyses were driven by the availability of high-quality genome and transcriptome data, which allowed us to identify orthologs, including their conserved genomic context. Bioinformatics-driven discovery of domesticated TEs requires careful examination of conservation across taxa as well as avoiding prematurely masking domesticated genes as repetitive elements.
Imagin originates from an integrase, a member of a family of DNA recombination enzymes responsible for the insertion of genetic elements into host genomes (Collis and Hall 1992; Masuda 2011). Integrases are characterized by conserved functional domains, such as the H2C2 zinc finger and the catalytic DDE motif (Nowotny 2009). The domestication of an integrase by the host can give rise to novel biological functions (Miller et al. 1997; Chalopin et al. 2012; Koonin and Krupovic 2018). Importantly, when a TE is domesticated by a new host, what matters is the biochemical properties of the protein that stem from its structure, rather than the biological role(s) it played in its original context.
While Imagin orthologs are widely conserved across malacostracan crustaceans, they lack key DDE residues required for enzymatic activity, which suggests that they do not function as canonical integrases. Nevertheless, the overall structural integrity of the integrase core domain, or more generally the RNase H-like fold (Majorek et al. 2014), is well preserved. This raises the possibility that Imagin acts as a nucleic acid-binding protein or a scaffolding protein mediating protein–protein interactions, likely cooperating with the N-terminal H2C2 zinc-finger domain that aids multimerization (Zheng et al. 1996; Lee et al. 1997), rather than functioning as an enzyme.
The accumulation of PjImagin in the cytoplasm of developing oocytes is consistent with this hypothesis. Cytoplasmic proteins in crustacean oocytes often participate in maternal mRNA regulation (Howley and Ho 2000), translational control (Richter and Lasko 2011; Sengseng et al. 2023), or yolk processing (Kessel 1968), all of which are essential for proper embryonic development. PjImagin's association with yolk bodies suggests a role in vitellogenesis (Tsukimura 2001) or in modulating genes related to gonadal maturation (Sellars et al. 2015; Potiyanadech et al. 2023). PjImagin may contribute to some of these functions, for instance by acting as a carrier or storage platform for maternal RNAs, where high protein abundance is required to sequester or regulate nucleic acid targets. Examples of domesticated TEs serving as RNA-binding proteins include Jerky in mammals (Toth et al. 1995; Liu et al. 2002, 2003), although the RNA-binding domain of Jerky is a tandem repeat of homeodomain-like helix-turn-helix motifs, which differs from the domain architecture of Imagin.
While the maternal RNA-binding protein hypothesis could explain the Imagin expression dynamics in penaeid shrimps (Suborder Dendrobranchiata), Imagin showed highest expression in the testis of other decapods (Suborder Pleocyemata). This is rather intriguing and warrants further investigation, as it suggests the subfunctionalization of Imagin across the 2 suborders. The accelerated evolution observed in brachyuran Imagin orthologs (Fig. 2) may be associated with their translocation from the conserved MMUT intron, which likely released the gene from the strict evolutionary constraints imposed by the host gene.
Some domesticated TEs are responsible for controlling other TEs in the host genome. Although we cannot rule out the possibility that Imagin contributes to the maintenance of host genome integrity, it is unlikely that Imagin interacts directly with other TEs. If Imagin were an effector gene actively engaged in an “evolutionary arms race” against parasitic elements, its evolution should be characterized by diversification and divergence, involving gene duplication events and positive selection. However, Imagin has remained a single-copy gene and, at least in decapods, has been under purifying selection. Even in brachyuran crabs, where Imagin has translocated away from the conserved intron that could have imposed constraints on duplication events, the gene has remained single copy. Given these observations, if Imagin plays a role in TE control, it would be a regulatory one like that of ALP1 in land plants, which has also been under purifying selection (Liang et al. 2015).
Despite deep conservation at the locus level, amphipod Imagin has undergone drastic sequence divergence compared with other lineages. Amphipod Imagin has completely lost the integrase core and terminates with a low-complexity repeat, which is predicted to be an intrinsically disordered region; even the YPYY motif and the H2C2 zinc finger at the N-terminus are degenerate (Fig. 1). Although we can only speculate about the functions of this divergent Imagin protein, its conservation as a protein-coding gene suggests that it is functionally important in amphipods. Regardless of its biological roles, the structural plasticity of Imagin in different host lineages illustrates how a domesticated transposon can evolve to serve host-specific needs once it has been liberated from ancestral functional constraints.
Although Imagin is conserved across malacostracan crustaceans, it has undergone notable structural and regulatory divergence. Originally derived from a mobile element, Imagin appears to have been co-opted into reproductive processes in a lineage-specific manner. Further studies, including RNA interference experiments to investigate the functional role of PjImagin and broader species comparisons, are needed to determine whether integrase-like elements contribute to crustacean reproduction and whether their domestication represents a broader evolutionary trend.
Materials and Methods
Experimental Animals
Kuruma shrimp (average body weight: 15 g) were obtained from a commercial shrimp farm in Okinawa, Japan. Shrimp were maintained in a tank with a water recirculating system at 25 °C and a salinity of 29 to 32 ppt. Shrimp were acclimated for 3 d prior to downstream experiments.
Cloning of the Full-length PjImagin cDNA
Total RNA was extracted from the ovaries of kuruma shrimp using RNAiso Plus reagent (Takara Bio Inc., Japan). The full-length cDNA sequence was cloned by 5′- and 3′-rapid amplification of cDNA ends (RACE)-PCR using the SMARTer RACE 5′/3′ Kit (Clontech, Japan), following the manufacturer's protocol. The primers used in this study are found in Table 3.
Characterization of the PjImagin Transcript
A PjImagin transcript sequence (ICRK01003983.1) was identified from a P. japonicus transcriptome assembly (Kawato et al. 2021). The sequence was analyzed with SnapGene Viewer v. 5.3.2 (GSL Biotech, United States) and GENETYX version 11.0.4 (Software Development Co. Ltd, Japan). PjImagin orthologs in other decapod crustaceans were identified using NCBI BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi).
Functional domains in the predicted protein sequences were identified using HMMER (Eddy 2011) (https://www.ebi.ac.uk/Tools/hmmer/). The molecular mass and theoretical isoelectric point of the protein were predicted using the ExPASy (Duvaud et al. 2021) Compute pI/Mw tool (https://web.expasy.org/compute_pi/).
RNA Isolation and cDNA Synthesis
Gill, heart, epidermis, stomach, lymphoid organ, nerve, muscle, hepatopancreas, eye stalk, intestine, and gonads were collected from 6 apparently healthy shrimp. Total RNA was extracted using RNAiso Plus (Takara Bio Inc., Japan) following the manufacturer's instructions, precipitated with isopropanol, and resuspended in DEPC-treated distilled water. RNA concentration was measured by spectrophotometry (NanoDrop, Thermo Scientific, United States). cDNA synthesis was carried out using the High-Capacity cDNA Reverse Transcription Kit with RNase Inhibitor (Applied Biosystems, United States). The cDNA was stored at −20 °C until further use.
Tissue Distribution Analysis by RT-qPCR
Before qPCR, the cDNA was diluted to a final concentration of 10 ng/µL with nuclease-free distilled water (DW). Quantitative PCR was performed on a StepOnePlus Real-Time PCR System (Applied Biosystems, United States). Each reaction was performed in a final volume of 20 µL containing 5 µL of cDNA template, 10 µL of THUNDERBIRD Next SYBR qPCR Mix (TOYOBO, Japan), 3.8 µL of nuclease-free water, and 0.6 µL each of sense and antisense primer (10 µM). The cycling program began with preincubation at 95 °C for 5 min, followed by 40 cycles of 95 °C for 5 s and 60 °C for 30 s. The relative expression of PjImagin was calculated by the comparative Ct (2^(−ΔΔCt)) method.
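The comparative Ct calculation can be sketched as below. The sketch assumes a single reference gene (the PjEF1a qPCR primers in Table 3 suggest EF1α, but the text does not state this explicitly) and a calibrator tissue chosen by the analyst. The Ct values in the example are hypothetical.

```python
# Minimal sketch of the comparative Ct (2^-ddCt) calculation.
# Assumes roughly 100% amplification efficiency for both primer pairs.

def relative_expression(ct_target, ct_reference,
                        ct_target_calibrator, ct_reference_calibrator):
    """Fold change of the target gene in a sample relative to a calibrator."""
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** -delta_delta_ct


# Hypothetical Ct values: target is detected 5 cycles earlier in the sample
# than in the calibrator, with identical reference gene Ct values.
fold_change = relative_expression(
    ct_target=22.0, ct_reference=18.0,
    ct_target_calibrator=27.0, ct_reference_calibrator=18.0,
)
print(fold_change)
```

A difference of one cycle corresponds to a twofold difference in template, so a ΔΔCt of −5 gives a 32-fold higher relative expression under the stated efficiency assumption.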
Recombinant Protein Expression and Antibody Preparation
A partial coding sequence (CDS) of PjImagin (19.7 kDa; rPjImagin) was cloned into the pET-32a(+) expression vector and introduced into Escherichia coli BL21(DE3) cells. rPjImagin expression was induced with IPTG, and the recombinant protein, carrying the C-terminal 6×His tag encoded by the reverse primer (Table 3), was purified using Ni-NTA Agarose (QIAGEN, Germany) following the manufacturer's instructions.
The purified protein was used to generate a polyclonal antibody in rabbits. The specificity of the antibody was confirmed by Western blot analysis, in which the antibody specifically recognized the rPjImagin protein (data not shown).
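What the lowercase 5′ tail of the reverse antigen primer in Table 3 encodes can be checked by reverse-complementing it and translating the result. The sketch below uses only a partial codon table covering histidine and stop codons, so any other codon is reported as "X". The helper names are illustrative.

```python
# Minimal sketch: reverse-complement a primer tail and translate it.
# The codon table is deliberately partial (His and stop codons only).

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
PARTIAL_CODON_TABLE = {"CAT": "H", "CAC": "H", "TGA": "*", "TAA": "*", "TAG": "*"}


def reverse_complement(seq):
    """Return the reverse complement, preserving letter case."""
    return seq.translate(COMPLEMENT)[::-1]


def translate_partial(seq):
    """Translate complete codons; codons outside the partial table become 'X'."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(PARTIAL_CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, usable, 3))


# Lowercase 5' tail of PjImagin_antigen_6His_R as listed in Table 3
primer_tail = "tcaatgatgatgatgatgatg"
sense_strand = reverse_complement(primer_tail)
print(sense_strand, translate_partial(sense_strand))
```

Read on the sense strand, the tail corresponds to six histidine codons followed by a stop codon, which places the tag at the 3′ end of the amplified coding sequence.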
Immunohistochemistry
Kuruma shrimp (P. japonicus) with body weights of approximately 10 and 25 g were used. Shrimp were injected with Bouin's solution (Fujifilm, Japan) and then immersed in the same fixative at 4 °C for 24 h. Tissues were dehydrated through a graded ethanol series and embedded in paraffin (Bell and Lightner 1988). Longitudinal sections of the cephalothorax were cut at a thickness of 4 μm and mounted onto slides for immunohistochemical analysis.
The sections were rehydrated and treated with 3% hydrogen peroxide in methanol for 10 min at room temperature to block endogenous peroxidase activity. Antigen retrieval was carried out in a water bath at 90 °C for 20 min using HistoVT One (Ichida et al. 2021) (NACALAI Tesque Inc., Japan). After treatment, the slides were blocked with Block Ace (KAC Co., LTD, Japan) for 1 h at room temperature, then incubated with anti-rPjImagin rabbit polyclonal antibody (1:5,000 dilution in Can Get Signal Solution A; TOYOBO, Japan) at 4 °C overnight. The sections were washed 3 times for 5 min each with PBST.
Following incubation, the sections were treated with a ready-to-use Peroxidase Polymer Anti-Rabbit IgG reagent (VECTOR LABORATORIES, United States) for 30 min at room temperature. After the immunoreactions, the sections were washed 3 times with PBST (0.05% Tween-20 in PBS) for 5 min each. Signal development was performed using 3,3′-diaminobenzidine (DAB; VECTOR LABORATORIES, United States) as the chromogen, which produces a brown precipitate at sites of positive immunoreactivity. The sections were then counterstained with hematoxylin, which stains nuclei and immunonegative regions blue, mounted with Malinol (Muto Pure Chemicals Co., Ltd., viscosity: 750 cps), and observed under a BZ-X810 microscope (Keyence, Japan).
Bioinformatic Analyses
See Supplementary Data S7 for the scripts used in this study.
Expression Profiling Using Public RNA-seq Data
The reference genome assemblies of 9 decapod crustaceans (Table S11) were downloaded from the NCBI database (Zhang et al. 2019a; Tang et al. 2020; Jin et al. 2021; Kawato et al. 2021; Uengwetwanit et al. 2021; Xu et al. 2021; Liao et al. 2024; Liu et al. 2024; Zhang et al. 2024; Wang et al. 2024a). RNA sequencing (RNA-seq) reads from various organs (e.g. heart and liver) were retrieved from the NCBI Sequence Read Archive (SRA) (Jiang et al. 2014; Shen et al. 2014; Manfrin et al. 2015; Peng et al. 2015; Huerlimann et al. 2018; Tinwongger et al. 2019; Nong et al. 2020; Santos et al. 2020; Tang et al. 2020; Kawato et al. 2021; Jia et al. 2022; Li et al. 2022; Yang et al. 2022; Zhang et al. 2022a, 2022b; Hu et al. 2023; Jia et al. 2023; Ling et al. 2023; Smith et al. 2023; Sun et al. 2023; Wang et al. 2023; Chang et al. 2024; Jiang et al. 2024; Li et al. 2024; Liao et al. 2024; Wang et al. 2024b; Chen et al. 2025; Li et al. 2025; Wang et al. 2025; Xu et al. 2025; Zhang et al. 2025b) (Table S11). The raw RNA-seq reads were preprocessed with fastp (Chen et al. 2018) to remove low-quality reads and adapter sequences. After quality control, the cleaned reads were mapped to the corresponding reference genome using STAR (Dobin et al. 2013), and gene expression levels were quantified with RSEM (Li and Dewey 2011). Transcripts per million (TPM) values were calculated to assess gene expression across different tissues.
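The TPM normalization can be illustrated with a short sketch. RSEM reports TPM itself, using expected counts that account for multimapping reads, so the function below only shows the definition of the unit; the gene counts and effective lengths are made-up values, and the function name is hypothetical.

```python
def tpm(counts, effective_lengths):
    """Convert per-gene read counts to transcripts per million (TPM).

    counts: read (or expected) counts per gene.
    effective_lengths: effective transcript lengths in bases, same order.
    """
    if len(counts) != len(effective_lengths):
        raise ValueError("counts and effective_lengths must have equal length")
    # Reads per base: corrects for longer transcripts attracting more reads.
    rates = [count / length for count, length in zip(counts, effective_lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    # Rescale so that the values of one sample sum to one million.
    return [rate / total * 1e6 for rate in rates]


# Made-up counts and lengths for three genes in one tissue.
example = tpm([100, 300, 600], [1000.0, 1500.0, 3000.0])
print([round(value) for value in example])
```

Because the values of each sample sum to one million, TPM expresses the relative abundance of a gene within a library, which is what makes tissue-to-tissue comparison of a single gene such as Imagin meaningful.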
Exploration of Imagin Orthologs in Malacostracan Genomes
Genomic scaffolds containing Imagin or MMUT genes were explored by querying crustacean Imagin and MMUT proteins against the NCBI WGS database (last accessed 2025 November). The query sequences varied depending on the strategy and the target. The selected scaffolds were downloaded, and malacostracan Imagin and MMUT proteins were aligned onto the scaffolds using miniprot v0.18-r281 (Li 2023). The resulting GFF3 files were parsed using gffread v0.12.7 (Pertea and Pertea 2020). Sequence integrity and completeness were analyzed visually using IGV (Robinson et al. 2011). Structural features were predicted using InterProScan v5.76-107.0 (Jones et al. 2014).
The identification of Imagin orthologs in isopods required careful examination because isopod Imagin sequences were substantially divergent from those of decapods. The Imagin ortholog from the deep-sea giant isopod Bathynomus jamesi (Yuan et al. 2022) was identified by a TBLASTN search querying the PjImagin protein against the B. jamesi genome (Table 1; Table S4). Positive hits on scaffold JAJOZX010000594.1 exhibited the highest query coverage (63%) despite low amino acid identity (27.55%) and, as expected, were found nested between exons 14 and 15 of the predicted MMUT gene (Fig. 1b). The corresponding region (JAJOZX010000594.1:1094001-1104000) was extracted, and gene structures were predicted using Augustus v3.3.3 (https://bioinf.uni-greifswald.de/augustus/submission.php; last accessed 2025 November) (Stanke et al. 2006) with the species model of Bombus terrestris, yielding a predicted Imagin-like gene (Supplementary Data S5). A TBLASTN search querying the predicted protein against the B. jamesi genome assembly yielded a single strong hit, strongly suggesting that this locus is single copy (Table S5). The orthology of the B. jamesi Imagin-like gene with decapod Imagin was ultimately supported by the phylogenetic analysis (Fig. 2). The predicted B. jamesi Imagin protein was further queried against isopod genome assemblies to identify Imagin orthologs in Armadillidium vulgare (Chebbi et al. 2019) and Armadillidium nasatum (Becking et al. 2019), which also lay between exons 14 and 15 of the MMUT gene. The gene structure of A. vulgare Imagin was predicted using Augustus v3.3.3 (Supplementary Data S5), and the predicted A. vulgare Imagin protein sequence was mapped to the A. nasatum genome assembly using miniprot to locate the A. nasatum Imagin gene. A. vulgare Imagin was unambiguously a single-copy gene (Table S6), whereas a TBLASTN search querying the predicted A. nasatum Imagin against the A. nasatum genome assembly yielded 2 other hits exhibiting 98% to 99% identity with 81% coverage (Table S7). To examine the possibility that this gene is an active Ginger1-like element, we compared the 3 relevant scaffolds (SEYY01021081.1, SEYY01003597.1, SEYY01017486.1) on the YASS web interface (https://bioinfo.univ-lille.fr/yass/; last accessed 2025 November) (Noé and Kucherov 2005). The duplicated Imagin sequences proved to be part of segmental duplications ranging from 7 to 11 kb, in which the duplicated sequences contained exons 10 to 14 of the MMUT gene. No inverted repeats were observed in the neighborhood. These observations suggest that the duplicated Imagin fragments in the A. nasatum genome assembly are not associated with transposition activity of Imagin, if any such activity exists at all.
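Extracting a region such as JAJOZX010000594.1:1094001-1104000 from a scaffold can be sketched as below. The sketch assumes that the coordinates are 1-based and inclusive, as in the common samtools-style region notation; the source does not state the convention or the tool used, and the helper names are hypothetical.

```python
def parse_region(region):
    """Split 'scaffold:start-end' into (scaffold, start, end).

    Coordinates are assumed to be 1-based and inclusive.
    """
    # rpartition keeps any ':' inside the scaffold name intact.
    scaffold, _, span = region.rpartition(":")
    start_text, _, end_text = span.partition("-")
    start, end = int(start_text), int(end_text)
    if not scaffold or start < 1 or end < start:
        raise ValueError(f"invalid region: {region!r}")
    return scaffold, start, end


def extract(sequence, start, end):
    """Return the 1-based inclusive interval [start, end] of a sequence."""
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return sequence[start - 1:end]


scaffold, start, end = parse_region("JAJOZX010000594.1:1094001-1104000")
print(scaffold, end - start + 1)
```

Under this convention the region spans 10,000 bases, which is the window passed to gene prediction.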
Additional isopod Imagin sequences were recovered from transcriptome shotgun assemblies (Jovović et al. 2024) (Table 1). The isopod Imagin proteins have a truncated C-terminus compared to those of decapods.
Imagin orthologs in amphipods could not be detected by TBLASTN searches querying decapod or isopod Imagin protein sequences. However, the RefSeq annotation of the Hyalella azteca genome has a protein-coding gene (LOC108680262) between exons 14 and 15 of the MMUT gene. Protein isoforms encoded by LOC108680262 have inconsistent names (“titin isoform X1” and “glutenin, low molecular weight subunit 1D1 isoform X2”), likely due to definitive functional domains and repetitive motifs toward the C-terminus. A TBLASTN search querying the protein sequence (XP_018024546.1) against the genome assembly returned no evidence of closely related homologs, further corroborating the view that this gene is single-copy (Table S8). The XP_018024546.1 protein sequence was queried against genome and transcriptome shotgun assemblies of other amphipods, recovering 3 additional homologs (Table 1).
Phylogenetic Analysis
The predicted amino acid sequences of malacostracan Imagin orthologs were aligned using MAFFT v7.525. The multiple sequence alignment was used for the maximum-likelihood phylogenetic analysis using IQ-TREE v2.3.6 (Minh et al. 2020). The resulting tree was visualized using FigTree v1.4.4.
A total of 97 representative DDE integrases and transposases were downloaded from the NCBI database (Table S9; Supplementary Data S6). The full-length protein sequences of Ginger1 elements were unavailable from Bao et al. (2010). Therefore, we substituted Ginger1 elements with entries recovered as BLASTP hits showing >85% amino acid identity to the original sequences available in the Supplementary Data of Bao et al. (2010). The protein sequences were aligned with MAFFT v7.525 (Katoh et al. 2019) and trimmed with trimAl v1.5.0 with the “-automated1” option (Capella-Gutiérrez et al. 2009), and phylogenetic analysis was conducted with IQ-TREE v3.0.1 (Minh et al. 2020). The resulting tree was visualized using FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/).
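The >85% identity criterion for substitute Ginger1 sequences can be sketched as a filter over tabular BLAST output. This sketch assumes the default 12-column tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score). The source does not say how hits were ranked, so keeping the highest bit score per query is an assumption, and the identifiers in the example are invented.

```python
def best_hits_above_identity(tabular_text, min_identity=85.0):
    """Keep, per query, the top-scoring hit whose identity exceeds a cutoff.

    tabular_text: BLAST tabular output with the default 12 columns.
    Returns {query_id: (subject_id, percent_identity, bit_score)}.
    """
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity, bit_score = float(fields[2]), float(fields[11])
        if identity <= min_identity:
            continue  # the criterion is strictly greater than the cutoff
        if query not in best or bit_score > best[query][2]:
            best[query] = (subject, identity, bit_score)
    return best


# Invented hits for two hypothetical queries.
hits = (
    "Ginger1_A\tsubj1\t91.2\t400\t35\t0\t1\t400\t1\t400\t0.0\t750\n"
    "Ginger1_A\tsubj2\t99.0\t120\t1\t0\t1\t120\t1\t120\t1e-60\t230\n"
    "Ginger1_B\tsubj3\t62.5\t380\t140\t3\t1\t380\t5\t384\t1e-90\t310\n"
)
print(best_hits_above_identity(hits))
```

Ranking by bit score rather than by identity alone avoids preferring a short, near-identical fragment over a longer full-length match. A query with no hit above the cutoff, like the second one here, is simply absent from the result.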
Structural Alignment
The Imagin protein structures were predicted using ColabFold v1.5.5 (Mirdita et al. 2022) (Supplementary Data S1). The multiple structural alignment of the Imagin proteins and representative integrases was generated by the “all against all structure comparison” workflow implemented in the DALI server (http://ekhidna2.biocenter.helsinki.fi/dali/; accessed November 2025) (Holm 2020).
dN/dS Analysis
Selected Imagin CDS were retrieved or extracted from transcriptome or whole genome shotgun assemblies in the NCBI database (Table S10, Supplementary Data S4). The CDS were aligned using MAFFT v7.525, and the resulting alignment was used to build a maximum-likelihood phylogenetic tree using IQ-TREE v3.0.1. The alignment and the tree were supplied as input to HyPhy v2.5.78, which was run with the FitMG94.bf batch file (https://github.com/veg/hyphy/issues/1573).
Supplementary Material
evag010_Supplementary_Data
References
- Adachi H, Moritoki N, Shindo T, Arakawa K. Post-embryonic tail development through molting of the freshwater shrimp Neocaridina denticulata. iScience. 2025:28:111885. doi: 10.1016/j.isci.2025.111885. PMID: 40051830; PMCID: PMC11883442.
- Angst P, Dexter E, Stillman JH. Genome assemblies of two species of porcelain crab, Petrolisthes cinctipes and Petrolisthes manimaculis (Anomura: Porcellanidae). G3 Genes Genomes Genetics. 2024:14:jkad281. doi: 10.1093/g3journal/jkad281. PMID: 38079165; PMCID: PMC10849366.
- Austin CM, Croft LJ, Grandjean F, Gan HM. The NGS magic pudding: a nanopore-led long-read genome assembly for the commercial Australian freshwater crayfish, Cherax destructor. Front Genet. 2022:12:695763. doi: 10.3389/fgene.2021.695763. PMID: 35126445; PMCID: PMC8807398.
- Baeza JA, Baker A, Childress M, Pirro S. Nuclear and mitochondrial genome datasets for spiny lobsters genus Panulirus (Decapoda: Achelata: Palinuridae). Data Brief. 2024:55:110588. doi: 10.1016/j.dib.2024.110588. PMID: 38974010; PMCID: PMC11225021.
- Bao W, Kapitonov VV, Jurka J. Ginger DNA transposons in eukaryotes and their evolutionary relationships with long terminal repeat retrotransposons. Mob DNA. 2010:1:3. doi: 10.1186/1759-8753-1-3. PMID: 20226081; PMCID: PMC2836005.
- Becking T, et al. Sex chromosomes control vertical transmission of feminizing Wolbachia symbionts in an isopod. PLOS Biol. 2019:17:e3000438. doi: 10.1371/journal.pbio.3000438. PMID: 31600190; PMCID: PMC6805007.
- Bell TA, Lightner DV. A handbook of normal penaeid shrimp histology. World Aquaculture Society; 1988.
- Bernot JP, et al. Major revisions in pancrustacean phylogeny and evidence of sensitivity to taxon sampling. Mol Biol Evol. 2023:40:msad175. doi: 10.1093/molbev/msad175. PMID: 37552897; PMCID: PMC10414812.
