The environmental adaptation of acidophilic archaea: promotion of horizontal gene transfer by genomic islands
Jingxuan Qiu, Huiling Tao, Hongyu Li, Xinyi Liu, Rui Liu, Muhammed Naveed Nawaz, Xingjie Wang, Liyuan Ma

TL;DR
This study explores how acid-loving archaea adapt to harsh environments through horizontal gene transfer, focusing on genomic islands and their role in survival.
Contribution
The study identifies and analyzes 176 genomic islands in acidophilic archaea, revealing their structural and functional roles in environmental adaptation.
Findings
Genomic islands in acidophilic archaea show random size and distribution, with lower GC content compared to the genome average.
tRNAs with stem-loop structures are commonly found at GI ends, suggesting integration near tRNA sites.
Genomic islands often carry genes related to genetic processes, metabolism, and toxin-antitoxin systems, aiding adaptation to acidic environments.
Abstract
Acid mine drainage (AMD) is an extremely acidic leachate highly contaminated with metal ions, yet it harbors a significantly high abundance of archaea. Genomic islands (GIs), as one of the products of horizontal gene transfer (HGT), play an important role in the environmental adaptation and evolutionary processes of archaea. However, the distribution, structure, and function of GIs within the genomes of archaea remain poorly understood. In this study, through the bioinformatic analysis of archaea in AMD, including Ferroplasma acidiphilum ZJ isolated in the laboratory and 25 acidophilic archaea collected from the NCBI database, 176 GIs were predicted and annotated. Furthermore, we analyzed their structural features and provided insights into the role of HGT in environmental adaptation. The size and distribution of GIs in the genomes were found to be random. In the majority of GIs, the GC…
Funding: National Natural Science Foundation of China (https://doi.org/10.13039/501100001809)
Topics: Metal Extraction and Bioleaching · Genomics and Phylogenetic Studies · Microbial Community Ecology and Physiology
Introduction
Previous phylogenetic studies have revealed that life on Earth consists of three primary domains, i.e., bacteria, archaea, and eukarya [1, 2]. Nevertheless, with the discovery of the Asgard superphylum (including Loki-, Thor-, Odin- and Heimdallarchaeota) in the past few years [3, 4], James Lake’s two-domain theory, which includes only bacteria and archaea, has gained more and more support [5]. Archaea, the third form of life, provide valuable insights into the early stages of evolution. The cell morphology and structure of archaea are similar to those of bacteria, but their genetic information transfer systems, such as genome replication, transcription, and translation, are closer to those of eukaryotes [6]. These organisms thrive in a variety of habitats, especially extreme environments; they show tremendous diversity in genetics and metabolism and drive the cycling of elements on Earth. Archaea not only colonize diverse environments ranging from the human gut to agricultural soils [7, 8], but also retain dominance in extreme habitats. Acid mine drainage (AMD) is an exceptionally acidic and highly contaminated leachate generated from underground operations of closed or abandoned mine sites and the accumulation of tailings or mullocks [9]. It has been demonstrated that at pH < 2, the relative abundance of archaea in AMD systems increases to 25% of all prokaryotes [10]. However, our understanding of archaeal biology remains far more limited than that of bacteria, primarily due to the challenges in isolating and culturing these microorganisms. For example, some researchers engineered and operated a methane-fed, continuous-flow bioreactor system for over 2,000 days to enrich archaea from deep-marine methane-seep sediments [11].
The adaptive mechanisms underlying survival in extreme acidic environments remain a persistent focus in extremophile research, particularly regarding evolutionary pathways. A substantial proportion of bacterial and archaeal genetic diversity originates from sequence acquisitions from other environmental microbes [12].
Horizontal gene transfer (HGT), also referred to as lateral gene transfer, is defined as the movement of genetic material between distinct species or across substantial phylogenetic distances by non-vertical inheritance mechanisms [13]. HGT is neutral and, in principle, any gene may be transferred via HGT. Such transfers can confer advantageous traits related to ecological niche adaptation or pathogenicity but may also introduce numerous non-beneficial genes, imposing a metabolic burden on the recipient microorganisms [14–16]. Mobile genetic elements (MGEs) act as key vectors that increase the frequency and genomic scope of HGT. Through specialized mechanisms such as conjugation, transduction, and transposition, MGEs mediate the transfer of adaptive traits (e.g., antibiotic resistance) while simultaneously propagating themselves across phylogenetically distant taxa [17]. MGEs thus offer several benefits to microorganisms, but their maintenance imposes a considerable fitness cost on hosts [18, 19].
In 1990, researchers identified clusters of virulence genes present in the genomes of certain Escherichia coli strains. These clusters, recognized as a type of MGE, were named genomic islands (GIs) [20]. It has been reported that GIs initially enter a new strain through HGT before being integrated into the host chromosome via site-specific recombination. Following integration, GIs may evolve through processes such as gene rearrangement and the loss or acquisition of mobile genes [21]. GIs have special sequence and structural features that set them apart from the rest of the genome: sporadic distribution, instability, ability to excise spontaneously, sequence composition bias, atypical codon usage, large size, proximity to tRNA genes, and flanking direct repeats (DRs) [22]. Not every genomic region that meets some of these characteristics can be called a GI, but the more criteria a sequence meets, the more likely it is a product of HGT. Moreover, GIs carry genes that can promote the adaptation of their host to its specific ecological niche, such as genes for pathogenesis, symbiosis, novel metabolic pathways, and resistance to antibiotics, prophages, or heavy metal cations [23].
To better understand the environmental adaptability of archaea in AMD, we predicted the GIs of the archaeon Ferroplasma acidiphilum ZJ, isolated from AMD of the Zijin Mine, and of 25 other acidophilic archaeal strains collected from NCBI. Furthermore, we analyzed the structure of the GIs to investigate the correlation between structure and mobility. Additionally, the functions of genes in the GIs were annotated to investigate their positive impacts on the host’s adaptation to the environment.
Materials and methods
Data collection
F. acidiphilum ZJ was isolated by inoculating 10.0 mL aliquots of liquid samples into modified 9 K medium (pH 1.6) supplemented with 22.4 g/L FeSO_4_·7H_2_O and 0.02% (w/v) yeast extract. Cultivation was conducted at 40 °C under aerobic conditions using a rotary shaker at 170 rpm. The base 9 K medium consisted of (in g per liter): (NH_4_)_2_SO_4_ (3.00), K_2_HPO_4_ (0.50), KCl (0.10), Ca(NO_3_)_2_ (0.01), and MgSO_4_·7H_2_O (0.50). The DNA of F. acidiphilum ZJ was sequenced with a combination of the PacBio RS and Illumina HiSeq platforms by Shanghai Majorbio Bio-pharm Technology Co., Ltd. (Shanghai, China).
Previous studies have investigated 25 strains of archaea that have been identified in AMD. They belong to seven taxa: Thermoplasmatales (6), Ferroplasma (6), Acidiplasma (5), Candidatus Parvarchaeota (3), Candidatus Mancarchaeum (1), Cuniculiplasma (2), and Thermogymnomonas (2) (Table S1). For this study, the data for these 25 strains, including genomic sequences, genome size, accession number, isolation source, genome status, and GC content, were downloaded from NCBI (https://www.ncbi.nlm.nih.gov/). The phylogenetic tree of all 26 strains of acidophilic archaea is shown in Figure S1.
GIs identification
A wide array of methods has been developed to predict and visualize GIs, the majority of which depend on recognizing the structural features and nucleotide composition of GIs. First published in 2009, IslandViewer was the inaugural web server to integrate several of the most accurate and complementary GI prediction tools; it now combines four: IslandPick, IslandPath-DIMOB, SIGI-HMM, and Islander [24]. IslandPath-DIMOB uses nucleotide bias and the presence of mobility genes to identify GIs, while SIGI-HMM employs a Hidden Markov Model to detect codon usage bias. IslandPick, in contrast, adopts a comparative genomic approach to pinpoint GIs. Islander, the most recent addition, bases its predictions on the frequent use of tRNA and tmRNA genes as integration sites [26]. A comparison of IslandViewer4 with other GI prediction tools found that it has attracted much interest because of its high accuracy and specificity [25]. Thus, this study used IslandViewer4 to predict the potential GIs. The reference genome for GI prediction was Ferroplasma acidarmanus fer1. The virulence factor homologs in GIs were identified in close relatives of genomes with curated data based on a reciprocal best BLAST hit (RBBH) approach with very stringent cutoff values: an e-value cutoff of 1e-10, > 90% sequence similarity, and > 80% coverage.
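The reciprocal best hit filter described above can be sketched as follows. This is a minimal illustration, not the IslandViewer4 implementation: the `Hit` record, its field names, and the choice to apply the cutoffs before picking each query's best hit by bit score are all assumptions made for the example. Only the three thresholds (e-value 1e-10, > 90% similarity, > 80% coverage) come from the text.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    """One pairwise alignment hit (hypothetical record layout)."""
    query: str
    subject: str
    evalue: float
    similarity: float  # percent
    coverage: float    # percent of the query covered
    bitscore: float

def best_hits(hits, max_evalue=1e-10, min_similarity=90.0, min_coverage=80.0):
    """Map each query to its top-bitscore subject among hits passing the cutoffs."""
    best = {}
    for h in hits:
        passes = (h.evalue <= max_evalue
                  and h.similarity > min_similarity
                  and h.coverage > min_coverage)
        if not passes:
            continue
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return {q: h.subject for q, h in best.items()}

def reciprocal_best_hits(forward, reverse, **cutoffs):
    """Return sorted (query, subject) pairs that are each other's best hit."""
    fwd = best_hits(forward, **cutoffs)
    rev = best_hits(reverse, **cutoffs)
    return sorted((q, s) for q, s in fwd.items() if rev.get(s) == q)

# Made-up hits: GI genes searched against a curated set, and the reverse search.
forward = [
    Hit("gi_gene_A", "vf_1", 1e-50, 95.0, 90.0, 200.0),
    Hit("gi_gene_A", "vf_2", 1e-20, 92.0, 85.0, 120.0),
    Hit("gi_gene_B", "vf_3", 1e-5, 99.0, 99.0, 40.0),   # fails the e-value cutoff
]
reverse = [
    Hit("vf_1", "gi_gene_A", 1e-48, 95.0, 88.0, 198.0),
    Hit("vf_3", "gi_gene_B", 1e-30, 99.0, 99.0, 150.0),
]
print(reciprocal_best_hits(forward, reverse))
```

Filtering before ranking means a weak top hit cannot mask a qualifying second hit; a pipeline that ranks first and filters afterwards would be stricter.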
Structure of GIs
The sequences of GIs were obtained from IslandViewer4. The GC Content Calculator, a web-based tool, was used to calculate the exact percentage of GC content in the predicted GIs. The tRNA genes near the predicted GIs were identified with tRNAscan-SE 2.0 [27]. Archaeal was selected as the sequence source, and the search mode was left at its default.
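The GC percentage of a GI sequence is simple to reproduce locally. The sketch below is an assumption about how such a calculator might behave, not a description of the web tool used in the study: it counts only unambiguous bases, so runs of N do not dilute the value.

```python
def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T); 0.0 if there are none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0

# Toy sequence, not taken from any of the studied genomes.
print(gc_content("GGCCAATT"))
```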
RNAfold was used to predict the secondary structure of the tRNAs. The fold algorithms were minimum free energy (MFE) and partition function. The basic option was set to avoid isolated base pairs, the dangling end option was set to dangling energies on both sides of a helix in any case, and the energy parameters were the RNA parameters of the Turner model (2004). The multiple sequence alignment (MSA) of the flanking tRNAs was performed with MUSCLE [28].
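For readers who run ViennaRNA locally, the settings listed above correspond to standard `RNAfold` command-line options. The text does not say whether the command-line program or the web server was used, so the helper below (a hypothetical name) only assembles an equivalent argument list as an illustration.

```python
from typing import List

def rnafold_command(partition_function: bool = True) -> List[str]:
    """Build an RNAfold argument list matching the settings named in the text."""
    cmd = [
        "RNAfold",
        "--noLP",  # avoid isolated (lonely) base pairs
        "-d2",     # dangling energies on both sides of a helix in any case
    ]
    if partition_function:
        cmd.append("-p")  # compute the partition function in addition to the MFE
    # The Turner 2004 energy parameters are the ViennaRNA 2.x default,
    # so no parameter file is passed here.
    return cmd

print(" ".join(rnafold_command()))
```

RNAfold reads FASTA records from standard input, so the list can be passed to `subprocess.run(cmd, input=fasta_text, text=True, capture_output=True)` where ViennaRNA is installed.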
Function of GIs
Genes within the predicted GIs were annotated using eggNOG 5.0, which classifies genes at three taxonomic levels: the bacterial subset of COGs, archaeal arCOGs, and eukaryotic KOGs. The input sequence type was protein, and other parameters were left at their defaults. The COG system sorts genes into 26 functional categories, several of which describe functions primarily associated with eukaryotic cells. The recently added V (Defense mechanisms) and X (Mobilome) categories provide a more detailed description of the dynamics of bacterial and archaeal genomes. Functional categories are assigned according to the cellular roles of the respective COGs [29].
Results and discussion
Distribution and quantity of GIs in AMD archaea genomes
Five GIs containing a total of 131 genes were predicted in F. acidiphilum ZJ. A total of 171 GIs were identified across the genomes of the other 25 acidophilic archaea. Among these 26 archaea, the GIs of two strains, namely Mancarchaeum acidiphilum MIA14 and Candidatus Parvarchaeota archaeon TL1-5-bins.178, were identified by IslandPath-DIMOB, and those of the rest were identified using SIGI-HMM (Table S2). The length of GIs ranged from 4,215 bp to 91,819 bp, with 71.3% falling between 8 kb and 40 kb. The positions of GIs were random in most strains’ genomes (Fig. 1a). The significant differences in the distribution and number of GIs in the archaeal genomes may be due to the large differences in the size and sequencing quality of the 26 archaeal genomes. Note that IslandViewer4 by default joins multiple contigs with N during GI prediction, so the GIs of several strains appear to extend beyond the end of their genomes. Among the 25 genomes, the number of GIs per genome ranged from 1 to 18. The total number of genes contained in the GIs also varied among genomes. The genome of TL1-5-bins.178, with only a single GI, contained the fewest genes within its GIs, only 8. In contrast, a total of 322 genes were annotated in the GIs of Thermoplasmatales archaeon A-plasm (Fig. 1b).
Fig. 1. a Distribution map of GIs in the genome. b The number of GIs in each strain and the number of genes contained in the GIs of each strain
The analysis of the number of GIs against the total size of all GIs in each genome revealed a Pearson correlation coefficient of 0.87 (p-value < 0.001), indicating a strong positive linear correlation (Fig. 2a). Moreover, linear regression analysis also confirmed a positive linear relationship between genome size and both the number of GIs and the total size of GIs (Fig. 2b and c). However, compared to archaea with the same number of GIs, ZJ had the highest total size of GIs (Fig. 2a), and F. acidiphilum ZJ had a lower number of GIs than archaea with similar genome sizes (Fig. 2b). This is consistent with the “genome expansion hypothesis” proposed by McDaniel et al. that HGT increases both total genome length and GI accumulation [30]. Among the three major genera, Acidiplasma exhibited a lower ratio of total GI sequence size to host genome size (5.05%), while Ferroplasma and Thermoplasmatales had about 9.04% and 8.74%, respectively (Fig. 2d). Archaea of the genus Ferroplasma exhibited a higher efficiency of GI integration, suggesting a specific role of genomic plasticity in heavy metal stress adaptation. The lower ratio in Acidiplasma may be attributed to its inferior genome assemblies (Table S1). When genome assemblies are fragmented, larger genomic islands may be split across multiple short contigs, so that the prediction algorithm fails to recognize their complete boundaries, leading to a low predicted GI number. Previous research has also found that the number of GIs varies among strains of the same species. For example, Aeromonas hydrophila ATCC 7966 contained only 13 GIs, while there were 33 GIs in Aeromonas hydrophila NJ-35 [31]. These variations may result from the spontaneous excision of GIs, and some GIs were even lost after being cultured in the laboratory [32].
Fig. 2. Patterns of GIs in AMD archaea genomes. a Linear regression analysis between the number of GIs per archaeal genome and GI size (in kb). b Relationship between genome size and number of GIs. c Relationship between archaeal genome size and GI size. d Box-and-whisker plot of the GI ratio for the AMD archaea
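The statistics reported for Fig. 2 (a Pearson coefficient, a least-squares line, and the GI-to-genome size ratio) can be sketched in plain Python. The per-genome numbers in the demo are invented placeholders, not values from Table S2, and the function names are illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def linear_fit(xs, ys):
    """Ordinary least squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def gi_ratio(total_gi_bp, genome_bp):
    """Total GI length as a percentage of genome length."""
    return 100.0 * total_gi_bp / genome_bp

# Placeholder data: number of GIs and total GI size (kb) for five genomes.
gi_counts = [1, 3, 5, 8, 12]
gi_total_kb = [10, 40, 60, 120, 150]
print(pearson_r(gi_counts, gi_total_kb))
print(linear_fit(gi_counts, gi_total_kb))
```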
The secondary structure and multiple sequence alignment of tRNAs flanking GIs
Transfer RNAs (tRNAs) are crucial molecular adaptors that mediate the translation of the genetic code. These molecules undergo a variety of post-transcriptional modifications, which enhance their chemical reactivity while simultaneously influencing their structure, stability, and functionality. A tRNA is usually composed of 70–90 nucleotides, and its 3’ end serves as the site for amino acid attachment. As described above, GIs are scattered across the archaeal chromosomes. Several studies have indicated that they are often integrated at the 3’ end of tRNA genes and form direct repeat sequences (DRs) [23, 33, 34]. The sequence of tRNA is highly conserved, resulting in a low mutation rate during HGT. The secondary structure of tRNA forms a cloverleaf, and this stable structure also contributes to the low mutability of tRNA to a certain extent [35]. There are two main reasons why tRNA genes serve as integrase target sites: first, tRNA genes are more reliable targets than other genes; second, their small size facilitates the host genome’s recovery of the target gene upon integration [36].
tRNA flanking sequences, a hallmark of horizontal gene transfer, were observed in archaeal GIs. Notably, 23 of the 171 predicted GIs (13.5%) exhibited this characteristic (Table S3). The proportion of GIs flanked by tRNA relative to the total number of tRNAs in these archaeal genomes was 7.3%. The size of the tRNAs ranged from 70 to 142 bp. The 14 genomes with flanking tRNAs correspond to 9 species of 6 genera. The predicted tRNAs exhibited a high degree of primary and secondary structure diversity (Fig. 3a-f, Table S3 and Fig. S2). All predicted tRNAs had the typical cloverleaf secondary structure. The anticodons of these tRNAs correspond to 10 different amino acids: alanine (Ala), arginine (Arg), cysteine (Cys), glycine (Gly), leucine (Leu), isoleucine (Ile), methionine (Met), threonine (Thr), tryptophan (Trp) and valine (Val). Among these, tRNA^Leu^ and tRNA^Val^ accounted for the largest proportion. A marginal histogram was used to explore the relationship between GI size, GC content, and the presence of flanking tRNAs. The result indicated that GIs with 5–15% GC content and a size of 30–45 kb had a higher probability of carrying a flanking tRNA (Fig. 3g). GIs with low GC content may be retained by the host by maintaining gene expression compatibility through the conservation of flanking tRNAs (e.g., promoters or terminators) when integrating into high-GC hosts. Previous studies have pointed out that horizontally transferred DNA often has a different GC content from the host, and flanking conserved sequences (e.g., tRNAs) may buffer transcriptional conflicts arising from such differences [37]. GIs of 30–45 kb are moderate in size: they can accommodate a larger cluster of functional genes while avoiding elimination by the host due to the metabolic burden of redundant genes [21].
Fig. 3. a-f Secondary structure of flanking tRNAs. g Marginal histogram of GI size with flanking tRNA and its GC content
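One way to flag a GI as tRNA-flanked is to test whether any predicted tRNA lies within a fixed distance of either GI boundary. The sketch below assumes 1-based inclusive coordinates on a single replicon and a 500 bp window. The window size and both function names are hypothetical, since the text does not state the criterion that was applied.

```python
def boundary_distance(start, end, point):
    """Distance in bp from the interval [start, end] to a coordinate (0 if inside)."""
    if start <= point <= end:
        return 0
    return min(abs(point - start), abs(point - end))

def flanking_trnas(gi, trnas, window=500):
    """Names of tRNAs lying within `window` bp of either boundary of the GI."""
    gi_start, gi_end = gi
    found = []
    for name, t_start, t_end in trnas:
        nearest = min(boundary_distance(t_start, t_end, gi_start),
                      boundary_distance(t_start, t_end, gi_end))
        if nearest <= window:
            found.append(name)
    return found

# Made-up coordinates for one GI and three tRNA genes.
gi = (10_000, 40_000)
trnas = [("tRNA-Leu", 40_100, 40_185),
         ("tRNA-Val", 20_000, 20_075),
         ("tRNA-Ala", 9_000, 9_072)]
print(flanking_trnas(gi, trnas))
```

A tRNA deep inside the island, such as the made-up tRNA-Val here, is not reported because it is far from both boundaries.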
Multiple sequence alignment (MSA) provides critical insights into sequence-structure-function relationships within nucleotide or protein sequence families [38]. The findings revealed that the homology of downstream sequences and the similarity of flanking tRNA sequences were greater than those observed upstream (Fig. 4). Additionally, the same tRNA, with identical nucleotide sequence and secondary structure, may be found in different GIs. For example, the same tRNA^Leu^ with anticodon TAA was found at the 2nd GI of Acidiplasma sp. MBA-1 accurate_c1 and the 3rd GI of Acidiplasma cupricumulans BH2, and the same tRNA^Val^ with anticodon GAC was found at the 2nd GI of Cuniculiplasma divulgatum S5(T) and the 13th GI of Thermoplasmatales archaeon B-DKE.
Fig. 4. Multiple sequence alignment among the flanking tRNA sequences
GC characterization of GIs and its comparison with the average GC of the host genome
GC content refers to the proportion or percentage of guanine (G) and cytosine (C) bases in a genome. Given its fundamental importance for the maintenance and transfer of genetic information, the diversity of GC content and its evolutionary determinants have been extensively studied over several decades [39]. GC content contributes to the stability of nucleic acids through hydrogen bonding, with more hydrogen bonds generally leading to greater stability [40]. The average GC content was similar among strains of the same genus (Fig. 5). The GC content of the F. acidiphilum ZJ genome was 36.9%, and that of its five GIs was 37.1%, 32.3%, 32.9%, 37.0% and 38.9%, respectively. Among the five genera investigated, the average GC content of Acidiplasma was the lowest (33.9–34.4%), while that of Thermogymnomonas was the highest (55.9–57.0%). GC content also varied among strains: from 36.5 to 36.9% in Ferroplasma, from 37.2 to 37.3% in Cuniculiplasma, and from 44.3 to 56.4% in Thermoplasmatales, except for T. archaeon G-plasma, whose average GC content was lower than that of the other strains.
Fig. 5. Comparison of the average GC content of each genome and the GC content of each GI
Compared to the average GC content of the host genome, 75.7% of GIs exhibited a lower GC content (Fig. 5). This was consistent with the results of previous researchers [41]. For example, the GC content of PBGI-1 and PBGI-2 was lower than that of Pseudomonas bharatica CSV86 [41]. Similarly, Pongchaikul observed that the GC content of AcGI 1 was 61.5%, which was lower than the 67.5% of the host genome [42]. As products of HGT, GIs with low GC content are less stable than other regions of the host genome, which may make them more susceptible to integrase-mediated excision. They are then either retained in the host genome or excised from the original host and transferred via HGT into the genome of a new host.
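The comparison behind the 75.7% figure reduces to counting GIs whose GC content falls below the host average. As a small worked example, the sketch below applies that count to the F. acidiphilum ZJ values reported above (genome 36.9%; GIs 37.1%, 32.3%, 32.9%, 37.0%, 38.9%). The function name is illustrative.

```python
def lower_gc_share(genome_gc, gi_gcs):
    """Return (count, percent) of GIs whose GC content is below the genome average."""
    lower = [gc for gc in gi_gcs if gc < genome_gc]
    return len(lower), 100.0 * len(lower) / len(gi_gcs)

# GC values (%) for F. acidiphilum ZJ as reported in the text.
zj_genome_gc = 36.9
zj_gi_gcs = [37.1, 32.3, 32.9, 37.0, 38.9]
print(lower_gc_share(zj_genome_gc, zj_gi_gcs))  # (2, 40.0)
```

For ZJ alone, two of the five GIs fall below the genome average. The 75.7% reported in the text is computed over the GIs of all strains.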
Classification and annotation of genes in GI via COG
Of all genes on GIs, 37% are uncharacterized and part of the mobilome. These unwanted genes cause redundancy in the host genome and increase the metabolic and reproductive burden. A total of 63% of the island genes (673) were annotated using the COG classification. Among these, the largest share (11.9%) was associated with replication, recombination, and repair (L), followed by amino acid transport and metabolism (E, 10.1%), carbohydrate transport and metabolism (G, 9.8%), and energy production and conversion (C, 3.9%) (Fig. 6). Among the GIs of F. acidiphilum ZJ, the most frequently annotated category was L, followed by E and M (cell wall/membrane/envelope biogenesis). Previous studies have indicated that categories C, E, and J (translation, ribosomal structure, and biogenesis), along with category L, represent the predominant functional categories in the GIs of acidophilic bacteria [43]. These four categories belong to two groups: metabolism, and information storage and processing. According to the overall distribution analysis, the majority of the genes were in these two groups, while genes belonging to the group of cellular processes and signaling were quite few. These results indicated that the GIs were more inclined to carry genes related to genetic information processing and metabolism than genes related to cellular life processes.
Fig. 6. COG classification of genes in GIs. Categories J, K and L belong to information storage and processing; categories D, M, N, O, T, U, Z, and C belong to cellular processes and signaling; categories E, F, G, H, I, P and Q belong to metabolism
In addition to the above categories, there were two poorly characterized categories: R (general function prediction only) and S (function unknown). They reflect the current level of interpretation of protein functions at the proteome level [44]. The results revealed the absence of the R category in the GIs, but S category proteins accounted for 48.6% of the total. This indicated that nearly half of the genes in the GIs remained uncharacterized, and that the structure and function of many proteins were still unclear and needed further exploration.
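Rolling per-gene COG letters up into the broad groups discussed above takes only a few lines. The sketch below is an illustration, not the pipeline used in the study: it covers only the categories named in this section, places C under metabolism as in the standard COG grouping, and counts multi-letter assignments once per letter, which is a design assumption.

```python
from collections import Counter

# Broad COG groups for the categories named in this section.
COG_GROUPS = {
    "information storage and processing": "JKL",
    "cellular processes and signaling": "DMNOTUZ",
    "metabolism": "CEFGHIPQ",
    "poorly characterized": "RS",
}
LETTER_TO_GROUP = {letter: group
                   for group, letters in COG_GROUPS.items()
                   for letter in letters}

def tally_groups(assignments):
    """Count genes per broad COG group from per-gene category strings."""
    counts = Counter()
    for categories in assignments:
        for letter in categories:  # e.g. "EG" contributes to E and to G
            counts[LETTER_TO_GROUP.get(letter, "unmapped")] += 1
    return counts

# Invented annotations for six genes.
print(tally_groups(["L", "E", "G", "C", "S", "EG"]))
```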
GIs related to enhanced stress resistance
Since all the strains in this study are derived from AMD, which is characterized by a high concentration of heavy metals, the GIs with genes related to heavy metal and toxin resistance were studied further. The physical maps of genes in these GIs are presented in Fig. 7 and Fig. S3. These GIs contained genes encoding proteins involved in heavy metal metabolism, including copper/silver-transporting P-type ATPase, YHS domain copper/silver-binding protein, sulfocyanin, chromate reductase class I flavoprotein, metallochaperone, rusticyanin, mercuric ion binding protein, mercuric reductase, 4Fe-4S ferredoxin and heavy-metal-binding membrane protein. CopA, which specifically recognizes and confers resistance to Ag(I) and Cu(I) ions, was one of the few known members of the heavy-metal efflux resistance-nodulation-cell division (HME-RND) family [44]. The system was composed of CopZ (a cytoplasmic protein that can bind Cu^+^ and deliver it to CopA), CopA, and CusF (which can receive Cu^+^ delivered by CopA and transfer it to the CusCFBA efflux system). The CusCFBA efflux system effectively expels Cu^+^ from the cells, allowing adaptation to the high-Cu^+^ environment [45]. The introduction of metal-binding proteins, which coordinate and bind heavy metal ions through functional groups and conformational changes, can enhance the metal-binding capacity, metal tolerance, and accumulation ability of the strain [46]. This indicated that CopA was important for heavy metal resistance in Mancarchaeum acidiphilum MIA14, which inhabits acidic environments with high concentrations of dissolved metal ions (Fig. 7c) [47]. Furthermore, it was reported that sulfocyanin, found in the 1st GI of Cuniculiplasma divulgatum S5(T), was one of the blue copper-containing proteins that probably play a crucial role in iron oxidation in some archaeal and bacterial lineages [48] (Fig. S3).
Fig. 7. Genetic physical map of GIs related to enhanced stress resistance
MazF toxin, MazE antitoxin, RelE toxin, RelB antitoxin, VapC toxin, VapB antitoxin, and the toxin component PemK were proteins involved in toxin resistance. The MazFE, RelEB, and VapCB toxin-antitoxin (TA) systems were composed of the aforementioned proteins [49] (Fig. 7b, c and Fig. S3). TA systems were initially identified due to their role in plasmid stabilization, as bacterial cells that lose the plasmid following cell division experience either growth arrest or death. Subsequently, TA loci were shown to play a variety of roles, including bacterial cell adaptation to stress, phage resistance, and protection against superinfection when encoded within prophage genomes. TA systems are commonly composed of a stable toxin, which inhibits cellular growth, and a labile antitoxin, which counteracts toxicity [49–53]. The MazE antitoxin bound directly to the MazF toxin and formed a protein-protein complex, resulting in its neutralization. However, MazE was a short-lived protein and would be degraded by proteases of the Clp family or by Lon under stress conditions. After the degradation of MazE, MazF was released from the complex and acted as a sequence-specific endoribonuclease that cleaved RNAs [54–56]. For example, MazF has been reported to cleave the 3’ end of the 16S ribosomal RNA, thus disrupting its binding to mRNA at the ribosome binding site and inhibiting translation initiation [49]. The RelBE system was recognized as a key component of the bacterial stress response, primarily regulating the global level of translation and functioning in the quality control of gene expression. The RelE toxin acted as an endoribonuclease, which cleaved ribosome-bound mRNAs between the second and the third base of the A-site codons, resulting in translational inhibition and bacterial cell growth stasis. The direct binding of the RelB antitoxin to RelE neutralized its activity by inducing a conformational change that disrupts toxin structure, thereby alleviating its toxic effects [53, 57, 58]. VapC was a tRNA endonuclease, and VapB was known to bind to VapC to inhibit its endonuclease activity [59–61]. Under standard conditions, the expression levels of toxin and antitoxin were equivalent, so the antitoxin counteracted the harmful effects of the toxin and the strain could survive normally.
Additionally, a variety of mobile genes, such as integrase, transposase, and site-specific recombinase genes, were identified within multiple GIs (Fig. S3). These mobile genes were closely related to the mobility of MGEs, and their existence suggests that the GIs may have resulted from spontaneous HGT or be the remnants of other MGEs. The above three enzymes located within GIs played a crucial role in genomic evolution [62–67]. According to a previous study, site-directed recombination involved tyrosine/serine recombinase or DDE transposase to recognize the flanking repeats, leading to the exchange of DNA segments between integration/excision modules of different GIs [68]. There was also a group of Cas proteins in the 3rd GI of F. acidiphilum ZJ and the 4th GI of F. acidiphilum Y, which, along with CRISPR, constituted a specific immune defense system known as the CRISPR/Cas system (Fig. 7a and b). Cas2 and Cas4 genes were predominantly identified; they were arranged adjacent to CRISPR loci and encode Cas proteins possessing nuclease, helicase, integrase, and other enzymatic activities. These protein activities are critical for the recognition and degradation of foreign genetic material [69].
The presence of the above genes conferred stronger metal metabolism on the host strains and might have helped them mitigate the impact of the heavy metal environment. They also enhanced the resistance of the host to the invasion of external agents, such as toxins and viruses. Because of the randomness of HGT, the genes located in the GIs exhibited considerable diversity. Whether the GIs carried genes related to metal cycling or to toxin resistance, the presence of these genes either expanded the host’s metabolic pathways or improved its metabolic efficiency. Ultimately, this process contributed to the enrichment of the genetic diversity of archaea, which highlights the ecological significance of HGT.
Conclusion
A total of 171 GIs were identified in the genomes of 25 acidophilic archaeal strains. The sizes of GIs ranged from 4,215 bp to 91,819 bp. The size and position of the GIs in the genomes showed no obvious regularity. The GC content of most GIs was lower than the average GC content of the strain’s whole genome. The tRNAs at the ends of GIs showed an obvious bias toward tRNA^Leu^ and tRNA^Val^.
The COG annotation of genes in the GIs showed that, among all the annotated genes, the replication, recombination, and repair category was the largest. On the whole, the GIs were more inclined to carry genes related to genetic information and metabolism and less inclined to carry functional genes closely related to cellular life processes. Multiple genes related to iron oxidation, mercury ion reduction, and copper metabolism pathways were found in multiple GIs. Additionally, the existence of multiple TA systems and mobile genes improved the adaptability of the strains to extremely acidic mine environments.
Supplementary Information
Supplementary Material 1.
Supplementary Material 2.
Supplementary Material 3.
Supplementary Material 4.
Supplementary Material 5.
