Twenty-nine newly sequenced genomes and a comprehensive genome dataset for the insect endosymbiont Buchnera
Congcong Lu, Tianmin Zou, Qian Liu, Xiaolei Huang

TL;DR
This paper presents 29 new genomes of Buchnera, an insect endosymbiont, offering insights into how their genomes shrink over time.
Contribution
The study provides a comprehensive and diverse Buchnera genome dataset from 14 aphid subfamilies.
Findings
Buchnera genomes show significant genomic differences across aphid lineages.
The dataset includes a balanced range of genome sizes from 400 kb to 600 kb.
The data reveal insights into the microevolutionary processes of genome reduction in insect endosymbionts.
Abstract
Most phloem-feeding insects face nutritional deficiency and rely on their intracellular symbionts to provide nutrients, and most endosymbiont genomes have undergone reduction. However, the study of genome reduction in endosymbionts has been constrained by the limited availability of genome data from different insect lineages. The obligate relationship between aphids and Buchnera aphidicola (hereafter Buchnera) makes them a classic model for studying insect-endosymbiont interaction. Here, we report 29 newly sequenced Buchnera genomes from 11 aphid subfamilies, and a comprehensive dataset based on 90 Buchnera genomes from 14 aphid subfamilies. The dataset shows significant genomic differences in Buchnera among aphid lineages. The dataset exhibits a more balanced distribution of Buchnera genome sizes (from 14 aphid subfamilies), ranging from 400 kb to 600 kb, which…
Funding: National Natural Science Foundation of China (https://doi.org/10.13039/501100001809)
Background & Summary
Insects are known to associate with multiple symbionts that confer unique and beneficial functions^1^, an association shaped over 480 million years of evolutionary history^2,3^. Symbionts help host insects adapt to complex and dynamic ecological environments by influencing host mating, reproduction, metabolism, and immunity^2^. They participate in many life activities of their hosts, for example by helping insects resist pathogenic microorganisms and parasites, evade predators, develop resistance to insecticides, and obtain essential nutrients^4^. In turn, the composition and metabolic activities of symbionts are influenced by selection and regulation by the host insects. Symbionts are considered a unique “multifunctional organ” of insects and an indispensable component of them. The study of insect symbionts has significant implications for biocontrol, interruption of vector-borne diseases, and prevention and control of insect pests^5^.
Genome reduction has been observed consistently across obligate symbionts of insects, particularly within the suborder Sternorrhyncha. These symbionts are strictly dependent on the host, are transmitted exclusively maternally, and have co-diversified with their hosts over long periods, undergoing early genome loss yet often remaining stable afterwards^6,7^. Examples include Carsonella ruddii in psyllids (158–166 kb)^8^, Portiera aleyrodidarum in whiteflies (281–358 kb)^9^, Tremblaya princeps (139–171 kb) and Moranella endobia (538 kb) in mealybugs^10–12^, as well as Buchnera aphidicola in aphids (with genome sizes ranging from 419 to 656 kb)^13^. The aphid-Buchnera system is a classic model for investigating insect-endosymbiont interaction. Aphids feed on phloem sap and face nutritional limitations because sap is rich in simple sugars but contains an unbalanced mixture of amino acids^14,15^. Therefore, nearly all aphids rely on their specialized intracellular symbiotic bacteria, Buchnera aphidicola (Gammaproteobacteria), to provide essential amino acids, vitamins, and other important nutrients^16^. Buchnera is found exclusively within specialized bacteriocytes, which are arranged symmetrically in the abdominal hemocoel of aphids^17,18^. Buchnera has been identified only in Aphididae species and cannot survive independently outside the host aphids^19^. Expanding the dataset of Buchnera genomes is therefore crucial for advancing our understanding of the evolutionary dynamics of these endosymbionts. Additional genomes will enable comprehensive analyses of genetic variation, adaptation, and co-evolutionary patterns, shedding light on fundamental aspects of endosymbionts and host-endosymbiont interactions.
Clonality and maternal vertical transmission lead to a faster fixation rate of slightly deleterious mutations in Buchnera than in free-living relatives. This is reflected in the accelerated evolution of protein-coding genes^20^ and in gene inactivation and loss^21,22^, which together lead to a significant reduction in genome size^23,24^. Although Buchnera genomes have undergone reduction, most of them still retain the genes needed for the biosynthesis of essential amino acids required by aphids^22,25,26^. The genome size of Buchnera ranges only from 422 kb to 458 kb in the Lachninae, whereas in the Macrosiphini of Aphidinae it ranges from 614 kb to 671 kb, reflecting the adaptive evolution between aphids and Buchnera at the genomic level^13^. Chong et al.^27^ analyzed 39 Buchnera genomes from 6 aphid subfamilies and found that the most recent common ancestor of Buchnera had at least 616 protein-coding genes, which then experienced non-random gene loss in different lineages.
With the continuous advancement of high-throughput sequencing technologies, the number of obligate symbiont genomes has increased. Currently, 77 Buchnera strains from 61 aphid species belonging to 10 subfamilies have been deposited in the GenBank database. However, the majority of these data are concentrated in Aphidinae, with 49 Buchnera strains from 33 species, accounting for approximately 63.6%. Moreover, the sizes of the published Buchnera genomes are mainly around 400 kb or 600 kb, which does not represent the process of genome reduction well. To thoroughly explore the diversity of Buchnera genomes across different aphid species, we sequenced 29 new genomes representing 19 aphid species of 11 subfamilies. This addition includes Buchnera genomes from four additional aphid subfamilies (Greenideinae, Mindarinae, Neophyllaphidinae and Taiwanaphidinae). Delving deeper into the reduction patterns of Buchnera genomes requires a more comprehensive and reliable dataset. Therefore, based on a quality control process, we constructed a robust dataset comprising 90 Buchnera genomes by combining the 29 newly sequenced genomes from 11 subfamilies with 61 selected genomes from GenBank. This dataset shows a more even distribution of Buchnera genome sizes, including genomes of about 400 kb, 500 kb, and 600 kb, and this diversity illustrates the ongoing process of genome reduction in Buchnera. In contrast, previous datasets predominantly featured Buchnera genome sizes concentrated around 400 kb and 600 kb. Additionally, our dataset includes 47 aphid species from a wider range of subfamilies, with primary representation from Aphidinae (600 kb) and Lachninae (400 kb). The extensive coverage of genome data can be used to validate the genome evolution of Buchnera within a phylogenetic framework, and contributes to understanding the microevolutionary processes that shape genome reduction in insect obligate endosymbionts.
Methods
Sampling and DNA sequencing
Twenty-nine aphid samples from 11 subfamilies (Aphidinae, Calaphidinae, Chaitophorinae, Drepanosiphinae, Eriosomatinae, Greenideinae, Hormaphidinae, Lachninae, Mindarinae, Neophyllaphidinae, Phyllaphidinae, Taiwanaphidinae and Thelaxinae) were collected from May 2015 to June 2019 in various regions of China, including Fujian, Yunnan, Guangdong, Beijing, Zhejiang, Jiangxi, and Guangxi. All collected aphid specimens were identified to species level, mainly based on morphological characters, by experienced aphid taxonomists. Some samples of the same aphid species originated from different geographical locations, spanning a considerable geographic range from south to north (Table S1). Each sample comprised multiple individuals collected from the same aphid colony on a single host plant. Detailed sampling information is listed in Supplemental Table S1. The specimens were kept in 95% ethanol and stored at −20 °C after collection. Because body size differs among aphid species, five to twenty apterous adult females of each sample were used for DNA extraction. After the specimens were washed three times in ultrapure water, genomic DNA was extracted from the pooled individuals of each sample using the DNeasy Blood and Tissue kit (QIAGEN), following the manufacturer’s manual.
All DNA samples were sent to Biomarker Technologies Co., Ltd. (Beijing, China) for metagenomic next-generation sequencing. Metagenomic library construction involved fragmentation and purification of genomic DNA, followed by end repair and A-tailing. Subsequently, adapters were ligated to the fragments, and the resulting products were purified. PCR amplification was performed, and the resulting products were purified again. Finally, library quality was assessed prior to downstream analysis. Following library construction, the libraries underwent 2 × 150 bp paired-end sequencing on the Illumina NovaSeq 6000 platform (Illumina Corp., San Diego, CA, USA) with the NovaSeq 6000 S4 Reagent Kit (Illumina Corp., San Diego, CA, USA). To ensure the quality of the bioinformatic analyses, raw reads were filtered with Trimmomatic^28^ (parameters: LEADING:3, TRAILING:3, SLIDINGWINDOW:50:20, MINLEN:100) to obtain high-quality clean reads. Finally, a minimum of 10 GB of sequence data (clean reads) was generated for each sample.
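As a rough sketch of how the read-filtering step above could be scripted, the snippet below assembles a paired-end Trimmomatic command line with the four trimming steps quoted in the text. The executable name `trimmomatic`, the function name, and all file names are assumptions made for illustration; the source does not state how the tool was invoked.

```python
import shlex


def trimmomatic_pe_command(r1, r2, prefix, exe="trimmomatic"):
    """Build the argument list for a paired-end Trimmomatic run.

    `exe`, `prefix`, and the output file names are hypothetical; only the
    four trimming steps come from the text.
    """
    outputs = [
        f"{prefix}_R1.paired.fq.gz", f"{prefix}_R1.unpaired.fq.gz",
        f"{prefix}_R2.paired.fq.gz", f"{prefix}_R2.unpaired.fq.gz",
    ]
    steps = ["LEADING:3", "TRAILING:3", "SLIDINGWINDOW:50:20", "MINLEN:100"]
    # PE mode expects: two inputs, four outputs, then the trimming steps.
    return [exe, "PE", r1, r2, *outputs, *steps]


cmd = trimmomatic_pe_command("sample_R1.fq.gz", "sample_R2.fq.gz", "sample_clean")
print(shlex.join(cmd))  # a string that could be pasted into a shell
```

Returning a list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting issues.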
Genome assembly and annotation
Clean reads were used to assemble the Buchnera genomes. Assembly began with MEGAHIT v1.1.1^29^, which generated numerous long contigs in a final assembly FASTA file. Subsequently, 16S rRNA sequences of Buchnera from the same species as the 29 aphid samples were downloaded from NCBI. These 16S rRNA sequences served as query sequences to extract high-coverage Buchnera chromosome contigs from the final assembly using BLASTN v2.5.0^30^. The extracted sequences were then de novo assembled using Geneious v7.1.9^31^. In cases where the genome was not circular, the extracted sequence served as a seed sequence for subsequent assembly in NOVOPlasty v3.5^32^ (parameters: Insert size = 250, Read length = 150, Genome range = 40000–70000, K-mer = 33, 30, 27). Most of the Buchnera genomes were circularized through this process. Where circularization was not achieved, sequences from NOVOPlasty and MEGAHIT were combined and assembled together in Geneious. In the end, all Buchnera genomes were successfully circularized. The circular draft genome sequences were corrected with Pilon^33^ for base correction and misassembly fixing, based on the paired-end data.
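The contig-extraction step described above can be illustrated with a minimal sketch that reads BLASTN tabular output (`-outfmt 6`) and lists the contigs hit by the 16S rRNA queries. The identity and alignment-length cutoffs, the function name, and the contig names in the example are assumptions for illustration; the text does not state which thresholds were applied.

```python
def contigs_with_16s_hits(blast_lines, min_identity=95.0, min_length=1000):
    """Return contig IDs with a strong 16S hit, best bitscore first.

    Expects the 12 default columns of BLASTN -outfmt 6:
    qseqid sseqid pident length mismatch gapopen qstart qend
    sstart send evalue bitscore.
    The two cutoffs are hypothetical defaults, not values from the paper.
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        contig = fields[1]
        identity = float(fields[2])
        length = int(fields[3])
        bitscore = float(fields[11])
        if identity >= min_identity and length >= min_length:
            best[contig] = max(best.get(contig, 0.0), bitscore)
    return sorted(best, key=best.get, reverse=True)


# Hypothetical hits against a MEGAHIT-style assembly.
example = [
    "q16S\tk141_10\t99.2\t1450\t11\t0\t1\t1450\t5000\t6449\t0.0\t2600",
    "q16S\tk141_22\t82.0\t300\t54\t2\t1\t300\t10\t309\t1e-50\t200",
]
print(contigs_with_16s_hits(example))
```

The selected contig IDs would then be pulled from the assembly FASTA file and passed on to the later assembly steps.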
We additionally assembled the mitochondrial genomes (mitogenomes) of the host aphids using the aforementioned methods. For this purpose, we used the cox1 sequences and complete mitogenome sequences of closely related species as seed sequences. The mitogenomes of Kurisakia onigurumii Mt23 and Mt24, Neophyllaphis podocarpi Mt25, and Neophyllaphis varicolor Mt26 were not circular and were each represented by a single long contig. The mitogenomes of the remaining 25 host aphids were successfully assembled into circular genomes.
Another 61 Buchnera genomes from GenBank were selected and combined with our newly sequenced genomes into a comprehensive dataset of 90 Buchnera genomes. Among the 77 genomes accessible on NCBI (https://www.ncbi.nlm.nih.gov/datasets/genome/?taxon=9, 2022), we specifically chose 61 Buchnera genomes originating from diverse aphid hosts. In particular, only one Buchnera strain per aphid species was retained, typically favoring strains that had undergone detailed genome analysis and possessed relatively large genomes. Additionally, Buchnera genome sequences with high levels of degenerate bases (more than 5%) were not considered, even when such a genome was the only one available for its host species. For example, Hormaphis cornu isolate DLS_fromHcor80 (accession number: CP051840.1) has a total genome length of 643,231 bp, but it contains 60,999 degenerate bases “N”, accounting for 9.48% of the sequence. Such instances could mislead future analyses of genome structure and features. All Buchnera strains are named after the species of their corresponding host aphids (Table 1). All complete circular genomes were annotated with both the RAST Server v2.0^34^ and Prokka v1.13.3^35^. All genes that differed in length or location between the two annotations were manually curated. Gene names were standardized based on the Prokka annotation.
Table 1.
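Sources and genome characteristics (base composition, gene size, and gene number) of 90 Buchnera strains used in dataset construction. Size columns are in bp; count columns give gene numbers.

| No. | Host aphid | Source | A (%) | T (%) | C (%) | G (%) | A + T (%) | G + C (%) | AT skew | GC skew | Genome size | CDS size | tRNA size | rRNA size | tmRNA size | Noncoding size | CDS count | tRNA count | rRNA count | tmRNA count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Muscaphis stroyani | CP034861.1 | 37.3 | 36.9 | 12.8 | 12.9 | 74.2 | 25.7 | 0.006 | 0.004 | 619238 | 5E+05 | 2461 | 4557 | 368 | 77039 | 565 | 32 | 3 | 1 |
| 2 | Melanaphis sacchari | CP029161.1 | 37.2 | 37.5 | 12.7 | 12.6 | 74.7 | 25.3 | −0.005 | −0.004 | 626137 | 6E+05 | 2554 | 4562 | 375 | 65065 | 581 | 33 | 3 | 1 |
| 3 | Schizaphis graminum | CP029205.1 | 37.2 | 37.5 | 12.8 | 12.5 | 74.7 | 25.3 | −0.004 | −0.01 | 641689 | 6E+05 | 2472 | 4562 | 373 | 69646 | 599 | 32 | 3 | 1 |
| 4 | Rhopalosiphum maidis | CP032759.1 | 37.2 | 37.5 | 12.8 | 12.5 | 74.7 | 25.3 | −0.004 | −0.01 | 642929 | 6E+05 | 2467 | 4553 | 376 | 66277 | 602 | 32 | 3 | 1 |
| 5 | Rhopalosiphum padi | CP034858.1 | 37.5 | 37.2 | 12.5 | 12.7 | 74.7 | 25.2 | 0.004 | 0.008 | 643941 | 6E+05 | 2475 | 4558 | 375 | 65087 | 594 | 32 | 3 | 1 |
| 6 | Aphis nerii | CP034885.1 | 38.1 | 37.8 | 12.1 | 12.1 | 75.9 | 24.2 | 0.004 | 0.001 | 631491 | 6E+05 | 2465 | 4553 | 369 | 61991 | 589 | 32 | 3 | 1 |
| 7 | Aphis helianthi | CP034894.1 | 38.1 | 37.8 | 12 | 12.1 | 75.9 | 24.1 | 0.004 | 0.001 | 634211 | 6E+05 | 2471 | 4548 | 369 | 59982 | 591 | 32 | 3 | 1 |
| 8 | Aphis aurantii 13* | CP135018 | 37.6 | 37.3 | 12.5 | 12.6 | 74.9 | 25.1 | 0.004 | 0.001 | 629080 | 6E+05 | 2465 | 4552 | 369 | 64084 | 578 | 32 | 3 | 1 |
| 9 | Aphis aurantii 14* | CP135021 | 37.6 | 37.3 | 12.5 | 12.6 | 74.9 | 25.1 | 0.004 | 0.001 | 628870 | 6E+05 | 2463 | 4552 | 369 | 62784 | 575 | 32 | 3 | 1 |
| 10 | Aphis craccivora | CP043999.1 | 37.9 | 37.6 | 12.2 | 12.2 | 75.5 | 24.4 | 0.004 | 0.003 | 632731 | 6E+05 | 2468 | 4546 | 369 | 66565 | 579 | 32 | 3 | 1 |
| 11 | Aphis fabae | CP042427.1 | 38 | 37.7 | 12.1 | 12.1 | 75.7 | 24.2 | 0.004 | 0.004 | 634931 | 6E+05 | 2468 | 4549 | 371 | 61212 | 589 | 32 | 3 | 1 |
| 12 | Aphis urticata | CP048744.1 | 37.5 | 37.2 | 12.7 | 12.7 | 74.7 | 25.4 | 0.004 | 0 | 630969 | 6E+05 | 2463 | 4548 | 371 | 69013 | 591 | 32 | 3 | 1 |
| 13 | Aphis nasturtii | CP034888.1 | 37.7 | 37.4 | 12.4 | 12.4 | 75.1 | 24.8 | 0.003 | 0.001 | 630331 | 6E+05 | 2379 | 4548 | 372 | 64522 | 584 | 31 | 3 | 1 |
| 14 | Aphis gossypii | CP042426.1 | 37.4 | 37.2 | 12.7 | 12.7 | 74.6 | 25.4 | 0.003 | 0.002 | 628324 | 6E+05 | 2464 | 4550 | 372 | 63466 | 583 | 32 | 3 | 1 |
| 15 | Aphis glycines | CP009253.1 | 37.3 | 37.1 | 12.8 | 12.8 | 74.4 | 25.6 | 0.003 | −0.001 | 628164 | 6E+05 | 2458 | 4550 | 372 | 65088 | 581 | 32 | 3 | 1 |
| 16 | Pentalonia nigronervosa | CP061275.1 | 37 | 36.8 | 12.9 | 13.2 | 73.8 | 26.1 | 0.002 | 0.01 | 617483 | 5E+05 | 2470 | 4558 | 363 | 116502 | 522 | 32 | 3 | 1 |
| 17 | Hyadaphis tataricae | CP034873.1 | 36.7 | 36.3 | 13.5 | 13.5 | 73 | 27 | 0.006 | 0.002 | 633867 | 6E+05 | 2466 | 4553 | 370 | 67308 | 580 | 32 | 3 | 1 |
| 18 | Diuraphis noxia | CP013259.1 | 37.5 | 37.1 | 12.7 | 12.8 | 74.6 | 25.5 | 0.005 | 0.006 | 636266 | 6E+05 | 2461 | 4550 | 366 | 69455 | 586 | 32 | 3 | 1 |
| 19 | Lipaphis pseudobrassicae | CP034870.1 | 37.5 | 37.1 | 12.4 | 12.5 | 74.6 | 24.9 | 0.004 | 0.004 | 641221 | 5E+05 | 2477 | 4552 | 371 | 92222 | 578 | 32 | 3 | 1 |
| 20 | Brevicoryne brassicae | CP034882.1 | 37.7 | 37.3 | 12.5 | 12.6 | 75 | 25.1 | 0.006 | 0.003 | 645850 | 6E+05 | 2469 | 4549 | 369 | 68457 | 596 | 32 | 3 | 1 |
| 21 | Brachycaudus cardui | CP034879.1 | 37.6 | 37.1 | 12.6 | 12.7 | 74.7 | 25.3 | 0.007 | 0.004 | 643931 | 6E+05 | 2478 | 4551 | 363 | 65474 | 591 | 32 | 3 | 1 |
| 22 | Myzus persicae | CP002699.1 | 37.5 | 37.1 | 12.7 | 12.8 | 74.6 | 25.5 | 0.006 | 0.005 | 643502 | 6E+05 | 2479 | 4555 | 363 | 63744 | 586 | 32 | 3 | 1 |
| 23 | Artemisaphis artemisicola | CP034900.1 | 38 | 37.5 | 12.3 | 12.3 | 75.5 | 24.6 | 0.006 | 0.001 | 633406 | 6E+05 | 2478 | 4550 | 359 | 70038 | 577 | 32 | 3 | 1 |
| 24 | Hyperomyzus lactucae | CP034876.1 | 37.2 | 36.7 | 13 | 13.1 | 73.9 | 26.1 | 0.006 | 0.004 | 641856 | 6E+05 | 2487 | 4553 | 363 | 66886 | 591 | 32 | 3 | 1 |
| 25 | Macrosiphoniella sanborni | CP034864.1 | 38.1 | 37.6 | 12 | 12.2 | 75.7 | 24.2 | 0.006 | 0.007 | 621931 | 5E+05 | 2464 | 4571 | 366 | 82291 | 544 | 32 | 3 | 1 |
| 26 | Uroleucon sonchi | CP047588.1 | 38.1 | 37.6 | 12.2 | 12.2 | 75.7 | 24.4 | 0.006 | −0.001 | 614349 | 5E+05 | 2464 | 4563 | 364 | 80215 | 539 | 32 | 3 | 1 |
| 27 | Uroleucon ambrosiae | CP002648.1 | 38.2 | 37.7 | 12 | 12.1 | 75.9 | 24.1 | 0.007 | 0.002 | 615380 | 5E+05 | 2464 | 4563 | 368 | 75953 | 541 | 32 | 3 | 1 |
| 28 | Sitobion avenae | CP034855.1 | 37.2 | 36.8 | 12.9 | 13 | 74 | 25.9 | 0.006 | 0.005 | 636177 | 6E+05 | 2479 | 4552 | 368 | 74738 | 571 | 32 | 3 | 1 |
| 29 | Sitobion miscanthi | CP084934.1 | 37.2 | 36.9 | 12.9 | 13.1 | 74.1 | 26 | 0.004 | 0.008 | 671355 | 6E+05 | 2479 | 4549 | 736 | 81297 | 601 | 32 | 3 | 2 |
| 30 | Acyrthosiphon kondoi | CP002645.1 | 37.4 | 36.9 | 12.8 | 12.9 | 74.3 | 25.7 | 0.006 | 0.006 | 641794 | 6E+05 | 2469 | 4559 | 363 | 73265 | 581 | 32 | 3 | 1 |
| 31 | Acyrthosiphon pisum | BA000003.2 | 37.1 | 36.6 | 13.1 | 13.2 | 73.7 | 26.3 | 0.006 | 0.006 | 640681 | 6E+05 | 2477 | 4549 | 366 | 71806 | 582 | 32 | 3 | 1 |
| 32 | Acyrthosiphon lactucae | CP034891.1 | 38 | 37.5 | 12.1 | 12.3 | 75.5 | 24.4 | 0.007 | 0.008 | 642335 | 6E+05 | 2478 | 4548 | 362 | 80073 | 573 | 32 | 3 | 1 |
| 33 | Microlophium carnosum | CP048747.1 | 37.5 | 37 | 12.7 | 12.8 | 74.5 | 25.5 | 0.006 | 0.005 | 642296 | 5E+05 | 2476 | 4551 | 364 | 89298 | 569 | 32 | 3 | 1 |
| 34 | Macrosiphum gaurae | CP034867.1 | 37.3 | 36.9 | 12.8 | 13 | 74.2 | 25.8 | 0.005 | 0.008 | 643561 | 6E+05 | 2474 | 4550 | 366 | 72885 | 581 | 32 | 3 | 1 |
| 35 | Macrosiphum euphorbiae | CP033006.1 | 37.5 | 37 | 12.7 | 12.9 | 74.5 | 25.6 | 0.007 | 0.007 | 645334 | 6E+05 | 2473 | 4550 | 364 | 76716 | 577 | 32 | 3 | 1 |
| 36 | Formosaphis micheliae 27* | CP135045 | 38.3 | 38.2 | 11.6 | 11.9 | 76.5 | 23.5 | 0.001 | 0.012 | 554708 | 4E+05 | 2536 | 4586 | 362 | 106692 | 457 | 33 | 3 | 1 |
| 37 | Formosaphis micheliae 28* | CP135047 | 38.3 | 38.2 | 11.6 | 11.9 | 76.5 | 23.5 | 0.001 | 0.011 | 554353 | 4E+05 | 2631 | 4587 | 362 | 109421 | 453 | 34 | 3 | 1 |
| 38 | Baizongia pistaciae | AE016826.1 | 37.1 | 37.5 | 12.7 | 12.7 | 74.6 | 25.4 | −0.006 | 0.001 | 615980 | 5E+05 | 2474 | 4574 | 364 | 104082 | 521 | 32 | 3 | 1 |
| 39 | Melaphis rhois | CP033004.1 | 36.9 | 37.1 | 12.8 | 12.8 | 74 | 25.6 | −0.003 | 0.002 | 616452 | 5E+05 | 2455 | 4587 | 362 | 103857 | 542 | 32 | 3 | 1 |
| 40 | Schlechtendalia chinensis | CP011299.1 | 37 | 37.2 | 12.9 | 12.9 | 74.2 | 25.8 | −0.002 | 0.003 | 607835 | 5E+05 | 2462 | 4573 | 355 | 89734 | 547 | 32 | 3 | 1 |
| 41 | Mindarus japonicus 20* | CP135030 | 38.4 | 38.3 | 11.6 | 11.7 | 76.7 | 23.3 | 0.002 | 0.002 | 541202 | 5E+05 | 2618 | 4569 | 363 | 69261 | 476 | 34 | 3 | 1 |
| 42 | Mindarus keteleerifoliae 18* | CP135024 | 38.1 | 37.9 | 12 | 11.9 | 76 | 23.9 | 0.003 | −0.002 | 542558 | 5E+05 | 2541 | 4572 | 366 | 74558 | 477 | 33 | 3 | 1 |
| 43 | Mindarus keteleerifoliae 19* | CP135027 | 38.1 | 37.9 | 12 | 11.9 | 76 | 23.9 | 0.003 | −0.002 | 542511 | 5E+05 | 2610 | 4572 | 366 | 74688 | 476 | 34 | 3 | 1 |
| 44 | Taiwanaphis decaspermi 29* | CP135049 | 39.7 | 39.1 | 10.6 | 10.6 | 78.8 | 21.2 | 0.008 | 0.001 | 453895 | 4E+05 | 2389 | 4611 | 363 | 46053 | 411 | 31 | 3 | 1 |
| 45 | Neophyllaphis podocarpi 25* | CP135039 | 38.7 | 39 | 11.2 | 11.2 | 77.7 | 22.4 | −0.004 | −0.001 | 559476 | 5E+05 | 2387 | 4679 | 362 | 58020 | 519 | 31 | 3 | 1 |
| 46 | Neophyllaphis varicolor 26* | CP135042 | 38.7 | 39 | 11.2 | 11.1 | 77.7 | 22.3 | −0.004 | −0.003 | 559524 | 5E+05 | 2390 | 4670 | 362 | 59691 | 518 | 31 | 3 | 1 |
| 47 | Anoecia oenotherae | CP033012.1 | 39.3 | 38 | 11.8 | 11 | 77.3 | 22.8 | 0.016 | −0.036 | 548691 | 4E+05 | 2637 | 4568 | 368 | 114458 | 444 | 34 | 3 | 1 |
| 48 | Thelaxes californica | CP034852.1 | 39.1 | 38.3 | 11.4 | 11.2 | 77.4 | 22.6 | 0.011 | −0.009 | 522699 | 4E+05 | 2385 | 4698 | 383 | 76042 | 453 | 31 | 3 | 1 |
| 49 | Kurisakia onigurumii 23* | CP135033 | 39.9 | 38.9 | 10.6 | 10.6 | 78.8 | 21.2 | 0.012 | 0 | 509028 | 4E+05 | 2467 | 4342 | 382 | 62346 | 448 | 32 | 3 | 1 |
| 50 | Kurisakia onigurumii 24* | CP135036 | 39.9 | 38.9 | 10.6 | 10.6 | 78.8 | 21.2 | 0.012 | 0 | 509341 | 4E+05 | 2467 | 4669 | 383 | 61962 | 445 | 32 | 3 | 1 |
| 51 | Nipponaphis monzeni | AP019379.1 | 39.2 | 38.5 | 11.2 | 11.2 | 77.7 | 22.4 | 0.009 | −0.001 | 587781 | 4E+05 | 2394 | 4599 | 355 | 145391 | 445 | 31 | 3 | 1 |
| 52 | Ceratoglyphina bambusae 01* | CP134982 | 40.7 | 39.4 | 9.8 | 10.1 | 80.1 | 19.9 | 0.017 | 0.013 | 424619 | 4E+05 | 2250 | 4557 | 356 | 45627 | 388 | 29 | 3 | 1 |
| 53 | Chaitoregma tattakana 04* | CP134991 | 39.4 | 38.2 | 11.1 | 11.3 | 77.6 | 22.4 | 0.016 | 0.009 | 419464 | 4E+05 | 2327 | 4636 | 364 | 41943 | 383 | 30 | 3 | 1 |
| 54 | Astegopteryx bambusae 03* | CP134988 | 40.8 | 39.3 | 9.8 | 10.1 | 80.1 | 19.9 | 0.019 | 0.014 | 411922 | 4E+05 | 2325 | 4636 | 366 | 39456 | 388 | 30 | 3 | 1 |
| 55 | Astegopteryx bambusae 02* | CP134985 | 40.8 | 39.3 | 9.8 | 10.1 | 80.1 | 19.9 | 0.019 | 0.014 | 411372 | 4E+05 | 2325 | 4633 | 366 | 38216 | 384 | 30 | 3 | 1 |
| 56 | Ceratovacuna japonica | AP026065.1 | 40.9 | 39.1 | 9.9 | 10.1 | 80 | 20 | 0.022 | 0.012 | 414725 | 4E+05 | 2323 | 4634 | 356 | 41805 | 379 | 30 | 3 | 1 |
| 57 | Ceratovacuna keduensis 05* | CP134994 | 41.7 | 39.9 | 9.1 | 9.3 | 81.6 | 18.4 | 0.021 | 0.015 | 413541 | 4E+05 | 2327 | 4637 | 360 | 39326 | 382 | 30 | 3 | 1 |
| 58 | Pseudoregma panicola 07* | CP135000 | 41.3 | 39.6 | 9.4 | 9.7 | 80.9 | 19.1 | 0.021 | 0.014 | 412898 | 4E+05 | 2324 | 4629 | 359 | 37591 | 383 | 30 | 3 | 1 |
| 59 | Pseudoregma panicola 06* | CP134997 | 41.3 | 39.6 | 9.4 | 9.7 | 80.9 | 19.1 | 0.021 | 0.014 | 412562 | 4E+05 | 2324 | 4629 | 359 | 37876 | 382 | 30 | 3 | 1 |
| 60 | Stegophylla sp. | Manzano-Marín et al., 2023 | 38.4 | 38.6 | 11.5 | 11.5 | 77 | 23 | −0.003 | 0.001 | 412953 | 3E+05 | 2473 | 4565 | 369 | 68010 | 352 | 32 | 3 | 1 |
| 61 | Phyllaphis fagi | Manzano-Marín et al., 2023 | 38.6 | 38.7 | 11.3 | 11.4 | 77.3 | 22.7 | −0.001 | 0.002 | 431406 | 4E+05 | 2548 | 4555 | 359 | 63425 | 370 | 33 | 3 | 1 |
| 62 | Shivaphis celti 15* | CP134977 | 37.5 | 38.2 | 12.2 | 12.2 | 75.7 | 24.4 | −0.009 | 0 | 422485 | 4E+05 | 2391 | 4563 | 359 | 36404 | 388 | 31 | 3 | 1 |
| 63 | Therioaphis trifolii | CP032996.1 | 39.8 | 40 | 10.1 | 10.1 | 79.8 | 20.2 | −0.003 | 0.001 | 419293 | 4E+05 | 2489 | 4551 | 364 | 33760 | 390 | 32 | 3 | 1 |
| 64 | Sarucallis kahawaluokalani | CP032999.1 | 37.4 | 37.8 | 12.4 | 12.4 | 75.2 | 24.8 | −0.004 | 0.002 | 428356 | 4E+05 | 2389 | 4560 | 368 | 44242 | 390 | 31 | 3 | 1 |
| 65 | Takecallis taiwana 16* | CP134978 | 37 | 37.3 | 12.8 | 12.9 | 74.3 | 25.7 | −0.004 | 0.003 | 433453 | 4E+05 | 2384 | 4554 | 364 | 43036 | 396 | 31 | 3 | 1 |
| 66 | Takecallis taiwana 17* | CP134979 | 37 | 37.3 | 12.8 | 12.9 | 74.3 | 25.7 | −0.004 | 0.003 | 432979 | 4E+05 | 2384 | 4554 | 364 | 42601 | 395 | 31 | 3 | 1 |
| 67 | Greenidea ficicola 11* | CP135012 | 40.7 | 40.7 | 9.3 | 9.3 | 81.4 | 18.6 | 0 | 0.004 | 395344 | 4E+05 | 2234 | 4566 | 363 | 29894 | 368 | 29 | 3 | 1 |
| 68 | Mollitrichosiphum nigrofasciatum 12* | CP135015 | 39 | 39.3 | 10.9 | 10.8 | 78.3 | 21.7 | −0.005 | −0.001 | 418955 | 4E+05 | 2364 | 4557 | 335 | 50493 | 374 | 31 | 3 | 1 |
| 69 | Drepanosiphum platanoidis | Manzano-Marín et al., 2023 | 41 | 40.6 | 9.2 | 9.2 | 81.6 | 18.4 | 0.004 | 0 | 448962 | 4E+05 | 2398 | 4638 | 384 | 67208 | 379 | 31 | 3 | 1 |
| 70 | Periphyllus koelreuteriae 21* | CP134980 | 40.7 | 41 | 9.2 | 9.1 | 81.7 | 18.3 | −0.004 | −0.003 | 452078 | 4E+05 | 2324 | 4572 | 394 | 54044 | 408 | 30 | 3 | 1 |
| 71 | Periphyllus koelreuteriae 22* | CP134981 | 40.6 | 41 | 9.2 | 9.2 | 81.6 | 18.4 | −0.005 | −0.004 | 451592 | 4E+05 | 2325 | 4574 | 397 | 56471 | 406 | 30 | 3 | 1 |
| 72 | Sipha maydis | CP097205.1 | 38.8 | 39.3 | 10.9 | 11 | 78.1 | 21.9 | −0.007 | 0.003 | 462390 | 4E+05 | 2401 | 4583 | 376 | 60308 | 411 | 31 | 3 | 1 |
| 73 | Periphyllus lyropictus | CP097457.1 | 40.7 | 40.4 | 9.5 | 9.4 | 81.1 | 18.9 | 0.004 | −0.009 | 456890 | 4E+05 | 2314 | 4565 | 377 | 60459 | 411 | 30 | 3 | 1 |
| 74 | Nippolachnus piri 09* | CP135006 | 39.4 | 39.6 | 10.5 | 10.6 | 79 | 21.1 | −0.003 | 0.008 | 417640 | 4E+05 | 2393 | 4570 | 384 | 47125 | 377 | 31 | 3 | 1 |
| 75 | Nippolachnus piri 10* | CP135009 | 39.4 | 39.6 | 10.5 | 10.6 | 79 | 21.1 | −0.003 | 0.008 | 417660 | 4E+05 | 2393 | 4570 | 384 | 47145 | 377 | 31 | 3 | 1 |
| 76 | Tuberolachnus salignus | LN890285.1 | 39.1 | 39.3 | 10.8 | 10.8 | 78.4 | 21.6 | −0.003 | −0.002 | 421426 | 4E+05 | 2408 | 4578 | 388 | 47017 | 377 | 31 | 3 | 1 |
| 77 | Tuberolachnus salignus 08* | CP135003 | 39.1 | 39.3 | 10.8 | 10.8 | 78.4 | 21.6 | −0.003 | −0.002 | 421666 | 4E+05 | 2408 | 4578 | 388 | 47263 | 382 | 31 | 3 | 1 |
| 78 | Cinara confinis | LT667503.1 | 37.9 | 38.1 | 11.9 | 12 | 76 | 23.9 | −0.002 | 0.004 | 443747 | 4E+05 | 2386 | 4573 | 387 | 68133 | 384 | 31 | 3 | 1 |
| 79 | Cinara tujafilina | CP001817.1 | 38.4 | 38.5 | 11.5 | 11.5 | 76.9 | 23 | −0.002 | 0.001 | 444925 | 4E+05 | 2385 | 4575 | 384 | 78487 | 394 | 31 | 3 | 1 |
| 80 | Cinara cedri | CP000263.1 | 40.1 | 39.8 | 10 | 10.1 | 79.9 | 20.1 | 0.003 | 0.004 | 416380 | 4E+05 | 2390 | 4571 | 366 | 53361 | 369 | 31 | 3 | 1 |
| 81 | Cinara strobi | LR025085.1 | 38.1 | 38 | 11.9 | 12 | 76.1 | 23.9 | 0.001 | 0.001 | 440140 | 4E+05 | 2400 | 4580 | 374 | 67761 | 372 | 31 | 3 | 1 |
| 82 | Cinara piceae | LR217739.1 | 39.2 | 38.9 | 11 | 10.9 | 78.1 | 21.9 | 0.004 | −0.004 | 435017 | 4E+05 | 2392 | 4595 | 375 | 65225 | 372 | 31 | 3 | 1 |
| 83 | Cinara curtihirsuta | LR217700.1 | 39.5 | 39.3 | 10.6 | 10.6 | 78.8 | 21.2 | 0.002 | 0 | 433229 | 4E+05 | 2390 | 4591 | 377 | 59409 | 375 | 31 | 3 | 1 |
| 84 | Cinara curvipes | LR217710.1 | 39.5 | 39.3 | 10.6 | 10.6 | 78.8 | 21.2 | 0.002 | 0.001 | 433837 | 4E+05 | 2394 | 4587 | 374 | 59504 | 376 | 31 | 3 | 1 |
| 85 | Cinara cuneomaculata | LR217695.1 | 38.3 | 38.1 | 11.8 | 11.8 | 76.4 | 23.6 | 0.003 | −0.001 | 430960 | 4E+05 | 2476 | 4581 | 375 | 66837 | 368 | 32 | 3 | 1 |
| 86 | Cinara kochiana kochiana | LR217707.1 | 38.5 | 38.2 | 11.6 | 11.7 | 76.7 | 23.3 | 0.004 | 0.003 | 433740 | 4E+05 | 2386 | 4585 | 373 | 62421 | 374 | 31 | 3 | 1 |
| 87 | Cinara laricifoliae | LR217717.1 | 39 | 38.7 | 11.1 | 11.2 | 77.7 | 22.3 | 0.003 | 0.002 | 436713 | 4E+05 | 2394 | 4586 | 375 | 65617 | 374 | 31 | 3 | 1 |
| 88 | Cinara pseudotaxifoliae | LT635893.1 | 38.1 | 37.8 | 12.1 | 12 | 75.9 | 24.1 | 0.004 | −0.003 | 446627 | 4E+05 | 2395 | 4578 | 369 | 70648 | 378 | 31 | 3 | 1 |
| 89 | Cinara cf splendens | LR217692.1 | 38.2 | 37.8 | 12 | 11.9 | 76 | 23.9 | 0.005 | −0.005 | 444797 | 4E+05 | 2393 | 4575 | 370 | 70934 | 378 | 31 | 3 | 1 |
| 90 | Cinara splendens | LR217722.1 | 38.4 | 38 | 11.9 | 11.7 | 76.4 | 23.6 | 0.006 | −0.007 | 445237 | 4E+05 | 2389 | 4577 | 371 | 69992 | 377 | 31 | 3 | 1 |

Asterisk (*) indicates that the genome was newly sequenced in our study.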
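The rule of retaining a single GenBank strain per aphid species can be sketched as a simple de-duplication that keeps the largest genome for each host. This covers only the genome-size preference; the preference for strains with detailed prior genome analysis is not modeled. The function name, record layout, and example records are hypothetical placeholders, not entries from the dataset.

```python
def pick_one_strain_per_species(records):
    """Keep the accession with the largest genome for each host species.

    `records` is an iterable of (host_species, accession, genome_size_bp)
    tuples. Ties keep the record seen first.
    """
    best = {}
    for species, accession, size in records:
        if species not in best or size > best[species][1]:
            best[species] = (accession, size)
    return {species: acc for species, (acc, _) in best.items()}


# Placeholder records for illustration only.
candidates = [
    ("Species A", "ACC_1", 640000),
    ("Species A", "ACC_2", 642000),
    ("Species B", "ACC_3", 420000),
]
print(pick_one_strain_per_species(candidates))
```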
Genome characteristics
PhyloSuite v1.2.1^36^ was used to extract genomic feature information from the annotated GenBank-format files. Linear analysis was conducted to examine the relationship between genome size and GC content of the 90 Buchnera genomes. The genome size of Buchnera ranges from 395,344 bp (Buchnera from Greenidea ficicola 11) to 671,355 bp (Buchnera from Sitobion miscanthi), with an average of 531,263 bp (Table 1). The genomes of Buchnera from Aphidinae are all larger than 600 kb, the largest genome sizes observed among all Buchnera strains (Table 1). In contrast, the Buchnera strains from Hormaphidinae aphids display the smallest genome sizes, only about 410 kb (except Buchnera from Nipponaphis monzeni, with 587,781 bp). Extreme genome reduction was also confirmed in our newly sequenced Buchnera genomes from Hormaphidinae, Greenideinae, Chaitophorinae, Calaphidinae and Lachninae. The Buchnera from Greenidea ficicola 11 (Greenideinae), which encodes only 363 proteins, has the smallest Buchnera genome so far (395,344 bp). The length and number of the different gene types in each Buchnera strain are shown in Table 1. The number and length of tRNA, rRNA, and tmRNA genes are relatively stable across Buchnera strains, whereas genome length, CDS (coding sequence) length and count, and noncoding length differ significantly. On average, noncoding regions account for 12.5% of the total genome size across all Buchnera strains. The GC content of all Buchnera genomes is very low, ranging from 18.3% (Buchnera from Periphyllus koelreuteriae 21, with a genome size of 452,078 bp) to 27% (Buchnera from Hyadaphis tataricae, with a genome size of 633,867 bp), with an average of 23.3% (Table 1 and Fig. 1). The number and size of the different types of genes in the Buchnera genomes from all aphid subfamilies were extracted, as was the size of the noncoding regions.
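The base-composition and skew columns of Table 1 follow the standard definitions AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C). The following is a minimal sketch of that calculation for a single sequence; it is not PhyloSuite's implementation, and the function name and toy sequence are assumptions for illustration.

```python
from collections import Counter


def base_composition(seq):
    """Return GC content (%) and AT/GC skew for a nucleotide sequence.

    Only unambiguous A, T, C, G are counted; other symbols are ignored.
    """
    counts = Counter(seq.upper())
    a, t, c, g = (counts[base] for base in "ATCG")
    total = a + t + c + g
    if total == 0:
        raise ValueError("sequence contains no A/T/C/G bases")
    return {
        "GC_percent": 100.0 * (g + c) / total,
        "AT_percent": 100.0 * (a + t) / total,
        # A skew is undefined when its denominator is zero.
        "AT_skew": (a - t) / (a + t) if (a + t) else None,
        "GC_skew": (g - c) / (g + c) if (g + c) else None,
    }


print(base_composition("AAATTCCGGG"))  # toy sequence, not real data
```

As a check against Table 1, the rounded percentages for Schizaphis graminum (A 37.2, T 37.5) give (37.2 − 37.5)/74.7 ≈ −0.004, matching the reported AT skew.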
The reduced genomes tend to have a smaller number of genes (Fig. 2). Although some Buchnera strains still have relatively large genome sizes (e.g., Buchnera from Nipponaphis monzeni and Anoecia oenotherae), their noncoding regions are larger than those of other strains (reaching 24.7% of the genome size).
Fig. 1. Relationship of genome size (kb) to GC content (%) in Buchnera genomes. Buchnera from Hyadaphis tataricae has the highest GC content (27%), and Buchnera from Periphyllus koelreuteriae 21 has the lowest GC content (18.3%).
Fig. 2. The length (indicated by asterisks) and number (indicated by circles) of various gene types in different Buchnera strains, as well as the length of the genome and noncoding regions. Different groups are represented by different colored lines. Various background colors represent Buchnera from different subfamilies of aphids, labeled with text in corresponding colors. The order of Buchnera strains from left to right corresponds to the order on the phylogenetic tree. CDS: coding sequence; mRNA: messenger RNA; tmRNA: transfer-messenger RNA.
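The linear relationship between genome size and GC content summarized in Fig. 1 can be illustrated with an ordinary least-squares fit. The sketch below uses six genome size and G + C values taken from Table 1; it is a small illustrative subset, not the authors' analysis of all 90 genomes, and the function name is an assumption.

```python
def least_squares(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# (genome size in kb, G + C %) for six strains from Table 1.
subset = [
    (395.344, 18.6),  # Greenidea ficicola 11
    (452.078, 18.3),  # Periphyllus koelreuteriae 21
    (416.380, 20.1),  # Cinara cedri
    (541.202, 23.3),  # Mindarus japonicus 20
    (633.867, 27.0),  # Hyadaphis tataricae
    (640.681, 26.3),  # Acyrthosiphon pisum
]
sizes, gc = zip(*subset)
slope, intercept = least_squares(sizes, gc)
print(f"GC% = {slope:.4f} * size_kb + {intercept:.2f}")
```

On this subset the slope is positive, meaning the larger genomes have higher GC content; the fitted coefficients would differ for the full dataset.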
The COG categories classification of all Buchnera
The protein-coding genes present in all Buchnera strains were classified into functional categories following a previous study^25^. The two datasets of clusters of orthologous genes (COGs) for Buchnera aphidicola and Escherichia coli K-12 substr. MG1655 (https://www.ncbi.nlm.nih.gov/research/cog/) were downloaded for COG annotation of all Buchnera strains, and all genes were assigned to COG categories based on these two datasets. The functional categories of all Buchnera genomes are presented in Supplemental Table S2. All protein-coding genes were then clustered to identify orthologous gene clusters across all Buchnera strains. Clustering within the 26 functional categories showed distinct patterns of orthologous gene reduction during the evolution of Buchnera (Fig. 3 and Supplemental Fig. S1). However, no gene was assigned to the following functional categories: B (Chromatin Structure and Dynamics), W (Extracellular Structures), Y (Nuclear Structure), and Z (Cytoskeleton). In nearly all COG categories, the number of orthologous gene clusters declines progressively from Buchnera strains with large genomes to those with small genomes.
Fig. 3. The gene clustering patterns of all Buchnera strains within different COG functional categories. Different colors represent different Buchnera strains within each category, with different subfamilies labeled in bold black text.
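Tallying genes per COG category, and spotting categories with no assigned gene (such as B, W, Y, and Z above), can be sketched as follows. The gene identifiers and their category letters are placeholders invented for illustration, and treating multi-letter assignments as one count per letter is an assumption rather than the authors' stated procedure.

```python
from collections import Counter

COG_LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # the 26 one-letter COG categories


def cog_profile(gene_to_cog):
    """Count genes per COG letter and list the letters with no genes.

    `gene_to_cog` maps a gene ID to a string of one or more COG letters;
    a gene with several letters (e.g. "EH") is counted once per letter.
    """
    counts = Counter()
    for letters in gene_to_cog.values():
        for letter in letters:
            counts[letter] += 1
    empty = [letter for letter in COG_LETTERS if counts[letter] == 0]
    return counts, empty


# Placeholder annotation, not real Buchnera data.
annotation = {"gene1": "E", "gene2": "EH", "gene3": "J"}
counts, empty = cog_profile(annotation)
print(dict(counts))
print("categories without genes:", "".join(empty))
```

Running the same tally per strain gives the per-category counts that a figure such as Fig. 3 compares across genomes.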
Data Records
The newly assembled genomes of 29 Buchnera strains (with strain names ending in numbers) are available through figshare^37^ as well as the National Center for Biotechnology Information (NCBI)^38–66^. The 29 mitogenomes of their host aphids, along with the comprehensive dataset based on 90 Buchnera genomes, can also be accessed through figshare^37^.
Technical Validation
Sequencing data quality control
Raw sequencing reads contain low-quality sequences. To ensure the quality of the bioinformatic analyses, the raw reads were filtered with Trimmomatic to remove them, and the resulting high-quality clean reads were used for all subsequent analyses.
Validation of genome assembly
We used NOVOPlasty v3.5 and MEGAHIT v1.1.1 for sequence assembly. The sequences produced by the different software were aligned with other complete Buchnera sequences, and sequences with similarity lower than 50% were excluded from further genome assembly. The analysis indicated completeness ranging from 92.96% to 99.99%, with an average of 98.36%. The relatively lower completeness of some smaller genomes is due to genome reduction in Buchnera, a widely recognized phenomenon.
Dataset quality control
Genomes available in GenBank may be incomplete or contain too many degenerate bases. Therefore, data cleaning is essential for constructing the comprehensive dataset and for subsequent analyses. First, we assessed the integrity of all genomes downloaded from the GenBank database using CheckM2 v1.0.1^67^. Genomes with low completeness were removed, such as the Buchnera genome (CP033006) from Hormaphis cornu (completeness: 64.04%, coding density: 50%, genome size: 643,250 bp). Base composition was analyzed using BioEdit v7.0.5.3^68^. Genomes with degenerate base content exceeding 1% of the total genome length were excluded. This step was crucial, as degenerate bases may influence subsequent analyses of genome length and the prediction of functional genes.
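The two filters described above can be sketched as a small check on degenerate-base fraction and completeness. The degenerate-base cutoff is left as a parameter because the text quotes both 5% and 1% in different places. The completeness cutoff of 90% and the function names are assumptions for illustration; the source does not state the completeness threshold.

```python
def degenerate_fraction(seq):
    """Fraction of positions that are not an unambiguous A, C, G, or T."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(1 for base in seq if base not in "ACGT") / len(seq)


def passes_qc(seq, completeness, max_degenerate=0.01, min_completeness=90.0):
    """Return True if a genome passes both filters.

    `completeness` is a percentage such as the one CheckM2 reports.
    `min_completeness` is a hypothetical cutoff, not from the paper.
    """
    return (degenerate_fraction(seq) <= max_degenerate
            and completeness >= min_completeness)


# Worked example from the text: 60,999 N bases in a 643,231 bp genome.
print(round(100 * 60999 / 643231, 2))  # percentage of degenerate bases
```

The worked example reproduces the 9.48% quoted for the Hormaphis cornu isolate, which exceeds either cutoff.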
Supplementary information
Supplemental Fig. S1 Supplementary Table S1 Supplementary Table S2
References
- 1. Sudakaran S, Kost C, Kaltenpoth M. Symbiont acquisition and replacement as a source of ecological innovation. Trends Microbiol. 2017;25:375–390. doi:10.1016/j.tim.2017.02.014. PMID: 28336178.
- 2. Misof B. Phylogenomics resolves the timing and pattern of insect evolution. Science 2014;346:763–767. doi:10.1126/science.1257570. PMID: 25378627.
- 3. Gupta A, Nair S. Dynamics of insect-microbiome interaction influence host and microbial symbiont. Front. Microbiol. 2020;11:1357. doi:10.3389/fmicb.2020.01357. PMID: 32676060. PMC 7333248.
- 4. Zytynska SE, Tighiouart K, Frago E. Benefits and costs of hosting facultative symbionts in plant-sucking insects: A meta-analysis. Mol. Ecol. 2021;30:2483–2494. doi:10.1111/mec.15897. PMID: 33756029.
- 5. Wang SB, Qu S. Insect symbionts and their potential application in pest and vector-borne disease control. Bulletin of Chinese Academy of Sciences (Chinese Version) 2017;32:863–872.
- 6. McCutcheon JP, Boyd BM, Dale C. The life of an insect endosymbiont from the cradle to the grave. Curr. Biol. 2019;29:485–495. doi:10.1016/j.cub.2019.03.032. PMID: 31163163.
- 7. Baumann P. Biology of bacteriocyte-associated endosymbionts of plant sap-sucking insects. Annu. Rev. Microbiol. 2005;59:155–189. doi:10.1146/annurev.micro.59.030804.121041. PMID: 16153167.
- 8. Sloan DB, Moran NA. Genome reduction and co-evolution between the primary and secondary bacterial symbionts of psyllids. Mol. Biol. Evol. 2012;29:3781–3792. doi:10.1093/molbev/mss180. PMID: 22821013. PMC 3494270.
