The genome sequence of the bramble shoot moth, Notocelia uddmanniana (Linnaeus, 1758)
Douglas Boyes, Peter W.H. Holland, Hidemasa Bono, Violaine Llaurens, Jeffrey Marcus

TL;DR
This paper presents the genome sequence of the bramble shoot moth, assembled into 28 chromosomes including the Z sex chromosome.
Contribution
The study provides a high-quality, chromosome-level genome assembly for the bramble shoot moth.
Findings
The genome assembly spans 794 megabases.
99.96% of the assembly is organized into 28 chromosomal pseudomolecules.
The Z sex chromosome was successfully assembled.
Abstract
We present a genome assembly from an individual male Notocelia uddmanniana (the bramble shoot moth; Arthropoda; Insecta; Lepidoptera; Tortricidae). The genome sequence is 794 megabases in span. The majority of the assembly, 99.96%, is scaffolded into 28 chromosomal pseudomolecules, with the Z sex chromosome assembled.
| | |
|---|---|
| Assembly identifier | ilNotUddm1 |
| Species | Notocelia uddmanniana |
| Specimen | ilNotUddm1 |
| NCBI taxonomy ID | NCBI:txid1594315 |
| BioProject | PRJEB42137 |
| BioSample ID | SAMEA7519916 |
| Isolate information | Male, whole organism |
| PacificBiosciences SEQUEL II | ERR6590584 |
| 10X Genomics Illumina | ERR6002710-ERR6002713 |
| Hi-C Illumina | ERR6002707-ERR6002709 |
| Assembly accession | GCA_905163555.1 |
| Alternate haplotype accession | GCA_905163575.1 |
| Span (Mb) | 794 |
| Number of contigs | 238 |
| Contig N50 length (Mb) | 7 |
| Number of scaffolds | 49 |
| Scaffold N50 length (Mb) | 29 |
| Longest scaffold (Mb) | 51 |
| BUSCO | C:98.3%[S:97.6%,D:0.7%], |

| INSDC accession | Chromosome | Size (Mb) | GC% |
|---|---|---|---|
| | 1 | 51.12 | 38.3 |
| | 2 | 44.98 | 38.3 |
| | 3 | 34.90 | 38.6 |
| | 4 | 33.91 | 38.6 |
| | 5 | 33.55 | 38.5 |
| | 6 | 31.63 | 38.3 |
| | 7 | 31.24 | 38.4 |
| | 8 | 30.81 | 38.7 |
| | 9 | 29.12 | 38.8 |
| | 10 | 28.99 | 38.6 |
| | 11 | 28.96 | 38.8 |
| | 12 | 28.81 | 38.6 |
| | 13 | 27.46 | 38.5 |
| | 14 | 25.91 | 38.7 |
| | 15 | 25.88 | 38.7 |
| | 16 | 25.71 | 38.5 |
| | 17 | 25.63 | 38.7 |
| | 18 | 24.47 | 39 |
| | 19 | 21.77 | 38.9 |
| | 20 | 19.36 | 38.8 |
| | 21 | 18.18 | 39.2 |
| | 22 | 17.67 | 38.9 |
| | 23 | 17.30 | 39 |
| | 24 | 15.87 | 38.9 |
| | 25 | 16.51 | 39.8 |
| | 26 | 15.07 | 39.4 |
| | 27 | 12.65 | 39.3 |
| | Z | 75.62 | 38.4 |
| | MT | 0.02 | 18.8 |
| - | Unplaced | 1.01 | 40.8 |

| Software tool | Version | Source |
|---|---|---|
| Hifiasm | 0.12 | |
| purge_dups | 1.2.3 | |
| SALSA2 | 2.2 | |
| longranger align | 2.2.2 | |
| freebayes | 1.3.1-17-gaa2ace8 | |
| MitoHiFi | 1 | |
| gEVAL | N/A | |
| HiGlass | 1.11.6 | |
| PretextView | 0.1.x | |
| BlobToolKit | 2.6.2 | |

Funding: Wellcome Trust
Species taxonomy
Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera; Endopterygota; Lepidoptera; Glossata; Ditrysia; Tortricoidea; Tortricidae; Olethreutinae; Eucosmini; Notocelia; Notocelia uddmanniana (Linnaeus, 1758) (NCBI:txid1594315).
Background
Notocelia uddmanniana (bramble shoot moth) is widely distributed across Western Europe and North Africa, with records further east from Kazakhstan to China. The larvae feed on brambles (Rubus sp.), occurring commonly where these species exist, and occasionally cause damage to cultivated varieties (Gordon et al., 1997). Eggs are laid singly on the foodplant, where larvae feed within a folded leaf and later within the tips of growing shoots; larvae overwinter in a silken web on the foodplant stem before recommencing feeding in spring (Dicker, 1939). Notocelia uddmanniana also occupies woodland, and is distributed widely throughout the UK, occurring more commonly in the south. The genome of N. uddmanniana was sequenced as part of the Darwin Tree of Life Project, a collaborative effort to sequence all of the named eukaryotic species in the Atlantic Archipelago of Britain and Ireland. Here we present a chromosomally complete genome sequence for N. uddmanniana, based on one male specimen from Wytham Woods, Oxfordshire, UK.
Genome sequence report
The genome was sequenced from a single male N. uddmanniana (Figure 1) collected from Wytham Woods, Oxfordshire, UK (latitude 51.772, longitude -1.338). A total of 18-fold coverage in Pacific Biosciences single-molecule long reads (N50 16 kb) and 49-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 165 missing/misjoins and removed 72 haplotypic duplications, reducing the assembly length by 2.16% and the scaffold number by 64.71%, and increasing the scaffold N50 by 9.53%.
Figure 1. Image of the ilNotUddm1 specimen taken during preservation and processing.
The final assembly has a total length of 794 Mb in 49 sequence scaffolds with a scaffold N50 of 29 Mb (Table 1). Of the assembly sequence, 99.96% was assigned to 28 chromosomal-level scaffolds, representing 27 autosomes (numbered by sequence length) and the Z sex chromosome (Figure 2–Figure 5; Table 2). The assembly has a BUSCO (Simão et al., 2015) completeness of 98.3% using the lepidoptera_odb10 reference set. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
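The BUSCO entry in Table 1 is a compact summary string. As a minimal sketch, assuming the string is taken exactly as it appears in the table (it is truncated after the duplicated-gene field), the following parses the category percentages and checks that the complete score equals the single-copy and duplicated scores combined. The function name `parse_busco` is hypothetical and is not part of BUSCO or BlobToolKit.

```python
import re

# Summary string as it appears (truncated) in Table 1.
BUSCO_SUMMARY = "C:98.3%[S:97.6%,D:0.7%],"


def parse_busco(summary: str) -> dict[str, float]:
    """Map each BUSCO category letter (C, S, D, ...) to its percentage."""
    pairs = re.findall(r"([A-Z]):(\d+(?:\.\d+)?)%", summary)
    return {letter: float(value) for letter, value in pairs}


scores = parse_busco(BUSCO_SUMMARY)

# Complete (C) should equal single-copy (S) plus duplicated (D),
# allowing for rounding to one decimal place.
consistent = abs(scores["C"] - (scores["S"] + scores["D"])) < 0.05

print(scores, consistent)
```

A check of this kind is a quick way to confirm that a completeness figure quoted in the text matches the summary string in the table.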
Table 1. Genome data for Notocelia uddmanniana, ilNotUddm1.1.
Figure 2. Genome assembly of Notocelia uddmanniana, ilNotUddm1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 794,123,667 bp assembly. The distribution of chromosome lengths is shown in dark grey with the plot radius scaled to the longest chromosome present in the assembly (75,621,453 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 chromosome lengths (28,990,537 and 17,668,102 bp), respectively. The pale grey spiral shows the cumulative chromosome count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the lepidoptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilNotUddm1.1/dataset/CAJHZS01/snail.
Figure 3. Genome assembly of Notocelia uddmanniana, ilNotUddm1.1: GC coverage. BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilNotUddm1.1/dataset/CAJHZS01/blob.
Figure 4. Genome assembly of Notocelia uddmanniana, ilNotUddm1.1: cumulative sequence. BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ilNotUddm1.1/dataset/CAJHZS01/cumulative.
Figure 5. Genome assembly of Notocelia uddmanniana, ilNotUddm1.1: Hi-C contact map. Hi-C contact map of the ilNotUddm1.1 assembly, visualised in HiGlass. Chromosomes are given in order of size from left to right and top to bottom.
Table 2. Chromosomal pseudomolecules in the genome assembly of Notocelia uddmanniana, ilNotUddm1.1.
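The N50 and N90 values quoted in the snail plot caption can be approximately reproduced from the sizes listed in Table 2. The sketch below is illustrative only: it uses the rounded Mb values from the table rather than exact base-pair lengths, and it treats the unplaced sequence as a single 1.01 Mb entry, which is a simplification (the unplaced scaffolds are individually smaller, but that does not affect the result here because both thresholds are reached among the chromosomes). The helper name `nx` is hypothetical.

```python
# Sizes in Mb, copied from Table 2 (rounded values, not exact bp lengths).
SIZES_MB = {
    "1": 51.12, "2": 44.98, "3": 34.90, "4": 33.91, "5": 33.55,
    "6": 31.63, "7": 31.24, "8": 30.81, "9": 29.12, "10": 28.99,
    "11": 28.96, "12": 28.81, "13": 27.46, "14": 25.91, "15": 25.88,
    "16": 25.71, "17": 25.63, "18": 24.47, "19": 21.77, "20": 19.36,
    "21": 18.18, "22": 17.67, "23": 17.30, "24": 15.87, "25": 16.51,
    "26": 15.07, "27": 12.65, "Z": 75.62, "MT": 0.02,
    "Unplaced": 1.01,  # assumption: treated as one entry for simplicity
}


def nx(lengths, fraction):
    """Return the length L at which sequences of length >= L first
    cover at least `fraction` of the summed length."""
    threshold = fraction * sum(lengths)
    running = 0.0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    raise ValueError("lengths must be non-empty")


lengths = list(SIZES_MB.values())
total = sum(lengths)
n50 = nx(lengths, 0.5)
n90 = nx(lengths, 0.9)

print(f"total = {total:.2f} Mb, N50 = {n50} Mb, N90 = {n90} Mb")
```

With the table values, this gives an N50 of 28.99 Mb (chromosome 10) and an N90 of 17.67 Mb (chromosome 22), in line with the 28,990,537 bp and 17,668,102 bp reported for the snail plot.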
Methods
Sample acquisition, DNA extraction and sequencing
A single male N. uddmanniana (ilNotUddm1) was collected from Wytham Woods, Oxfordshire, UK (latitude 51.772, longitude -1.338) by Douglas Boyes, UKCEH, using a light trap. The specimen was identified by the same individual and preserved on dry ice.
DNA was extracted from whole organism tissue at the Wellcome Sanger Institute (WSI) Scientific Operations core using the Qiagen MagAttract HMW DNA kit, according to the manufacturer’s instructions. Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud sequencing libraries were constructed according to the manufacturers’ instructions. Sequencing was performed by the Scientific Operations core at the Wellcome Sanger Institute on Pacific Biosciences SEQUEL II and Illumina HiSeq X instruments. Hi-C data were generated from remaining whole organism tissue using the Arima v1.0 kit and sequenced on HiSeq X.
Genome assembly
Assembly was carried out with Hifiasm (Cheng et al., 2021); haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020), without the -e flag. One round of polishing was performed by aligning 10X Genomics read data to the assembly with longranger align, calling variants with freebayes (Garrison & Marth, 2012). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using SALSA2 (Ghurye et al., 2019). The assembly was checked for contamination and corrected using the gEVAL system (Chow et al., 2016) as described previously (Howe et al., 2021). Manual curation (Howe et al., 2021) was performed using gEVAL, HiGlass (Kerpedjiev et al., 2018) and Pretext. The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2021) and annotated using MitoFinder (Allio et al., 2020). The genome was analysed and BUSCO scores generated within the BlobToolKit environment (Challis et al., 2020). Table 3 contains a list of all software tool versions used, where appropriate.
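The workflow described above can be summarised as an ordered list of steps paired with the tool versions from Table 3. This is only a descriptive sketch: the step ordering follows the prose, the `Step` class and the wording of each purpose are illustrative assumptions, and no command lines or flags are implied because the text does not give them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    purpose: str            # paraphrase of the role described in the text
    tool: str               # tool name as listed in Table 3
    version: Optional[str]  # None where Table 3 gives "N/A"


# Order follows the Methods prose; versions are copied from Table 3.
PIPELINE = [
    Step("primary assembly from HiFi reads", "Hifiasm", "0.12"),
    Step("removal of haplotypic duplication", "purge_dups", "1.2.3"),
    Step("alignment of 10X reads for polishing", "longranger align", "2.2.2"),
    Step("variant calling for polishing", "freebayes", "1.3.1-17-gaa2ace8"),
    Step("Hi-C scaffolding", "SALSA2", "2.2"),
    Step("contamination check and curation", "gEVAL", None),
    Step("manual curation with contact maps", "HiGlass", "1.11.6"),
    Step("manual curation with contact maps", "PretextView", "0.1.x"),
    Step("mitochondrial genome assembly", "MitoHiFi", "1"),
    Step("assembly metrics and BUSCO scores", "BlobToolKit", "2.6.2"),
]

for number, step in enumerate(PIPELINE, start=1):
    version = step.version or "N/A"
    print(f"{number:2d}. {step.purpose}: {step.tool} ({version})")
```

Keeping tool names and versions together in one structure makes it straightforward to record exactly which software produced a given assembly release.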
Ethics/compliance issues
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the Darwin Tree of Life Project Sampling Code of Practice. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project. Each transfer of samples is further undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Darwin Tree of Life Partner, Genome Research Limited (operating as the Wellcome Sanger Institute), and in some circumstances other Darwin Tree of Life collaborators.
Data availability
European Nucleotide Archive: Notocelia uddmanniana (bramble shoot moth). Accession number PRJEB42137: https://www.ebi.ac.uk/ena/browser/view/PRJEB42137.
The genome sequence is released openly for reuse. The N. uddmanniana genome sequencing initiative is part of the Darwin Tree of Life (DToL) project. All raw sequence data and the assembly have been deposited in INSDC databases. The genome will be annotated and presented through the Ensembl pipeline at the European Bioinformatics Institute. Raw data and assembly accession identifiers are reported in Table 1.
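Table 1 lists some run accessions as ranges (for example ERR6002710-ERR6002713). As a small sketch, the following expands such ranges into individual accessions and builds a browser link for each. The helper names are hypothetical, and reusing the ENA browser URL prefix quoted above for run accessions is an assumption rather than something the text states. No network requests are made.

```python
import re

# URL prefix as quoted in the Data availability statement.
ENA_VIEW = "https://www.ebi.ac.uk/ena/browser/view/"

# Raw data accessions as listed in Table 1.
RAW_DATA = {
    "PacificBiosciences SEQUEL II": "ERR6590584",
    "10X Genomics Illumina": "ERR6002710-ERR6002713",
    "Hi-C Illumina": "ERR6002707-ERR6002709",
}


def expand_runs(spec: str) -> list[str]:
    """Expand 'ERR6002710-ERR6002713' into individual run accessions.
    A single accession is returned unchanged in a one-item list."""
    if "-" not in spec:
        return [spec]
    first, last = spec.split("-")
    m1 = re.fullmatch(r"([A-Z]+)(\d+)", first)
    m2 = re.fullmatch(r"([A-Z]+)(\d+)", last)
    if not m1 or not m2 or m1.group(1) != m2.group(1):
        raise ValueError(f"unrecognised accession range: {spec!r}")
    prefix, width = m1.group(1), len(m1.group(2))
    start, end = int(m1.group(2)), int(m2.group(2))
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


all_runs = [run for spec in RAW_DATA.values() for run in expand_runs(spec)]
links = [ENA_VIEW + run for run in all_runs]

for link in links:
    print(link)
```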
References
- 1. Allio R, Schomaker-Bastos A, Romiguier J, et al.: MitoFinder: Efficient automated large-scale extraction of mitogenomic data in target enrichment phylogenomics. Mol Ecol Resour. 2020;20(4):892–905. doi: 10.1111/1755-0998.13160. PMID: 32243090. PMCID: PMC7497042.
- 2. Challis R, Richards E, Rajan J, et al.: BlobToolKit - Interactive Quality Assessment of Genome Assemblies. G3 (Bethesda). 2020;10(4):1361–74. doi: 10.1534/g3.119.400908. PMID: 32071071. PMCID: PMC7144090.
- 3. Cheng H, Concepcion GT, Feng X, et al.: Haplotype-Resolved de Novo Assembly Using Phased Assembly Graphs with Hifiasm. Nat Methods. 2021;18(2):170–75. doi: 10.1038/s41592-020-01056-5. PMID: 33526886. PMCID: PMC7961889.
- 4. Chow W, Brugger K, Caccamo M, et al.: gEVAL - a Web-Based Browser for Evaluating Genome Assemblies. Bioinformatics. 2016;32(16):2508–10. doi: 10.1093/bioinformatics/btw159. PMID: 27153597. PMCID: PMC4978925.
- 5. Dicker GHL: The Morphology and Biology of the Bramble Shoot-Webber, Notocelia Uddmanniana L. (Tortricidae). Ann Appl Biol. 1939;26(4):710–38. doi: 10.1111/j.1744-7348.1939.tb06996.x.
- 6. Garrison E, Marth G: Haplotype-Based Variant Detection from Short-Read Sequencing. arXiv: 1207.3907. 2012.
- 7. Ghurye J, Rhie A, Walenz BP, et al.: Integrating Hi-C Links with Assembly Graphs for Chromosome-Scale Assembly. PLoS Comput Biol. 2019;15(8):e1007273. doi: 10.1371/journal.pcbi.1007273. PMID: 31433799. PMCID: PMC6719893.
- 8. Gordon SC, Woodford JAT, Birch ANE: Arthropod Pests of Rubus in Europe: Pest Status, Current and Future Control Strategies. J Hortic Sci. 1997;72(6):831–62. doi: 10.1080/14620316.1997.11515577.
