A reference-grade genome assembly data of sika deer in Hokkaido, Japan, Cervus nippon yesoensis
Yuki Matsumoto, Junco Nagata, Yukiko Matsuura, Hayato Iijima

Abstract
Sika deer (Cervus nippon) is naturally distributed across East Asia and includes 14 subspecies, showing phenotypic and genetic diversity. In this study, we constructed a de novo genome assembly of a wild sika deer belonging to one of the largest subspecies, C. n. yesoensis. We used PacBio HiFi (high-fidelity) long reads (Pacific Biosciences) to build a novel genome assembly, CerNipYes1.0. The total length of CerNipYes1.0 is 3.1 Gb, 0.6 Gb larger than the previously reported sika deer genome assembly. The assembly comprises 1,810 scaffolds with a scaffold N50 of 77 Mb. Compleasm, a genome completeness evaluation tool based on Benchmarking Universal Single-Copy Orthologs (BUSCO), identified 12,562 of the 12,594 reference genes (99.75%) as complete. Our results indicate that CerNipYes1.0 is a valuable resource for studying the molecular biology, phylogeny and evolution of…
Specifications Table
- Subject: Biology
- Specific subject area: Bioinformatics, Genomics
- Type of data: Raw, Filtered, and Processed
- Data collection: Genomic DNA was extracted from a muscle sample of sika deer using the NucleoBond® AXG Column (Takara Bio, Inc.). Genomic libraries were constructed with the SMRTbell® Express Template Prep Kit 2.0 (Pacific Biosciences, California, USA), and long-read HiFi sequencing was performed on the PacBio Sequel II platform according to the manufacturer's protocol. Sequencing data were generated using SMRT Link v8.0.0 (Pacific Biosciences), and genome assembly was conducted using Hifiasm v0.19.5, followed by scaffolding with RagTag v2.1.0.
- Data source location:
  - Institution: Research and Development Section, Anicom Specialty Medical Institute Inc.
  - City/Town/Region: 2–6–3 5F Chojamachi, Yokohamashi-Nakaku, Kanagawa 231–0033
  - Country: Japan
  - Latitude and longitude for collected samples/data: 44.27470° N, 142.97850° E
- Data accessibility (raw data):
  - Repository name: DDBJ
  - Data identification number: PRJDB35526; DRR702133
  - Direct URL to data: https://ddbj.nig.ac.jp/search/entry/bioproject/PRJDB35526; https://ddbj.nig.ac.jp/search/entry/sra-run/DRR702133
- Related research article: None
1. Value of the Data
- The reference-grade genome assembly of a wild sika deer from Hokkaido, Japan (Cervus nippon yesoensis), constructed using PacBio HiFi sequencing, has been made publicly available.
- Gene completeness (compleasm) indicates that CerNipYes1.0 is of higher quality than the other sika deer genome, MHL_v1.0, and is the second most complete of the assemblies compared, after mCerEla1.1.
- Considering the diversity of genome size and phenotypic traits among Cervidae species, the genome resources provided here offer valuable information to support further studies of the molecular biology, phylogeny and evolution of the Cervidae.
2. Background
Cervidae is the second-largest family of terrestrial artiodactyls, comprising over 50 extant species [1] and accounting for a major part of terrestrial mammal biomass [2]. One of these species, the sika deer (Cervus nippon), is naturally distributed across East Asia and includes 14 subspecies [3]. C. n. yesoensis, one of the largest subspecies, is distributed in Hokkaido, Japan, and has been well studied in ecological [4] and management [3] contexts; genetic studies have suggested a recent expansion in population and distribution after a severe bottleneck [5,6]. Whole-genome sequencing has become a common approach for exploring population dynamics, phylogeny, and the molecular mechanisms underlying phenotypic traits [7,8]. To date, the genomes of six Cervus individuals have been assembled (Table 1; [7,9–13]). The previously reported sika deer genome [7] was assembled from an individual from a deer farm in China, and its subspecies was unknown. Therefore, neither a wild sika deer nor a sika deer native to Japan has been whole-genome sequenced so far. In this study, we used long-read sequencing technology to assemble the genome of a wild sika deer from Hokkaido, Japan, whose origin and subspecies are known, in order to investigate its genome structure.

Table 1. Comparative metadata of genome assemblies utilized in the current study.

| Species | Common name | Assembly | Isolate or specimen ID | Sex | Sequencing methods | Core assembly method | Study |
|---|---|---|---|---|---|---|---|
| Cervus nippon yesoensis | Sika deer | **CerNipYes1.0** | JP_HOKNOK_M01 | male | PacBio HiFi | Hifiasm | This study |
| Cervus nippon | Sika deer | MHL_v1.0 | Not available | female | PacBio; Illumina; Hi-C | wtdbg | Xing et al. GPB 2023 |
| Cervus albirostris | White-lipped deer | WLD | WLD-bc | not collected | Illumina HiSeq | platanus | London et al. J Hered 2022 |
| Cervus hanglu yarkandensis | Tarim red deer | CEY_v1 | CEY-2017 | male | 10X Genomics; Illumina NextSeq | Supernova | Ba et al. Sci data 2020 |
| Cervus canadensis | Wapiti | ASM1932006v1 | Bull #8 | male | PacBio RSII; Illumina HiSeq | MaSuRCA | Masonbrink et al. PLOS ONE 2021 |
| Cervus elaphus | Red deer | mCerEla1.1 | SAN0000996 | female | PacBio; 10X Genomics; Hi-C | Hifiasm | Pemberton et al. Wellcome Open Res. 2020 |
| Cervus elaphus hippelaphus | Red deer | CerEla1.0 | Hungarian | male | Illumina HiSeq | AllPaths | Vozdova et al. Animals 2021 |

Bold indicates the assembly developed in the current study.
3. Data Description
A de novo genome assembly of C. n. yesoensis was constructed from PacBio HiFi long reads generated on the Sequel II platform (DDBJ accessions: PRJDB35526 and DRR702133). Hifiasm assembled the HiFi reads de novo, and after scaffolding we obtained CerNipYes1.0 (NCBI accession: JBTNMK000000000). We obtained 3,946,389 reads totalling 58,855,465,335 bp, with an average read length of 14,913.8 bp and an estimated coverage depth of approximately 18.7x. We compared CerNipYes1.0 with six previously published Cervus genome assemblies (Table 1). The total length of CerNipYes1.0 is 3.1 Gb, approximately 0.6 Gb larger than the previously reported sika deer assembly, MHL_v1.0. CerNipYes1.0 consists of 1,810 scaffolds with a scaffold N50 of 77 Mb (Table 2). Assembly length varies among the seven assemblies (2.50–3.44 Gb), suggesting chromosomal variation within Cervidae. To assess the quality of CerNipYes1.0, we used compleasm [14], an efficient tool that evaluates the completeness of genome assemblies using the miniprot protein-to-genome aligner and conserved orthologous genes from BUSCO [15]. The compleasm completeness score (C) indicates that CerNipYes1.0 is of higher quality than the other sika deer assembly, MHL_v1.0 (12,562 vs. 12,500 complete genes), and is the second most complete of the seven assemblies, after mCerEla1.1 (12,568) (Table 3). RepeatMasker analysis showed that total interspersed repeats occupy 21.63 % of CerNipYes1.0, comparable to the other assemblies (14.46 % to 25.39 %) (Table 4).

Table 2. Standard sequence and contiguity metrics for each genome assembly.

| Common name | Assembly | Total length (bp) | Number | Longest (bp) | Shortest (bp) | N count (bp) | Gaps | N50 (bp) | N50n |
|---|---|---|---|---|---|---|---|---|---|
| Sika deer | **CerNipYes1.0** | 3,147,430,571 | 1,810 | 159,839,349 | 12,800 | 57,300 | 573 | 77,397,291 | 15 |
| Sika deer | MHL_v1.0 | 2,500,646,934 | 588 | 143,481,735 | 2,912 | 145,300 | 1,453 | 78,786,809 | 12 |
| White-lipped deer | WLD | 2,692,225,130 | 171,874 | 23,094,547 | 201 | 50,310,236 | 268,196 | 3,769,372 | 218 |
| Tarim red deer | CEY_v1 | 2,594,114,534 | 17,927 | 134,243,713 | 1,000 | 33,508,101 | 21,781 | 77,688,133 | 13 |
| Wapiti | ASM1932006v1 | 2,526,613,007 | 185 | 146,388,637 | 1,015 | 1,906,222 | 6,643 | 77,654,944 | 13 |
| Red deer | mCerEla1.1 | 2,886,603,524 | 144 | 179,953,079 | 16,426 | 10,400 | 40 | 83,473,711 | 13 |
| Red deer | CerEla1.0 | 3,438,623,608 | 11,479 | 181,543,317 | 2,221 | 477,834,197 | 435,163 | 107,358,006 | 13 |

Bold indicates the assembly developed in the current study.

Table 3. Completeness of genome assemblies using compleasm (12,594 genes). C, complete; S, complete single-copy; D, complete duplicated; F, fragmented; I, incomplete; M, missing.

| Common name | Assembly | C (S + D) | S | D | F | I | M |
|---|---|---|---|---|---|---|---|
| Sika deer | **CerNipYes1.0** | 12,562 | 12,119 | 443 | 20 | 0 | 12 |
| Sika deer | MHL_v1.0 | 12,500 | 12,315 | 185 | 25 | 0 | 69 |
| White-lipped deer | WLD | 12,006 | 11,832 | 174 | 406 | 0 | 182 |
| Tarim red deer | CEY_v1 | 12,501 | 12,169 | 332 | 74 | 0 | 19 |
| Wapiti | ASM1932006v1 | 12,415 | 12,187 | 228 | 55 | 0 | 124 |
| Red deer | mCerEla1.1 | 12,568 | 12,344 | 224 | 21 | 0 | 5 |
| Red deer | CerEla1.0 | 10,654 | 10,605 | 49 | 996 | 1 | 943 |

Bold indicates the assembly developed in the current study.

Table 4. Repeat profile of genome assemblies using RepeatMasker.

| Common name | Assembly | Element | Number of elements | Length occupied (bp) | Percentage of sequence |
|---|---|---|---|---|---|
| Sika deer | **CerNipYes1.0** | SINEs | 1,228,518 | 141,895,774 | 4.51 % |
| | | Penelope | 864 | 72,158 | 0.00 % |
| | | LINEs | 786,516 | 375,556,939 | 11.93 % |
| | | LTR elements | 260,930 | 89,518,696 | 2.84 % |
| | | DNA elements | 450,553 | 73,357,270 | 2.33 % |
| | | Unclassified | 3,034 | 463,574 | 0.01 % |
| | | Total interspersed repeats | | 680,864,411 | 21.63 % |
| Sika deer | MHL_v1.0 | SINEs | 1,175,543 | 135,749,356 | 5.43 % |
| | | Penelope | 845 | 71,259 | 0.00 % |
| | | LINEs | 747,953 | 354,732,713 | 14.19 % |
| | | LTR elements | 245,252 | 85,000,380 | 3.40 % |
| | | DNA elements | 295,565 | 57,817,447 | 2.31 % |
| | | Unclassified | 2,876 | 450,955 | 0.02 % |
| | | Total interspersed repeats | | 633,822,110 | 25.35 % |
| White-lipped deer | WLD | SINEs | 1,218,414 | 140,153,643 | 5.21 % |
| | | Penelope | 818 | 69,706 | 0.00 % |
| | | LINEs | 795,785 | 358,868,372 | 13.33 % |
| | | LTR elements | 256,781 | 87,843,332 | 3.26 % |
| | | DNA elements | 305,490 | 59,408,016 | 2.21 % |
| | | Unclassified | 2,965 | 460,643 | 0.02 % |
| | | Total interspersed repeats | | 646,803,712 | 24.02 % |
| Tarim red deer | CEY_v1 | SINEs | 1,198,926 | 138,350,708 | 5.33 % |
| | | Penelope | 843 | 71,773 | 0.00 % |
| | | LINEs | 771,045 | 365,514,802 | 14.09 % |
| | | LTR elements | 250,987 | 87,150,863 | 3.36 % |
| | | DNA elements | 301,529 | 59,076,765 | 2.28 % |
| | | Unclassified | 2,935 | 457,527 | 0.02 % |
| | | Total interspersed repeats | | 650,622,438 | 25.08 % |
| Wapiti | ASM1932006v1 | SINEs | 1,182,147 | 136,508,425 | 5.40 % |
| | | Penelope | 831 | 69,820 | 0.00 % |
| | | LINEs | 757,197 | 360,608,658 | 14.27 % |
| | | LTR elements | 246,981 | 85,656,270 | 3.39 % |
| | | DNA elements | 296,632 | 58,111,078 | 2.30 % |
| | | Unclassified | 2,886 | 451,411 | 0.02 % |
| | | Total interspersed repeats | | 641,405,662 | 25.39 % |
| Red deer | mCerEla1.1 | SINEs | 1,199,359 | 138,496,239 | 4.80 % |
| | | Penelope | 843 | 70,596 | 0.00 % |
| | | LINEs | 769,666 | 372,508,590 | 12.90 % |
| | | LTR elements | 253,405 | 87,350,595 | 3.03 % |
| | | DNA elements | 414,312 | 68,936,441 | 2.39 % |
| | | Unclassified | 2,939 | 458,106 | 0.02 % |
| | | Total interspersed repeats | | 667,820,567 | 23.14 % |
| Red deer | CerEla1.0 | SINEs | 993,366 | 114,508,777 | 3.33 % |
| | | Penelope | 659 | 56,146 | 0.00 % |
| | | LINEs | 643,677 | 259,633,294 | 7.55 % |
| | | LTR elements | 219,382 | 72,834,881 | 2.12 % |
| | | DNA elements | 259,584 | 49,902,378 | 1.45 % |
| | | Unclassified | 2,598 | 404,951 | 0.01 % |
| | | Total interspersed repeats | | 497,340,427 | 14.46 % |

Bold indicates the assembly developed in the current study.
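The read-level and assembly-level figures above follow from simple ratios. The Python sketch below recomputes the mean read length, coverage depth, compleasm percentage and repeat percentage from the reported numbers, and illustrates the usual N50/N50n definition on a toy list of lengths. It assumes that depth was calculated against the 3,147,430,571 bp assembly length (this reproduces the quoted ~18.7x) and that repeat percentages are relative to total assembly length (this reproduces 21.63 %); the helper names are illustrative and are not taken from assembly-stats, compleasm or RepeatMasker.

```python
"""Recompute headline metrics for CerNipYes1.0 from the reported totals (sketch).

Numbers come from the text and Tables 2-4; helper names are illustrative.
"""
from __future__ import annotations

TOTAL_READS = 3_946_389            # HiFi reads obtained (Data Description)
TOTAL_BASES = 58_855_465_335       # bp sequenced
ASSEMBLY_LENGTH = 3_147_430_571    # CerNipYes1.0 total length, bp (Table 2)


def mean_read_length(total_bases: int, n_reads: int) -> float:
    """Average read length in bp."""
    return total_bases / n_reads


def coverage_depth(total_bases: int, genome_length: int) -> float:
    """Depth as total bases / genome length (assembly length assumed here)."""
    return total_bases / genome_length


def n50(lengths: list[int]) -> tuple[int, int]:
    """Return (N50, N50n).

    Sequences are sorted longest first; N50 is the length of the sequence at
    which the running sum first reaches half of the total length, and N50n is
    the number of sequences needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise AssertionError("unreachable: the running sum always reaches half")


def percent(part: int, whole: int) -> float:
    """Percentage of `part` relative to `whole`."""
    return 100 * part / whole


if __name__ == "__main__":
    print(f"mean read length: {mean_read_length(TOTAL_BASES, TOTAL_READS):,.1f} bp")
    print(f"coverage depth:   {coverage_depth(TOTAL_BASES, ASSEMBLY_LENGTH):.1f}x")
    # compleasm C score: 12,562 complete out of 12,594 genes (Table 3)
    print(f"completeness:     {percent(12_562, 12_594):.2f} %")
    # Total interspersed repeats: 680,864,411 bp (Table 4)
    print(f"repeat content:   {percent(680_864_411, ASSEMBLY_LENGTH):.2f} %")
    # Toy N50: half of 30 bp is 15; 10 + 8 = 18 reaches it at the 2nd sequence
    print(n50([10, 8, 6, 4, 2]))  # (8, 2)
```

Run as a script, it prints 14,913.8 bp, 18.7x, 99.75 % and 21.63 %, matching the values reported above.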
4. Experimental Design, Materials and Methods
4.1. Sample collection
We obtained a male sika deer (JP_HOKNOK_M01) captured in Nishiokoppe Village, Hokkaido, Japan, which contains a special hunting area authorized by the Governor of Hokkaido. The adult male was legally culled in 2022 to prevent agricultural damage. The collection site was located at 44.27470° N, 142.97850° E.
4.2. DNA extraction and sequencing
Genomic DNA was extracted from a muscle sample of the sika deer using a NucleoBond® AXG Column (Takara Bio, Inc.). The high-molecular-weight genomic DNA was sheared to approximately 20 kb using a Megaruptor 3. After the ends of the fragmented DNA were blunted and 3′-dA overhangs were added, SMRTbell adapters were ligated to create the sequencing library using the SMRTbell Express Template Prep Kit 2.0 and the Barcoded Overhang Adapter Kit (Pacific Biosciences Inc.). The entire volume of the resulting library was mixed, its size distribution was confirmed by electrophoresis, and size selection was performed using SageELF (Sage Science Inc.).
Sequencing primers complementary to the SMRTbell™ adapters were annealed to the library, and DNA polymerase was bound to form sequencing templates. These templates were loaded onto a SMRT® Cell, where they were immobilized within zero-mode waveguides (ZMWs), and sequencing was performed on the PacBio Sequel II System.
4.3. Genome assembly
Quality control was performed with fastp v0.22.0 [16] using a quality-score threshold of 30, and the filtered reads were merged into a single FASTQ file. De novo assembly of this file was then performed with Hifiasm v0.19.5 [17] using default settings. The resulting assembly was scaffolded with RagTag v2.1.0 [18] against the mCerEla1.1 genome, the highest-quality assembly among the six previously published deer assemblies compared in this study. These six assemblies were downloaded from NCBI GenBank (https://www.ncbi.nlm.nih.gov/datasets/genome). Assembly statistics, such as total length (genome size) and the numbers of scaffolds and contigs, were compared using assembly-stats v1.0.1 [19]. Assembly completeness was assessed with compleasm v0.2.7, a BUSCO-based tool that checks for orthologous genes conserved across a wide range of taxa; it was run with the artiodactyla_odb12 dataset (12,594 genes in total) and default settings. Repetitive elements were annotated using RepeatMasker v4.2.2 with rmblastn v2.14.1+ and FamDB (CONS-Dfam_3.9; Cervus elaphus).
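The assembly steps described above can be sketched as a short sequence of shell commands. The Python helper below only builds the command strings and does not run anything. The file names, thread count, GFA-to-FASTA awk one-liner, and the mapping of the quality threshold of 30 onto fastp's `-q` option are assumptions for illustration, not the authors' exact commands; the compleasm step is left out because its lineage-selection syntax differs between versions.

```python
"""Shell commands for the assembly steps in Section 4.3 (a sketch, not the authors' script).

File names are hypothetical placeholders. Tool versions follow the text; the
fastp -q 30 setting is an assumed reading of "quality score exceeding 30".
"""
from __future__ import annotations

import shlex

READS = "hifi_reads.fastq.gz"               # hypothetical merged HiFi reads
FILTERED = "hifi_reads.filtered.fastq.gz"   # hypothetical fastp output
PREFIX = "CerNipYes"                        # hypothetical hifiasm output prefix
REFERENCE = "mCerEla1.1.fa"                 # hypothetical local copy of mCerEla1.1


def assembly_commands(threads: int = 32) -> list[str]:
    """Return the pipeline as shell command strings, in execution order."""
    q = shlex.quote
    contigs_gfa = f"{PREFIX}.bp.p_ctg.gfa"   # hifiasm primary contigs (GFA)
    contigs_fa = f"{PREFIX}.p_ctg.fa"
    scaffolds = "ragtag_output/ragtag.scaffold.fasta"  # RagTag default output
    return [
        # 1. Read filtering (fastp v0.22.0); -q sets the per-base quality threshold
        f"fastp -i {q(READS)} -o {q(FILTERED)} -q 30",
        # 2. De novo assembly with default settings (Hifiasm v0.19.5)
        f"hifiasm -o {q(PREFIX)} -t {threads} {q(FILTERED)}",
        # 3. hifiasm writes GFA; keep segment (S) lines as FASTA for RagTag
        f"awk '/^S/{{print \">\"$2; print $3}}' {q(contigs_gfa)} > {q(contigs_fa)}",
        # 4. Reference-guided scaffolding against mCerEla1.1 (RagTag v2.1.0)
        f"ragtag.py scaffold {q(REFERENCE)} {q(contigs_fa)} -o ragtag_output",
        # 5. Contiguity statistics (assembly-stats v1.0.1)
        f"assembly-stats {scaffolds}",
        # 6. Repeat annotation using the Cervus elaphus species setting (RepeatMasker)
        f"RepeatMasker -species {q('Cervus elaphus')} {scaffolds}",
    ]


if __name__ == "__main__":
    for cmd in assembly_commands():
        print(cmd)
```

Printing the list gives a script outline to check against each tool's documentation before use; later steps read `ragtag_output/ragtag.scaffold.fasta` because that is where RagTag writes scaffolds by default.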
Limitations
The assembly comprises 1,810 scaffolds, whereas the chromosome number of the species is 2n = 68 [20]. This indicates that the assembly has not yet been completed at the chromosome scale. Neither chromosome assignment nor gene annotation was performed. Future studies will need to fully sequence and annotate the mitochondrial genome and the Y chromosome, in addition to performing gene annotation.
Ethics Statement
The work meets the ethical requirements for publication in Data in Brief.
CRediT authorship contribution statement
Yuki Matsumoto: Conceptualization, Project administration, Methodology, Writing – original draft, Writing – review & editing, Data curation. Junco Nagata: Supervision, Methodology, Writing – review & editing. Yukiko Matsuura: Methodology, Writing – review & editing. Hayato Iijima: Supervision, Methodology, Writing – review & editing, Project administration, Funding acquisition.
References
- [1] Geist V., Deer of the World: Their Evolution, Behavior and Ecology, Stackpole Books, 1998.
- [2] Greenspoon L., Krieger E., Sender R., Rosenberg Y., Bar-On Y.M., Moran U., Antman T., Meiri S., Roll U., Noor E., Milo R., The global biomass of wild mammals, Proc. Natl. Acad. Sci. U. S. A. 120 (2023) e2204892120. doi:10.1073/pnas.2204892120 (PMC10013851; PMID 36848563)
- [3] Kaji K., Uno H., Iijima H. (Eds.), Sika Deer: Life History Plasticity and Management, 1st ed., Springer Singapore, Singapore, 2023.
- [4] Iijima H., Ueno M., Spatial heterogeneity in the carrying capacity of sika deer in Japan, J. Mammal. 97 (2016) 734–743. doi:10.1093/jmammal/gyw001 (PMC5909809; PMID 29692470)
- [5] Nabata D., Masuda R., Takahashi O., Nagata J., Bottleneck effects on the sika deer Cervus nippon population in Hokkaido, revealed by ancient DNA analysis, Zool. Sci. 21 (2004) 473–481. doi:10.2108/zsj.21.473 (PMID 15118235)
- [6] Iijima H., Nagata J., Izuno A., Uchiyama K., Akashi N., Fujiki D., Kuriyama T., Current sika deer effective population size is near to reaching its historically highest level in the Japanese archipelago by release from hunting rather than climate change and top predator extinction, Holocene 33 (2023) 718–727.
- [7] Xing X., Ai C., Wang T., Li Y., Liu H., Hu P., Wang G., Liu H., Wang H., Zhang R., Zheng J., Wang X., Wang L., Chang Y., Qian Q., Yu J., Tang L., Wu S., Shao X., Li A., Cui P., Zhan W., Zhao S., Wu Z., Shao X., Dong Y., Rong M., Tan Y., Cui X., Chang S., Song X., Yang T., Sun L., Ju Y., Zhao P., Fan H., Liu Y., Wang X., Yang W., Yang M., Wei T., Song S., Xu J., Yue Z., Liang Q., Li C., Ruan J., Yang F., The first high-quality reference genome of sika deer provides insights into high-tannin adaptation, Genom. Proteom. Bioinform. 21 (2023) 203–215. doi:10.1016/j.gpb.2022.05.008 (PMC10372904; PMID 35718271)
- [8] Ababaikeri B., Abduriyim S., Tohetahong Y., Mamat T., Ahmat A., Halik M., Whole-genome sequencing of Tarim red deer (Cervus elaphus yarkandensis) reveals demographic history and adaptations to an arid-desert environment, Front. Zool. 17 (2020) 31. doi:10.1186/s12983-020-00379-5 (PMC7565370; PMID 33072165)
