A Nanopore-Only Assembly of a Nuclear and Mitochondrial Genome of a Red Coachwhip (Masticophis flagellum piceus)

Alan F. Scott; David W. Mohr

PMC · DOI:10.3390/genes17030307·March 4, 2026

A Nanopore-Only Assembly of a Nuclear and Mitochondrial Genome of a Red Coachwhip (Masticophis flagellum piceus)

Alan F. Scott, David W. Mohr

PDF

Open Access

TL;DR

Researchers assembled the genome of a red coachwhip snake using portable sequencing technology, showing how mobile labs can support biodiversity studies.

Contribution

A fully nanopore-based, chromosome-level genome assembly of a red coachwhip snake, demonstrating decentralized genomics.

Findings

01

A 1.61 Gb nuclear genome with 8 macrochromosomes and 10 microchromosomes was assembled with 97.7% completeness.

02

The mitochondrial genome was assembled and annotated with 13 coding genes, 22 tRNAs, and 2 rRNAs.

03

Genomic sequencing and analysis were conducted in a simulated mobile lab using portable equipment.

Abstract

We report a chromosome-level assembly of a male red coachwhip snake (Masticophis flagellum piceus) generated exclusively with nanopore sequencing. Using Hifiasm-ONT for assembly and RagTag for scaffold polishing, we produced a 1.61 Gb nuclear genome comprising 8 macrochromosomes and 10 microchromosomes with a 97.7% BUSCO completeness score. Annotation with LiftOn found 19,832 loci, including 18,025 protein-coding genes. The mitochondrial genome, assembled with MitoHiFi and annotated with MitoFinder, was 17,119 bp with 13 coding genes, 22 tRNAs and 2 rRNAs. All sequencing was performed in a simulated mobile laboratory using a portable sequencer and a laptop with analyses done both locally and remotely. These results highlight the feasibility of decentralized genomics and its potential to accelerate biodiversity research globally.

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Species1

Masticophis flagellum piceus

Figures3

Click any figure to enlarge with its caption.

Keywords

nanopore sequencingsnakegenome assemblymitochondrial genomefield-based simulation

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsGenomics and Phylogenetic Studies · Nanopore and Nanochannel Transport Studies · Environmental DNA in Biodiversity Studies

Full text

1. Introduction

We are in an era of high-quality genomes because of many individual efforts and those of large consortia such as the Earth Genome Project (https://www.earthbiogenome.org/), the Vertebrate Genome Project (https://vertebrategenomesproject.org/) and the Darwin Tree of Life (https://www.darwintreeoflife.org/). This progress has been made possible largely by advances in long-read sequencing and improved software for DNA assembly and performed in large facilities with substantial equipment and informatics investments. But the overall effort is slowed by the difficulty in obtaining quality samples in remote locations, transporting these to sequencing centers without degradation and producing the high molecular weight DNA (e.g., [1]) needed for long reads. Ethical and legal concerns also complicate sample collection and material transfer between countries. These limitations have restricted the growth of genomics in under-resourced communities. An ideal solution might be one where samples can be collected, processed, sequenced and assembled locally. In this study, we demonstrate that a portable sequencer, a relatively modest investment in equipment, and the availability of new software and communication tools can produce high-quality genome assemblies in a field-simulated setting.

2. Materials and Methods

2.1. Equipment and Supplies

We used a 6 M class B recreational vehicle (2023 Coachmen Nova C, Forest River Inc., Middlebury, IN, USA) equipped with refrigeration, microwave, Starlink (SpaceX Starlink, Hawthorne, CA, USA) satellite connectivity, Li battery, solar panel and 30 amp shore electrical availability as a field laboratory. Samples were processed when connected to grid power. A Bento Lab (bento.bio, London, UK) with gel box, thermal cycler and centrifuge for 1.5 or 2 mL tubes was used for DNA isolation, and library processing. DNA was isolated with either the New England Biolabs (NEB, Ipswich, MA, USA) Monarch High Molecular Weight DNA extraction kit for tissue (T3060) or blood (T3010) kits. Additional equipment included a small programmable dry temperature block (JOANLAB, Huzhou, China), a fluorimeter (Qubit, Life Technolgoies Holding, Singapore), various pipettors (Rainin Instruments; Oakland, CA, USA), an Oxford Nanopore Technologies (ONT, Oxford, UK) P2 solo sequencer, ONT LSK-114 ligation library kits, PromethIon R10.4.1 flow cells, a 2023 MacBook Pro M3 with 8 Tb of internal memory (Apple, Cupertino, CA, USA) and several 4 Tb SSDs (Samsung, Suwon, Republic of Korea) for data collection and backup.

2.2. Sample Source and Sequencing

The specimen was collected as roadkill at the approximate location of 36.435145, −115.371807 in southern Nevada and is exempt from permit requirements. It was estimated to be dead for less than 2 h when a 15 cm tail sample was taken for DNA isolation. A first isolation from a portion of the tail was done in the field, and the remaining tissue was frozen. Subsequent isolations were prepared from the remaining tissue in Baltimore as needed to achieve the required read depth. The ONT ligation kit LSK114 was used for library preparation, and the library was sequenced on PromethIon flow cells (R10.4.1) run on the P2 solo sequencer using an M3 MacBook Pro running MinKNOW (v24.11) to collect both the nucleotide and 5 mC/5 hmC methylation data. Run times were typically 72 h with washing and reloading at 24 h intervals, varying based on the performance of the particular library and flow cell, with 1–2 μg of DNA used per library without sizing or shearing. Four DNA libraries were made and run with the goal of acquiring at least 50× read depth as recommended for Hifiasm assembly. To simplify library preparation, several reagents were pipetted in the home laboratory into 0.2 mL tubes so that the sample could be transferred serially and quickly during the protocol. Pre-aliquoting reagents also allowed quick replacement of a tube in the event one was dropped or lost. Rather than the ONT-recommended 1.5 mL tubes, 0.2 mL tubes were used throughout the library prep because they were compatible with the Bento lab thermal block.

2.3. Assembly, Annotation and Analysis

Sequence was basecalled in Dorado (github.com/nanoporetech/dorado; version 1.1.1) as either high accuracy calling (HAC) or super-accurate calling (SUP). As HAC calling is faster, it was used locally, with 5 mC and 5 hmC enabled, but POD5 files were retained and were reprocessed subsequently on our cluster as SUP to produce unaligned BAMs (to retain methylation status). To maximize read quality, we used SUP calls for the de novo Hifiasm assembly and subsequent analyses. From four flow cells, we obtained a final combined read depth of 67× and an N50 of 18.3 kb.

The genome assembly was performed with Hifiasm-ONT (Hifiasm-0.25.0-r726; [2]), and a small number of remaining scaffolds were joined using RagTag (version 2.1.0; [3]) with the congeneric species Masticophis lateralis (NCBI GCA_030761175.1) used as the reference. Chromosomes were ordered by scaffold length with seqkit (2.10.1; [4]) relative to Elaphe schrenckii (Amur rat snake; NCBI GCA_050231175.1), a recent PacBio-generated chromosome assembly. The D-Genies (v.1.5.0; [5]) dot plot tool was used to visualize the chromosomal synteny. Annotation was performed using LiftOn [6] with the well annotated Thamnophus sirtalis genome (Western garter snake; NCBI GCA_009769535.1) as the source. BUSCO analysis (ver. 5.8.2; squamata_odb12; [7]) was used to assess completeness. A Snail plot [8] was generated to summarize assembly quality.

Mitochondrial sequences were identified with MitoFinder [9], assembled with MitoHiFi [10] as implemented on Galaxy version 3.2.3 [11] using Elaphe bimaculata (NC_024743.1; [12]) as the reference and annotation source. All thirteen expected protein-coding genes were found. Multiple reference sequences were chosen for use with different analysis tools, as no single closely related species had chromosome-level and mitochondrial assemblies with detailed annotations for both. More recent submissions that used long-read methods were prioritized.

3. Results

The field-isolated DNA provided the longest reads but lacked sufficient depth for analysis. Subsequent DNA isolations were performed, and when combined, had an N50 of 18.3 kb and a read depth of 67×. The Hifiasm-ONT assembly initially produced 152 scaffolds, which were reduced to 120 following RagTag joining using the M. lateralis reference. Of these, 18 were assigned to chromosomes (Table 1), and 102 were unplaced. The unplaced scaffolds totaled 3.116 Mb and averaged 30.55 kb, ranging from 694,998 to 1449 bp, representing 0.193 percent of all reads. Colubrid snakes have been reported to have 8 macrochromosomes and 10 microchromosomes [13], which agrees with our observation. The resulting nuclear genome was 1.61 Gb and had a BUSCO completeness score of 97.7% (S:11,015; D:19; F:148; M:112) from 11,294 BUSCO groups searched using the squamata_odb12 lineage dataset (Figure 1). The BUSCO score is comparable to that of the Candoia aspera (viper boa; GCF_035149785.1) and better than most current snake genomes. LiftOn [6] identified 19,832 total loci with 18,025 classified as protein coding. No protein-coding genes were identified in the unplaced scaffolds.

By convention, the RagTag assembly of the 18 chromosomes was named MFP_1-18 based on their length (Table 1). MFP_4 was identified as the sex determining Z chromosome based on the presence of the CTNNB1 gene. Heterozygosity of the gene indicated that the specimen was male [13]. Scaffold statistics and BUSCO scores are shown in Figure 1.

Figure 2 shows the D-Genies alignment of the M. flagellum piceus nuclear genome to E. schrenckii, chosen for comparison based on the quality of its assembly. The corresponding chromosomes (right and top axes) from each species show high concordance based on visual inspection. It is not possible to determine if small mismatches are the result of errors in either genome or true species differences.

We annotated the genome using tools that perform sequence comparisons, namely LiftOn [6], primarily for protein-coding genes and Earl Grey [14] for transposable element families. These tools are not replacements for the more thorough annotation processes used at NCBI or EBI, which include RNA sequence alignment, but as more genomes from related species become available, they are a reasonable proxy until a more formal annotation can be completed. Table 2 shows the major features identified by LiftOn, and Table 3 lists the transposable elements identified with Earl Grey.

Mitochondrial Genome

The mitochondrial genome produced with MitoHiFi was 17.12 kb and was predicted to include 13 protein-coding genes, 22 tRNAs and two rRNAs (Figure 3), as expected from other colubrid snakes. We found that the mitochondrial genome was 89.15% similar to Coluber constrictor (NC_ 071936.1) and 82.98% similar to Elaphe bimaculata using the NCBI BLAST tool (ver 2.17.0; https://blast.ncbi.nlm.nih.gov/).

4. Discussion

The nuclear and mitochondrial genomes were generated solely with nanopore sequencing on a portable sequencer and a laptop computer in field-simulated conditions. Subsequent analyses were done either using Galaxy [11] tools or at an institutional datacenter. The principal bioinformatic advances were the use of Hifiasm-ONT (ver 0.25.0-r726) to error correct and assemble nanopore reads and RagTag to combine the 32 remaining scaffolds, 3 of which were added to the Hifiasm-generated chromosomes. The quality of the genome alignment compared with that of the other snakes, the high BUSCO scores for the nuclear genome and the identification of the expected number of mitochondrial genes show that genomes can be assembled at relatively low cost and without the need for a large laboratory. We noted that the longest reads were obtained from the field-isolated sample rather than subsequent preparations from the frozen tissue. This is consistent with other studies (e.g, [1]) and with other projects we have done, where freshly prepared HMW DNA produced the longest reads. The primary current limitations with field-based sequencing are the speed of the ONT Dorado base calling and the rate of transferring data from the field to a datacenter or internet resources for subsequent analyses. New GPU devices and improved software are expected to do speed basecalling, satellite communication transfer speeds should continue to increase, and software tools that can run on higher-end laptops or online, such as Galaxy [11], will also continue to be refined. A vision of field-based genomics, especially in remote locations, that can be widely implemented globally and democratize genomics [15] is near realization.

Bibliography15

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Dahn H.A. Mountcastle J. Balacco J. Winkler S. Bista I. Schmitt A.D. Pettersson O.V. Formenti G. Oliver K. Smith M. Benchmarking ultra-high molecular weight DNA preservation methods for long-read and long-range sequencing Gigascience 202211 giac 06810.1093/gigascience/giac 06835946988 PMC 9364683 · doi ↗ · pubmed ↗
2Cheng H. Qu H. Mc Kenzie S. Lawrence K.R. Windsor R. Vella M. Park P.J. Li H. Efficient near telomere-to-telomere assembly of Nanopore Simplex readsbio Rxiv 202510.1038/s 41586-026-10105-641639459 PMC 13070018 · doi ↗ · pubmed ↗
3Alonge M. Soyk S. Ramakrishnan S. Wang X. Goodwin S. Sedlazeck F.J. Lippman Z.B. Schatz M.C. Ra GOO: Fast and accurate reference-guided scaffolding of draft genomes Genome Biol.20192022410.1186/s 13059-019-1829-631661016 PMC 6816165 · doi ↗ · pubmed ↗
4Shen W. Sipos B. Zhao L. Seq Kit 2: A Swiss army knife for sequence and alignment processing Imeta 20243 e 19110.1002/imt 2.19138898985 PMC 11183193 · doi ↗ · pubmed ↗
5Cabanettes F. Klopp C. D-GENIES: Dot plot large genomes in an interactive, efficient and simple way Peer J 20186 e 495810.7717/peerj.495829888139 PMC 5991294 · doi ↗ · pubmed ↗
6Chao K.-H. Heinz J.M. Hoh C. Mao A. Shumate A. Pertea M. Salzberg S.L. Combining DNA and protein alignments to improve genome annotation with Lift On Genome Res.20253531132510.1101/gr.279620.12439730188 PMC 11874971 · doi ↗ · pubmed ↗
7Simão F.A. Waterhouse R.M. Ioannidis P. Kriventseva E.V. Zdobnov E.M. BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs Bioinformatics 2015313210321210.1093/bioinformatics/btv 35126059717 · doi ↗ · pubmed ↗
8Challis R.J. Blaxter M.L. Snail plots are badges of genome assembly qualitybio Rxiv 202510.1101/2025.11.20.689594 · doi ↗