Progress in Flax Genome Assembly from Nanopore Sequencing Data

Elena N. Pushkova; Alexander A. Arkhipov; Nadezhda L. Bolsheva; Tatiana A. Rozhmina; Alexander A. Zhuchenko; Elena V. Borkhert; Nikolai M. Barsukov; Gavriil A. Oleshnya; Alina V. Milovanova; Olesya D. Moskalenko; Fedor D. Kostromskoy; Elizaveta A. Ivankina; Ekaterina M. Dvorianinova; Daiana A. Krupskaya; Nataliya V. Melnikova; Alexey A. Dmitriev

PMC · DOI:10.3390/plants15010151·January 4, 2026

Progress in Flax Genome Assembly from Nanopore Sequencing Data

Elena N. Pushkova, Alexander A. Arkhipov, Nadezhda L. Bolsheva, Tatiana A. Rozhmina, Alexander A. Zhuchenko, Elena V. Borkhert, Nikolai M. Barsukov, Gavriil A. Oleshnya, Alina V. Milovanova, Olesya D. Moskalenko, Fedor D. Kostromskoy, Elizaveta A. Ivankina

PDF

Open Access

TL;DR

This paper reports high-quality, near-complete genome assemblies for two flax varieties using Nanopore sequencing and Hifiasm, improving on existing flax genome references.

Contribution

The study presents two telomere-to-telomere quality flax genome assemblies using ONT simplex R10.4.1 data and Hifiasm, without relying on PacBio or ultra-long reads.

Findings

01

K-3018 and Svyatogor flax genome assemblies reached 491.1 Mb and 497.8 Mb, respectively, with most chromosomes assembled to full length.

02

The assemblies surpassed the quality of the current NCBI reference genome for flax variety T397.

03

Flax genomes showed high similarity at the chromosome level with only a few large-scale differences.

Abstract

In recent years, the quality of genome assemblies has notably improved, primarily due to advances in third-generation sequencing technologies and bioinformatics tools. In the present study, we obtained genome assemblies for two flax (Linum usitatissimum L.) varieties, K-3018 and Svyatogor, using Oxford Nanopore Technologies (ONT) simplex R10.4.1 data and the Hifiasm algorithm optimized for ONT reads. The K-3018 genome assembly was 491.1 Mb and consisted of thirteen full-length chromosomes and two one-gap chromosomes. The Svyatogor genome assembly was 497.8 Mb and consisted of twelve full-length chromosomes and three one-gap chromosomes. All chromosomes had telomeric repeats at their ends for both varieties. Hi-C contact maps and Illumina genomic data supported the accuracy of the obtained assemblies. The K-3018 and Svyatogor genome assemblies surpassed the quality of the best currently…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Species1

Linum usitatissimum(flax · species)

Mutations1

T2T

Figures4

Click any figure to enlarge with its caption.

Funding1

—Ministry of Science and Higher Education of the Russian Federation

Keywords

flaxLinum usitatissimumNanoporelong-read sequencinggenome assemblytelomere-to-telomereHifiasm

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsChromosomal and Genetic Variations · Genomics and Phylogenetic Studies · Phytoestrogen effects and research

Full text

1. Introduction

Over the past 25 years, significant progress has been made in plant genome sequencing, resulting in genome assemblies for numerous crop species [1]. In recent years, the quality of these assemblies has notably improved, primarily due to the advances in third-generation sequencing technologies, which can produce long reads spanning tens or hundreds of kilobases (kb), as well as improvements in bioinformatics tools for genome assembly [1,2,3,4]. There are two main third-generation sequencing platforms: Pacific Biosciences (PacBio), which produces high-accuracy HiFi reads, and Oxford Nanopore Technologies (ONT), which produces longer but less accurate reads. Various approaches and tools have also been developed to obtain high-quality genome assemblies from long reads [5,6,7].

High-quality plant genomes are used in many research areas, including functional genomics, evolutionary and phylogenetic studies, breeding, and genome editing [8,9,10,11,12,13,14]. The quality of genome assemblies is improving, and telomere-to-telomere (T2T) assembly is becoming the new standard in genomics for important crops [15].

Flax (Linum usitatissimum L.) is a multipurpose self-pollinated agricultural crop cultivated for its seeds and stems. The seeds are primarily used for pharmaceuticals and functional foods due to their high content of health-promoting lignans and omega-3 fatty acids, as well as for technical purposes [16,17,18,19,20,21]. The history of sequencing the flax genome (2n = 30) spans more than 10 years, progressing from an assembly of ~300 Mb consisting of multiple contigs to near-T2T-level genome assemblies of ~500 Mb [22,23,24,25,26,27,28,29,30,31]. In 2024, we assembled the 489 Mb genome of the variety K-3018 (NCBI Genome, GCA_048169125.1), which belongs to L. usitatissimum L. var. intermedia Vav. et Ell. and is considered linseed by the direction of its use. The assembly consisted of 22 large contigs. Eight full chromosomes were assembled from telomere to telomere. Five chromosomes were represented by two contigs each and two chromosomes were represented by three contigs each. All assembled chromosomes contained telomeric repeats at their ends [29]. Later, Lu et al. [30] stated that they assembled the gapless flax genome using ultra-long (UL) ONT and PacBio HiFi data. This 483 Mb genome contained telomeres at the ends of all 15 chromosomes. Unfortunately, this assembly, which was deposited to the China National Center for Bioinformation, will not be available until 30 June 2026 (CNCB, PRJCA037526, https://ngdc.cncb.ac.cn/gwh/Assembly/92583/show, accessed on 1 November 2025). Then, the 495 Mb flax genome of Indian linseed variety T397 was published by Yadav et al. It was primarily assembled from ONT and PacBio HiFi data and scaffolded into 20 super-scaffolds using optical mapping. Thirteen of the super-scaffolds corresponded to full-length chromosomes. The resulting 15-chromosome assembly had five gaps and 29 out of 30 telomeres were identified (NCBI Genome, GCA_051167515.1) [31].

Since we published our work on assembling the K-3018 flax genome, the Hifiasm algorithm optimized for ONT reads has emerged. This allowed high-quality genome assemblies to be produced using ONT sequencing data better than ever. The aim of our current study was to demonstrate progress in the flax genome assembly from ONT data by improving the quality of the variety K-3018 assembly and generating a genome assembly of the valuable dual-purpose (for seeds and fiber) flax variety Svyatogor (L. usitatissimum var. intermedia). Individual plants of varieties K-3018 and Svyatogor were previously selected by us for flax pan-genomic studies on the basis of genotyping results [32].

2. Results and Discussion

2.1. Genome of the Variety K-3018

Previously, we sequenced the genome of variety K-3018 on the ONT platform and generated 57.7 Gb of ONT data corresponding to 115× coverage for a 500 Mb genome. After filtering for a minimum read length of 10 kb, 83× coverage remained. The genome was assembled using Hifiasm UL (v.0.19.9-r616) in double-graph mode using ONT reads corrected with the Herro algorithm integrated into the Dorado basecalling tool and deposited to NCBI Genome–GCA_048169125.1 [29]. In the present study, we reassembled the genome of variety K-3018 using the Hifiasm ONT algorithm (v.0.25.0-r726), which is optimized for ONT reads. We obtained the assembly of 491.1 Mb comprising 17 contigs, including 13 full-length chromosomes and two chromosomes split into two contigs each. Using the Hi-C data generated in this study, we successfully merged fragmented chromosomes 7 and 11 (Supplementary Figure S1). Telomeric repeats were identified at the ends of all 15 chromosomes (Supplementary Figure S2). Therefore, the Hifiasm ONT algorithm enabled us to significantly improve the contiguity of the K-3018 genome assembly, leaving only two gaps (versus nine in the previous version of the K-3018 genome assembly). Global genome alignment showed the similarity between the version 1 (v1) and version 2 (v2) assemblies of the K-3018 genome (Figure 1).

In the K-3018 v2 assembly, the chromosomes were named according to the chromosome numbers of the first flax genome assembly of the variety CDC Bethune (NCBI Genome, GCA_000224295.2) [23], following the consensus genetic map of flax [33]. In contrast, in the YY5, K-3018 v1, and T397 genome assemblies, the chromosomes were named according to the sorted order of scaffold numbers in the CDC Bethune assembly rather than the chromosome numbers. In the CDC Bethune genome assembly, scaffold numbers and chromosomes numbers are ordered not in the same way, for example, scaffold CP027619.1 is chromosome 1, scaffold CP027620.1 is chromosome 10, and scaffold CP027626.1 is chromosome 2 (NCBI Genome, GCA_000224295.2) [23]. The CDC Bethune chromosome numbers should be used to name the chromosomes in the obtained flax genome assemblies, not the scaffold numbers. Chromosome numbers in the genome assemblies of CDC Bethune and K-3018 v2 correspond to the following chromosome numbers in the genome assemblies of YY5, K-3018 v1, and T397: Chr1–Chr1, Chr2–Chr8, Chr3–Chr9, Chr4–Chr10, Chr5–Chr11, Chr6–Chr12, Chr7–Chr13, Chr8–Chr14, Chr9–Chr15, Chr10–Chr2, Chr11–Chr3, Chr12–Chr4, Chr13–Chr5, Chr14–Chr6, Chr15–Chr7.

BUSCO statistics for the new K-3018 assembly (v2) also revealed slight improvements compared to the previous version (v1): 2232 complete BUSCOs (up from 2228) corresponding to the completeness rise from 95.8% to 96.0%. In the K-3018 v2 genome assembly, 37,081 gene models and 43,116 transcript models were predicted.

We also aligned the Hifiasm ONT-derived genome assembly of variety K-3018 (v2) with the T397 assembly (NCBI Genome, GCA_051167515.1) [31], which is currently the reference in the NCBI Genome database for L. usitatissimum (as of 1 November 2025) (Figure 2).

In general, the assemblies were quite similar, but with a few notable exceptions. Two large inversions were detected in Chr10 and Chr13 according to the K-3018 v2 (CDC Bethune) chromosome numeration. Other structural differences included a contiguous variation near the centromeric region of Chr15 and an increased length of Chr08 in the K-3018 v2 genome assembly.

In the T397 genome assembly, Chr05, which corresponds to Chr13 in the K-3018 v2 assembly, contained telomeric repeats only at one end [31] (see also Supplementary Figure S3). In contrast, Chr13 in the K-3018 v2 assembly had telomeric repeats at both ends (Supplementary Figure S2), indicating its accuracy. The analyzed chromosome in the T397 assembly had a half-chromosome-length inversion compared to the K-3018 v2 assembly. The large size of the inversion and the absence of telomeric repeats at the inverted part of the chromosome suggest an error in the T397 genome assembly rather than genetic variation in the genomes of the examined flax varieties.

Another major difference was observed in Chr14 of the T397 assembly, which corresponds to Chr08 in the K-3018 v2 genome assembly. This chromosome was 9.5 Mb shorter in the T397 assembly than in the K-3018 v2 assembly. As previously reported for the K-3018 v1 assembly, this chromosome contains extended telomeric repeats within its internal region [29]. FISH experiments also identified telomeric repeats within the centromeric heterochromatin of one pair of the largest flax chromosomes [34]. Therefore, these internal telomeric repeats are not an artifact. However, in the T397 assembly, this chromosome ends at the location of the internal telomeric repeats in the K-3018 v2 genome assembly (Supplementary Figures S2 and S3, see Chr08 and Chr14, respectively). This suggests that the distal portion of the largest flax chromosome is missing in the T397 assembly.

Other misalignments between the K-3018 v2 and T397 assemblies could be associated with both assembly errors and genetic differences. The T397 genome was initially assembled into 1330 contigs, which were then scaffolded with optical mapping into 20 sequences. They were subsequently curated and aligned to the YY5 assembly, generating 15 chromosome-level scaffolds. In contrast, the K-3018 genome was assembled directly into 17 contigs (with only two remaining gaps) using the ONT sequencing data and Hifiasm ONT algorithm. Telomeric repeats were present at all 30 chromosome ends, indicating high assembly completeness and accuracy.

Unfortunately, another flax genome assembly, which was stated as gapless, will not be available until 30 June 2026 (CNCB, PRJCA037526, https://ngdc.cncb.ac.cn/gwh/Assembly/92583/show, accessed on 1 November 2025).

2.2. Genome of the Variety Svyatogor

We sequenced the genome of the flax variety Svyatogor using the ONT platform. As a result, after basecalling with at least Q10 read quality, we generated 27.5 Gb of sequencing data with an N50 of 15.4 kb corresponding to 55× coverage for a 500 Mb genome (46× coverage after filtering for a minimum read length of 5 kb). The Hifiasm ONT algorithm produced a 497.8 Mb genome assembly consisting of 18 contigs with an N50 of 33.3 Mb, and the largest contig was 40.8 Mb. Twelve chromosomes were gapless, while three chromosomes were split into two contigs each. The Svyatogor genome assembly had one more gap than the K-3018 v2 assembly obtained in the present study (three versus two), but it surpassed the T397 assembly [31] in this parameter (three gaps versus five). All assembled chromosomes of variety Svyatogor had telomeric repeats at both ends (Supplementary Figure S4), outperforming the T397 assembly, which had correct telomeric repeats at both ends of 13 chromosomes and at one end of two chromosomes (Chr05 and Chr14).

BUSCO statistics for the Svyatogor genome assembly were 95.9% with 2230 complete BUSCOs. In the Svyatogor genome assembly, 37,207 gene models and 43,303 transcript models were predicted. For the variety Svyatogor, we also obtained 23.8 Gb of Illumina genome sequencing data (48× coverage for a 500 Mb genome). Using these data, a k-mer analysis evaluated the flax genome size to be ~474 Mb (rough estimate), assembly completeness score–99.0%, and QV score–55.3, indicating high completeness and sequence accuracy of the assembled genome. Moreover, we generated the Hi-C data, and the contact map (Supplementary Figure S5) revealed no abnormal interactions.

We compared the Hifiasm ONT-derived genomes of the varieties K-3018 and Svyatogor (Figure 3). The assemblies were generally similar, though some differences were observed in the central regions of several chromosomes, which are likely centromeric (especially noticeable on Chr15). These regions are rich in repeats and poor in genes, so differences could occur.

We also compared the genome assemblies of the varieties Svyatogor and T397 (Figure 4). Interestingly, the region near the centromere of Chr15 in Svyatogor was more similar to the corresponding region in T397 than to that in K-3018, suggesting that this region can be highly variable in flax genomes. Chr08 and Chr13 of Svyatogor assembly had the same differences from the corresponding chromosomes of T397 assembly as had Chr08 and Chr13 of K-3018 assembly. This further supported the incorrect/incomplete assembly of these chromosomes in the T397 genome.

Flax genome assemblies of varieties CDC Bethune v2 [23], Atlant [25], YY5 v2 [26], 3896 [27], Heiya 14 [24], Longya 10 [24], Neiya 9 [28], K-1531 [35], which significantly inferior in quality to the genome assemblies of varieties K-3018, Svyatogor, and T397, including decreased genome size, absence of telomeres, tenth and hundreds of gaps, were compared in the previous studies [31,35]. In the present study, we performed comparison of statistics for the latest and best genome assemblies of flax varieties K-3018, Svyatogor, and T397 (Table 1) and created a synteny plot for them (Supplementary Figure S6). Overall, the statistics were quite similar. However, the Svyatogor and K-3018 v2 assemblies slightly outperformed the T397 assembly in all statistics except for the QV score, which was slightly higher for the T397 assembly. The synteny plot (Supplementary Figure S6) showed a number of small translocations, inversions, and/or duplications between the genome assemblies of the three analyzed flax genotypes, which may be related to intervarietal differences. In addition, as we already mentioned above, two substantial (half-chromosome) inversions were present in the T397 assembly within Chr10 and Chr13 (according to the Svyatogor and K-3018 v2 assemblies), thereby differentiating it from the Svyatogor and K-3018 v2 genome assemblies. Because of the inversions’ large sizes and the absence of telomeres at one end of Chr13, these inversions are more likely the T397 assembly errors than intervarietal differences. Additionally, Chr8 (according to the Svyatogor and K-3018 v2 assemblies) in the T397 assembly lacked a significant segment of the chromosome present in the Svyatogor and K-3018 v2 assemblies. This segment is located behind the intrachromosomal telomeric repeat-reach region mentioned above. Chr15 is of particular interest because the highest number of large interchromosomal rearrangements was identified between the three genotypes in this chromosome. Chr15 may differ significantly between flax genotypes; however, additional high-quality flax genome assemblies are needed to clarify this issue.

In our previous study, we compared version 1 (v1) of the K-3018 genome assembly with the YY5 assembly (https://zenodo.org/record/4872894, accessed on 15 August 2025) [26], the best available at the time. We revealed a significant number of mismatches. Based on our analysis of telomeric repeats, we suggested that some of these mismatches were because of the YY5 assembly errors [29]. The present study included the analyses of three genome assemblies: K-3018, Svyatogor, and T397. The results indicated that flax genomes are generally quite similar at the chromosome level, with only a few large-scale differences. Since the YY5 genome assembly likely contains a significant number of misassemblies, comparison of the obtained flax genomes with the YY5 assembly, as was performed in the study by Yadav et al. [31], could lead to misidentification of structural variations. However, several high-quality near-T2T assemblies allow for the proper identification of large-scale genomic alterations in flax genomes.

2.3. Progress in the Flax Genome Assembly

In the present study, we obtained two high-quality near-T2T flax genome assemblies with only a few gaps using the ONT simplex sequencing data and ONT-optimized Hifiasm algorithm. All chromosomes in these genome assemblies had pronounced telomeric repeats at the ends, and most chromosomes were assembled from telomere to telomere. Only three chromosomes in Svyatogor and two chromosomes in K-3018 had one gap each. These gaps represent unresolved highly repetitive regions, which can be suggested to be centromeric based on their location. The obtained assemblies surpassed the contiguity and correctness of the T397 genome assembly, which was classified as T2T by Yadav et al. [31] and marked as a reference for L. usitatissimum in the NCBI Genome database (as of 1 November 2025, GCA_051167515.1). The T397 genome was assembled using PacBio HiFi and ONT data with optical mapping [31]. In contrast to PacBio HiFi-derived assemblies, flax genomes assembled from R9.4.1 ONT data lacked the sequence accuracy and required polishing with precision Illumina reads [27,35,36]. However, the sequence accuracy of flax genomes assembled from R10.4.1 ONT data and from PacBio HiFi data was already comparable [29]. Using the obtained Illumina reads, we demonstrated the high sequence accuracy of the Svyatogor genome assembly–the QV score was more than 55. Thus, new genome assembly strategies optimized for ONT simplex data and recent algorithmic advances in error correction, including an improved Hifiasm ONT correction module that reliably distinguishes true sequencing errors from phasing differences [37] and avoids the assumption that ONT errors are randomly distributed, have further enhanced the accuracy of ONT-only assemblies. This module has also outperformed alternative machine learning-based correction approaches (such as Herro [38]), enabling the generation of near-error-free genome assemblies. As a result, recent progress in ONT sequencing and bioinformatics has allowed us to produce accurate near-T2T flax genome assemblies from the ONT simplex R10.4.1 reads using Hifiasm ONT without involving PacBio HiFi or UL ONT reads as well as optical maps. These assemblies can compete with and surpass those produced using PacBio HiFi data with either Hi-C scaffolding or UL-ONT hybrid strategies, which rely on costly and difficult-to-generate datasets.

Numerous studies have aimed to develop genetic markers of valuable flax traits [39]. However, most of these studies used the genome assembly of the variety CDC Bethune as a reference. This assembly was initially produced from short Illumina reads [22,23] and then scaffolded to chromosome level. Thus, it was incomplete and insufficiently extensive (only about 316 Mb). New near-T2T-level assemblies of the flax genomes are essential for improving the efficiency of flax genetic research and for evaluating flax genetic diversity at the genome level. They are also crucial for developing approaches for marker-assisted and genomic selection, as well as precise genome editing.

3. Materials and Methods

3.1. DNA Preparation and Sequencing on the ONT Platform

The high-yield dual-purpose flax variety Svyatogor, which can be used for both seed and stem production and has high resistance to abiotic stresses, was selected for the study.

DNA was extracted from leaves as described in our previous study with the stage of nuclei isolation [27]. The SQK-LSK109 kit (ONT, Oxford, UK) and SQK-LSK114 kit (ONT) were used for DNA library preparation. Genome sequencing was performed using MinION and PromethION sequencers (ONT) with R10.4.1 flow cells (ONT).

3.2. Construction and Sequencing of Genomic and Hi-C Libraries on the Illumina Platform

Illumina genomic library for the variety Svyatogor was prepared using the TransNGS Fragmentase DNA Library Prep Kit for Illumina (Transgen, Beijing, China).

Hi-C libraries were prepared for varieties K-3018 and Svyatogor. Chromatin was fixed with formaldehyde using vacuum infiltration. Nuclei were isolated from the fixed leaves. Then, according to a slightly modified protocol [40], restriction with DpnII (New England Biolabs, Ipswich, MA, USA), biotin labeling with Biotin-15-dCTP (Biosan, Novosibirsk, Russia), ligation of DNA fragments with T4 DNA Ligase (New England Biolabs), and phenol-chloroform DNA extraction were performed. Next, the biotin was removed from the unligated ends of the DNA fragments and the Hi-C libraries were prepared using the Qiaseq FX DNA kit (Qiagen, Chatsworth, MD, USA) including the step for enrichment of the biotinylated fragments with Streptavidin Magnetic Beads (New England Biolabs).

The quality and concentration of the libraries were assessed by 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA) and Qubit 4.0 fluorometer (Thermo Fisher Scientific, Waltham, MA, USA), respectively. The genomic and Hi-C libraries were sequenced on NovaSeq 6000 and NovaSeq X Plus (Illumina, San Diego, CA, USA), respectively, with 150 + 150 nucleotide reads.

3.3. Assembly of Flax Genomes

For the genome assembly of variety K-3018, we used the ONT data previously obtained by us (NCBI SRA, SRX26506354). For the genome assembly of variety Svyatogor, we used the ONT reads obtained in the present study.

Basecalling was performed using the Dorado software package (v.1.0.2, https://github.com/nanoporetech/dorado, accessed on 25 August 2025) with a minimum average read quality of Q10. The [email protected] (sup) model was used. The length and quality distribution for the ONT reads was assessed using NanoPlot (v.1.44.1) [41]. Adapter sequences were additionally trimmed using Porechop (v.0.2.4) (https://github.com/rrwick/Porechop, accessed on 25 August 2025) with default parameters. The ONT reads were filtered for a minimum length of 10 kb for the variety K-3018 and 5 kb for the variety Svyatogor and used for genome assembly with the Hifiasm algorithm (v.0.25.0-r726) [37] with parameters optimized for ONT reads (--ont–preset for ONT-based assembly; -l0–disables the purge of duplicates, as recommended for haploid or imbred genome assemblies; --telo-m CCCTAAA–sets telomere motif as CCCTAAA).

The Illumina data for Svyatogor were trimmed using fastp 1.0.1 [42] with default parameters. The T397 Illumina genomic data were downloaded from NCBI SRA (SRX25430339) and treated the same way.

Assembly statistics (including assembly length, number and length of contigs) were assessed using QUAST (v.5.3.0) [43]. Softmasking for genome assemblies was performed with RepeatMasker 4.2.1 [44] using a de novo repeat database constructed with RepeatModeler 2.0.7 [44] using RepBase (2018) [45] and Dfam 3.9 [46]. Structural annotations of genome assemblies were created with Braker3 [47] using the numerous transcriptomic data for the flax variety 3896 [48] and Viridiplantae OrthoDB 11 database [49]. Completeness evaluation of genome assemblies was performed using BUSCO (v.5.8.3) with the eudicots_odb10 database [50] using the genome mode based on miniprot alignment. Consensus quality value (QV) and assembly completeness scores were assessed with Merqury (v1.3) using default parameters for a haploid assembly and k value of 20 [51]. Global alignment of the assemblies was performed using LAST (v.1639) [52] with the --uRY128 option (initial matches between query and reference sequences were searched on ~1/128 of positions in each sequence). The TIDK software package (v.0.2.64) [53] was used for identification of telomeric sequences (the CCCTAAA telomeric motif was searched).

The Hi-C reads were aligned to the corresponding genome assemblies of varieties K-3018 and Svyatogor with BWA-MEM (v.0.7.19) (-5SP parameters) [54]. The identification of ligation sites was performed using pairtools (v.1.1.3) [55]. The contact maps were obtained with Juicer Tools (v.1.22.01) and visualized with Juicebox (v.1.11.08) [56].

Mapping of Illumina reads was performed using BWA-MEM (v.0.7.19) and k-mer distribution analysis was performed using Jellyfish (v.2.3.1) [57] with high count value of histogram set to 100,000. The genome size and duplication rate were assessed using GenomeScope2 with a k-mer length of 20 and ploidy set to 1 (v.2.0.1) [58].

Bibliography58

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Sun Y. Shang L. Zhu Q.H. Fan L. Guo L. Twenty years of plant genome sequencing: Achievements and challenges Trends Plant Sci.20222739140110.1016/j.tplants.2021.10.00634782248 · doi ↗ · pubmed ↗
2Espinosa E. Bautista R. Larrosa R. Plata O. Advancements in long-read genome sequencing technologies and algorithms Genomics 202411611084210.1016/j.ygeno.2024.11084238608738 · doi ↗ · pubmed ↗
3Zhang T. Zhou J. Gao W. Jia Y. Wei Y. Wang G. Complex genome assembly based on long-read sequencing Brief. Bioinform.202223 bbac 30510.1093/bib/bbac 30535940845 · doi ↗ · pubmed ↗
4Medhi U. Chaliha C. Singh A. Nath B.K. Kalita E. Third generation sequencing transforming plant genome research: Current trends and challenges Gene 202594014918710.1016/j.gene.2024.14918739724994 · doi ↗ · pubmed ↗
5Li H. Durbin R. Genome assembly in the telomere-to-telomere era Nat. Rev. Genet.20242565867010.1038/s 41576-024-00718-w 38649458 · doi ↗ · pubmed ↗
6Diaz-Riano J.I. Duitama J. Current progress in phased genome assembly from long-read DNA sequencing data Methods Mol. Biol.20252955517010.1007/978-1-0716-4702-8_440736893 · doi ↗ · pubmed ↗
7Mahmoud M. Agustinho D.P. Sedlazeck F.J. A Hitchhiker’s guide to long-read genomic analysis Genome Res.20253554555810.1101/gr.279975.12440228901 PMC 12047252 · doi ↗ · pubmed ↗
8Bernal-Gallardo J.J. de Folter S. Plant genome information facilitates plant functional genomics Planta 202425911710.1007/s 00425-024-04397-z 38592421 PMC 11004055 · doi ↗ · pubmed ↗