# A comprehensive evaluation of long-read de novo transcriptome assembly

**Authors:** Feng Yan, Pedro L. Baldoni, James Lancaster, Matthew E. Ritchie, Mathew G. Lewsey, Quentin Gouil, Nadia M. Davidson

PMC13020369 · DOI: 10.1186/s13059-026-04001-5 · Genome Biology · 2026-02-18

## TL;DR

This paper evaluates long-read de novo transcriptome assembly tools to determine their effectiveness in cases where a reference genome is unavailable.

## Contribution

The study provides a comprehensive benchmark of long-read de novo transcriptome assembly tools and their impact on differential expression analysis.

## Key findings

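- Long-read assemblies produce longer transcripts than short-read assemblies but still fall short of reference-guided approaches, with room to improve accuracy and reduce redundancy.
- Among the de novo pipelines tested (RATTLE, RNA-Bloom2, isONform and the short-read assembler Trinity), RNA-Bloom2 with Corset clustering performed best in both accuracy and computational efficiency.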
- Assembly choice significantly affects the detection of differential gene and transcript expression.

## Abstract

Recently, de novo transcriptome assembly methods have been developed to utilise long-read data in cases where a reference genome is unavailable, such as in non-model organisms. Despite the potential of these tools, there remains a lack of benchmarking and established protocols for optimal reference-free, long-read transcriptome assembly and differential expression analysis.

Here, we evaluate the long-read de novo transcriptome assembly tools RATTLE, RNA-Bloom2 and isONform, and compare their performance to one of the leading short-read assemblers, Trinity. We assess various metrics across a range of datasets, which include simulated data and spike-in sequin transcripts, where ground truth is known, and real data from human and pea (Pisum sativum) samples, using a reference-based approach to define truth. To represent contemporary analysis scenarios, the datasets cover depths from 6 to 60 million reads, Oxford Nanopore Technologies (ONT) cDNA, ONT direct RNA and Pacific Biosciences (PacBio) 10x single-cell sequencing. Critically, we assess the downstream impact of assembly choice on the detection of differential gene and transcript expression.

Our results confirm that long reads generate longer assembled transcripts than short reads for reference-free analysis, though limitations remain compared to reference-guided approaches, and suggest scope for improved accuracy and reduced redundancy. Of the de novo pipelines, RNA-Bloom2, coupled with Corset for transcript clustering, was the best performing in terms of both accuracy and computational efficiency. Our findings offer guidance when selecting the most effective strategy for long-read differential expression analysis when a high-quality reference genome is unavailable.

The online version contains supplementary material available at 10.1186/s13059-026-04001-5.

## Linked entities

- **Species:** Homo sapiens (taxon 9606)

## Full-text entities

- **Species:** Lathyrus oleraceus (garden pea, species) [taxon 3888], Homo sapiens (human, species) [taxon 9606], Powellomyces sp. EA (species) [taxon 252690]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13020369/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13020369/full.md

## References

17 references — full list in the complete paper: https://tomesphere.com/paper/PMC13020369/full.md

---
Source: https://tomesphere.com/paper/PMC13020369