KaMRaT: a C++ toolkit for k-mer count matrix dimension reduction
Haoliang Xue, Mélina Gallopin, Camille Marchet, Ha N Nguyen, Yunfeng Wang, Antoine Lainé, Chloé Bessiere, Daniel Gautheret

TL;DR
KaMRaT is a C++ toolkit for analyzing RNA-seq data to find sequences that are specific to certain conditions or differentially expressed.
Contribution
KaMRaT provides tools for reducing k-mer count matrices derived from RNA-seq data without relying on gene or transcript annotation.
Findings
KaMRaT identifies differentially expressed sequences using k-mer count statistics.
The toolkit merges overlapping k-mers into contigs, which reduces redundancy among overlapping sequences.
It enables sample-specific k-mer selection based on occurrence patterns.
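To make the contig-merging idea concrete, here is a minimal sketch. It is not KaMRaT's implementation: the function name `merge_kmers` is hypothetical, strands and reverse complements are ignored, and two k-mers are joined only when their (k-1)-nucleotide overlap is unambiguous (exactly one candidate on each side). Count vectors play no role in this simplified version.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical helper: merge equal-length k-mers into contigs by following
// unambiguous (k-1)-overlaps. Returns the contigs in sorted order.
std::vector<std::string> merge_kmers(const std::vector<std::string>& kmers) {
    if (kmers.empty()) return {};
    const std::size_t k = kmers.front().size();
    assert(k >= 2);

    // Deduplicate, then index k-mers by (k-1)-prefix and count (k-1)-suffixes.
    std::unordered_set<std::string> uniq(kmers.begin(), kmers.end());
    std::unordered_map<std::string, std::vector<std::string>> by_prefix;
    std::unordered_map<std::string, int> suffix_count;
    for (const auto& km : uniq) {
        assert(km.size() == k);
        by_prefix[km.substr(0, k - 1)].push_back(km);
        ++suffix_count[km.substr(1)];
    }

    // The successor is defined only if the overlap is shared by exactly one
    // k-mer ending with it and exactly one k-mer starting with it.
    auto successor = [&](const std::string& km) -> const std::string* {
        const std::string ov = km.substr(1);
        auto it = by_prefix.find(ov);
        if (it == by_prefix.end() || it->second.size() != 1) return nullptr;
        if (suffix_count.at(ov) != 1) return nullptr;
        return &it->second.front();
    };
    auto has_predecessor = [&](const std::string& km) {
        const std::string ov = km.substr(0, k - 1);
        auto it = suffix_count.find(ov);
        return it != suffix_count.end() && it->second == 1 &&
               by_prefix.at(ov).size() == 1;
    };

    std::unordered_set<std::string> used;
    std::vector<std::string> contigs;
    auto extend_from = [&](const std::string& start) {
        std::string contig = start;
        used.insert(start);
        const std::string* cur = &start;
        while (const std::string* next = successor(*cur)) {
            if (used.count(*next)) break;    // stop on cycles
            contig.push_back(next->back());  // append the one new nucleotide
            used.insert(*next);
            cur = next;
        }
        contigs.push_back(contig);
    };

    // First start from k-mers with no unambiguous predecessor, then pick up
    // any k-mers left over (these belong to cycles).
    for (const auto& km : uniq)
        if (!has_predecessor(km)) extend_from(km);
    for (const auto& km : uniq)
        if (!used.count(km)) extend_from(km);

    std::sort(contigs.begin(), contigs.end());
    return contigs;
}
```

Under these assumptions, the three 4-mers ACGT, CGTA and GTAC collapse into the single contig ACGTAC, whereas ACGT followed by both CGTA and CGTC is left unmerged because the extension is ambiguous.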
Abstract
KaMRaT is designed for processing large k-mer count tables derived from multi-sample RNA-seq data. Its primary objective is to identify condition-specific or differentially expressed sequences, regardless of gene or transcript annotation. KaMRaT is implemented in C++. Major functions include scoring k-mers based on count statistics, merging overlapping k-mers into contigs, and selecting k-mers based on their occurrence across specific samples. Source code and documentation are available via https://github.com/Transipedia/KaMRaT.
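The scoring and occurrence-based selection described in the abstract can be illustrated with a small sketch. This is an assumption-laden example rather than KaMRaT's code: the names `welch_t`, `passes_occurrence` and `select_rows` are hypothetical, the score is a plain Welch t-statistic between two conditions labelled 0 and 1, and raw counts are used without normalisation or transformation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct GroupStats { double mean; double var; std::size_t n; };

// Mean and unbiased variance of the counts whose label equals `group`.
GroupStats group_stats(const std::vector<double>& counts,
                       const std::vector<int>& labels, int group) {
    double sum = 0.0;
    std::size_t n = 0;
    for (std::size_t i = 0; i < counts.size(); ++i)
        if (labels[i] == group) { sum += counts[i]; ++n; }
    assert(n >= 2);  // need at least two samples for a variance
    const double mean = sum / static_cast<double>(n);
    double ss = 0.0;
    for (std::size_t i = 0; i < counts.size(); ++i)
        if (labels[i] == group) ss += (counts[i] - mean) * (counts[i] - mean);
    return {mean, ss / static_cast<double>(n - 1), n};
}

// Hypothetical score: Welch t-statistic of condition 1 versus condition 0.
double welch_t(const std::vector<double>& counts, const std::vector<int>& labels) {
    assert(counts.size() == labels.size());
    const GroupStats a = group_stats(counts, labels, 0);
    const GroupStats b = group_stats(counts, labels, 1);
    const double diff = b.mean - a.mean;
    const double se = std::sqrt(a.var / static_cast<double>(a.n) +
                                b.var / static_cast<double>(b.n));
    if (se == 0.0)  // both groups constant
        return diff == 0.0 ? 0.0
                           : std::copysign(std::numeric_limits<double>::infinity(), diff);
    return diff / se;
}

// Hypothetical filter: the k-mer must reach `min_count` in at least
// `min_samples` samples.
bool passes_occurrence(const std::vector<double>& counts, double min_count,
                       std::size_t min_samples) {
    std::size_t n = 0;
    for (double c : counts)
        if (c >= min_count) ++n;
    return n >= min_samples;
}

// Keep the indices of rows that pass the occurrence filter and whose
// absolute score reaches `min_abs_t`.
std::vector<std::size_t> select_rows(const std::vector<std::vector<double>>& matrix,
                                     const std::vector<int>& labels,
                                     double min_count, std::size_t min_samples,
                                     double min_abs_t) {
    std::vector<std::size_t> kept;
    for (std::size_t r = 0; r < matrix.size(); ++r)
        if (passes_occurrence(matrix[r], min_count, min_samples) &&
            std::fabs(welch_t(matrix[r], labels)) >= min_abs_t)
            kept.push_back(r);
    return kept;
}
```

Filtering on occurrence before scoring is a design choice in this sketch: it discards rows supported by very few samples, so that the score is computed only on rows with enough observations to be meaningful.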
Taxonomy
Topics: Gene expression and cancer classification · Bioinformatics and Genomic Networks · Genomics and Phylogenetic Studies
