# KaMRaT: a C++ toolkit for k-mer count matrix dimension reduction

**Authors:** Haoliang Xue, Mélina Gallopin, Camille Marchet, Ha N Nguyen, Yunfeng Wang, Antoine Lainé, Chloé Bessiere, Daniel Gautheret

PMC · DOI: 10.1093/bioinformatics/btae090 · 2024-03-05

## TL;DR

KaMRaT is a C++ toolkit for analyzing RNA-seq data to find sequences that are specific to certain conditions or differentially expressed.

## Contribution

KaMRaT provides a set of tools for reducing k-mer count matrices derived from RNA-seq data, without relying on gene or transcript annotation.

## Key findings

- KaMRaT identifies differentially expressed sequences using k-mer count statistics.
- The toolkit merges overlapping k-mers into contigs for improved analysis.
- It enables sample-specific k-mer selection based on occurrence patterns.
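
To make the contig-merging idea concrete, here is a minimal sketch, not KaMRaT's actual algorithm. It assumes all k-mers are distinct and share the same length k, and it joins two k-mers only when the (k-1)-base overlap between them is unambiguous (exactly one possible successor and one possible predecessor). The function name `merge_kmers` is hypothetical, and the sketch ignores reverse complements and count profiles, which a real tool would likely need to consider.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Illustrative only: merge k-mers that overlap by k-1 bases into contigs,
// extending a contig only where the overlap is unambiguous.
std::vector<std::string> merge_kmers(const std::vector<std::string>& kmers) {
    const std::size_t n = kmers.size();
    if (n == 0) return {};
    const std::size_t k = kmers[0].size();
    assert(k >= 2);

    // Index every k-mer by its (k-1)-prefix and its (k-1)-suffix.
    std::unordered_map<std::string, std::vector<std::size_t>> by_prefix, by_suffix;
    for (std::size_t i = 0; i < n; ++i) {
        assert(kmers[i].size() == k);
        by_prefix[kmers[i].substr(0, k - 1)].push_back(i);
        by_suffix[kmers[i].substr(1)].push_back(i);
    }

    // next[i] / prev[i] hold the unique neighbour of k-mer i, or `none`.
    const std::size_t none = n;
    std::vector<std::size_t> next(n, none), prev(n, none);
    for (std::size_t i = 0; i < n; ++i) {
        const std::string overlap = kmers[i].substr(1);
        const auto& succ = by_prefix[overlap];  // k-mers starting with the overlap
        const auto& pred = by_suffix[overlap];  // k-mers ending with the overlap
        if (succ.size() == 1 && pred.size() == 1 && succ[0] != i) {
            next[i] = succ[0];
            prev[succ[0]] = i;
        }
    }

    std::vector<bool> used(n, false);
    std::vector<std::string> contigs;
    // Follow `next` links from `start`, appending one base per merged k-mer.
    auto walk = [&](std::size_t start) {
        std::string contig = kmers[start];
        used[start] = true;
        for (std::size_t j = next[start]; j != none && !used[j]; j = next[j]) {
            contig.push_back(kmers[j].back());
            used[j] = true;
        }
        contigs.push_back(contig);
    };
    for (std::size_t i = 0; i < n; ++i)
        if (prev[i] == none) walk(i);  // linear paths start at k-mers with no predecessor
    for (std::size_t i = 0; i < n; ++i)
        if (!used[i]) walk(i);         // leftover cycles, broken at an arbitrary k-mer
    return contigs;
}
```

With the input `{"ACGT", "CGTA", "GTAC", "TTTT"}`, the first three k-mers chain into a single contig and the fourth stays on its own. When two k-mers could both follow the same k-mer, the sketch leaves all of them unmerged rather than guessing.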

## Abstract

KaMRaT is designed for processing large k-mer count tables derived from multi-sample RNA-seq data. Its primary objective is to identify condition-specific or differentially expressed sequences, regardless of gene or transcript annotation.

KaMRaT is implemented in C++. Major functions include scoring k-mers based on count statistics, merging overlapping k-mers into contigs and selecting k-mers based on their occurrence across specific samples.
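
The scoring and selection functions can be illustrated with a small sketch over an in-memory count table. Everything here is an assumption made for illustration: the `KmerRow` layout, the function names, the choice of an absolute log2 fold change with a pseudocount of 1 as the score, and the occurrence thresholds. None of it is taken from KaMRaT's source code, and the abstract does not specify which statistics the toolkit uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical row of a k-mer count table: one count per sample.
struct KmerRow {
    std::string kmer;
    std::vector<double> counts;
};

// Mean count of a row over the given sample columns.
double mean_of(const KmerRow& row, const std::vector<std::size_t>& samples) {
    if (samples.empty()) return 0.0;
    double sum = 0.0;
    for (std::size_t s : samples) {
        assert(s < row.counts.size());
        sum += row.counts[s];
    }
    return sum / static_cast<double>(samples.size());
}

// Assumed score: absolute log2 fold change of mean counts (pseudocount 1).
double log2fc_score(const KmerRow& row,
                    const std::vector<std::size_t>& group_a,
                    const std::vector<std::size_t>& group_b) {
    return std::abs(std::log2((mean_of(row, group_b) + 1.0) /
                              (mean_of(row, group_a) + 1.0)));
}

// Keep the n highest-scoring rows, best first.
std::vector<KmerRow> top_by_score(std::vector<KmerRow> rows,
                                  const std::vector<std::size_t>& group_a,
                                  const std::vector<std::size_t>& group_b,
                                  std::size_t n) {
    std::stable_sort(rows.begin(), rows.end(),
                     [&](const KmerRow& x, const KmerRow& y) {
                         return log2fc_score(x, group_a, group_b) >
                                log2fc_score(y, group_a, group_b);
                     });
    if (rows.size() > n) rows.resize(n);
    return rows;
}

// Number of the given samples in which the k-mer reaches min_count.
std::size_t count_present(const KmerRow& row,
                          const std::vector<std::size_t>& samples,
                          double min_count) {
    std::size_t present = 0;
    for (std::size_t s : samples) {
        assert(s < row.counts.size());
        if (row.counts[s] >= min_count) ++present;
    }
    return present;
}

// Occurrence-based selection: present in at least min_in samples of the
// target group and in at most max_out samples of the other group.
bool passes_occurrence_filter(const KmerRow& row,
                              const std::vector<std::size_t>& group_in,
                              const std::vector<std::size_t>& group_out,
                              double min_count,
                              std::size_t min_in,
                              std::size_t max_out) {
    return count_present(row, group_in, min_count) >= min_in &&
           count_present(row, group_out, min_count) <= max_out;
}
```

In this sketch, scoring ranks k-mers by how strongly their mean counts differ between two conditions, while the occurrence filter asks a presence/absence question instead. The two are independent, so either could be applied alone or in sequence to shrink the table.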

Source code and documentation are available via https://github.com/Transipedia/KaMRaT.

## Full-text entities

- **Diseases:** tumor (MESH:D009369)
- **Species:** Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** S2B — Homo sapiens (Human), Childhood T acute lymphoblastic leukemia, Cancer cell line (CVCL_1860)

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC10942800/full.md

---
Source: https://tomesphere.com/paper/PMC10942800