# AFITbin: a metagenomic contig binning method using aggregate l-mer frequency based on initial and terminal nucleotides

**Authors:** Amin Darabi, Sayeh Sobhani, Rosa Aghdam, Changiz Eslahchi

PMC · DOI: 10.1186/s12859-024-05859-7 · 2024-07-16

## TL;DR

AFITBin is a new method for grouping metagenomic contigs using a novel l-mer frequency approach, improving the accuracy of microbial community analysis.

## Contribution

The paper introduces AFIT, a new l-mer statistic vector computed for each contig, and AFITBin, a metagenomic binning method that combines AFIT with matrix factorization.

## Key findings

- AFITBin outperforms existing methods in taxonomic identification of metagenomic contigs.
- The AFIT vector provides better clustering of species compared to traditional TNF methods.

## Abstract

Using next-generation sequencing technologies, scientists can sequence complex microbial communities directly from the environment. Significant insights into the structure, diversity, and ecology of microbial communities have resulted from the study of metagenomics. The assembly of reads into longer contigs, which are then binned into groups of contigs that correspond to different species in the metagenomic sample, is a crucial step in the analysis of metagenomics. It is necessary to organize these contigs into operational taxonomic units (OTUs) for further taxonomic profiling and functional analysis. For binning, which is synonymous with the clustering of OTUs, the tetra-nucleotide frequency (TNF) is typically utilized as a compositional feature for each OTU.
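As a point of reference for the TNF feature mentioned above, here is a minimal sketch of how a tetra-nucleotide frequency vector could be computed for one contig. The function name `tetranucleotide_frequency` is hypothetical and is not taken from the paper or its package. The sketch counts overlapping 4-mers on one strand only, skips windows containing ambiguous bases, and does not merge reverse complements, all of which are simplifying assumptions.

```python
from collections import Counter
from itertools import product


def tetranucleotide_frequency(contig: str) -> dict[str, float]:
    """Return the relative frequency of each of the 256 possible 4-mers.

    Hypothetical helper for illustration only: single strand,
    overlapping windows, windows with non-ACGT characters skipped.
    """
    seq = contig.upper()
    alphabet = set("ACGT")
    counts = Counter()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= alphabet:
            counts[kmer] += 1
    total = sum(counts.values())
    # Fixed key order so that vectors from different contigs are comparable.
    all_kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    return {k: (counts[k] / total if total else 0.0) for k in all_kmers}


tnf = tetranucleotide_frequency("ACGTACGT")
```

Each contig then maps to a fixed-length numeric vector (256 entries here), which is the form a clustering or binning method expects as a compositional feature.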

In this paper, we present AFIT, a new l-mer statistic vector for each contig, and AFITBin, a novel method for metagenomic binning based on AFIT and a matrix factorization method. To evaluate the performance of the AFIT vector, the t-SNE algorithm is used to compare species clustering based on AFIT and TNF information. In addition, the efficacy of AFITBin is demonstrated on both simulated and real datasets in comparison to state-of-the-art binning methods such as MetaBAT 2, MaxBin 2.0, CONCOCT, MetaCon, SolidBin, BusyBee Web, and MetaBinner. To further analyze the performance of the proposed AFIT vector, we compare the barcodes of the AFIT vector and the TNF vector.
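The abstract does not define the AFIT vector, so the following is only an illustrative sketch based on the title's phrase "aggregate l-mer frequency based on initial and terminal nucleotides". It assumes that l-mers are grouped by their first and last nucleotide, giving 16 aggregate frequencies for a given l. The function name `first_last_lmer_profile`, the single-strand counting, and the normalization are assumptions and should not be read as the paper's definition of AFIT.

```python
from collections import Counter
from itertools import product


def first_last_lmer_profile(contig: str, l: int) -> dict[str, float]:
    """Aggregate l-mer frequencies by (initial, terminal) nucleotide.

    Hypothetical illustration, not the published AFIT definition:
    every overlapping window of length l is assigned to one of 16
    groups according to its first and last base.
    """
    if l < 2:
        raise ValueError("l must be at least 2")
    seq = contig.upper()
    alphabet = set("ACGT")
    counts = Counter()
    for i in range(len(seq) - l + 1):
        window = seq[i:i + l]
        # Skip windows that contain ambiguous bases such as N.
        if set(window) <= alphabet:
            counts[window[0] + window[-1]] += 1
    total = sum(counts.values())
    groups = ["".join(p) for p in product("ACGT", repeat=2)]
    return {g: (counts[g] / total if total else 0.0) for g in groups}


profile = first_last_lmer_profile("ACGTACGT", l=3)
```

Under this assumption the feature length stays at 16 per value of l, whereas the number of distinct l-mers grows as 4^l. Profiles for several values of l could be concatenated into one vector per contig.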

The results demonstrate that AFITBin shows superior performance in taxonomic identification compared to existing methods, leveraging the AFIT vector for improved results in metagenomic binning. This approach holds promise for advancing the analysis of metagenomic data, providing more reliable insights into microbial community composition and function.

A Python package is available at: https://github.com/SayehSobhani/AFITBin.

The online version contains supplementary material available at 10.1186/s12859-024-05859-7.

## Full-text entities

- **Chemicals:** oligonucleotide (MESH:D009841)
- **Species:** Homo sapiens (human, species) [taxon 9606], Apis cerana (Asiatic honeybee, species) [taxon 7461], Finegoldia magna (species) [taxon 1260], Escherichia coli (E. coli, species) [taxon 562]

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11253361/full.md

---
Source: https://tomesphere.com/paper/PMC11253361