# ConNIS and labeling instability: New statistical methods for improving the detection of essential genes in TraDIS libraries

**Authors:** Moritz Hanke, Theresa Harten, Ronja Foraita, Ilya Ioshikhes, Jinyan Li

PMC · DOI: 10.1371/journal.pcbi.1013428 · PLOS Computational Biology · 2026-03-06

## TL;DR

ConNIS is a new statistical method that improves the detection of essential genes in bacterial genomes using TraDIS data, especially in sparsely inserted libraries.

## Contribution

ConNIS introduces an exact probability model for insertion-free sequences within genes and a data-driven, subsample-based instability criterion for setting the parameters and thresholds of TraDIS methods.

## Key findings

- ConNIS outperforms existing methods in detecting essential genes, especially in libraries with low or medium insertion density.
- A subsample-based instability criterion improves parameter selection and result comparability across TraDIS methods.
- An R package and web application were developed to implement ConNIS and facilitate reproducibility.
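The key findings mention a subsample-based instability criterion for choosing parameter values. The paper's precise definition is in the full text. The sketch below only illustrates the general idea under stated assumptions. It repeatedly draws a random half of the observed insertion sites, reruns a labeling rule, and records for each gene the share π of subsamples in which it is called essential. Averaging 2π(1 − π) over genes gives 0 when every label is reproduced and 0.5 when labels behave like coin flips, an instability measure in the spirit of the StARS approach to stability-based tuning. The density-ratio classifier `label_by_density`, the subsample fraction, and all other names are hypothetical stand-ins for a real TraDIS method and its parameter.

```python
import random
from bisect import bisect_left, bisect_right
from typing import List, Sequence, Set, Tuple

# (name, start, end) with 1-based inclusive coordinates; an assumed layout.
Gene = Tuple[str, int, int]


def label_by_density(sites: Sequence[int], genes: Sequence[Gene],
                     genome_len: int, ratio: float) -> Set[str]:
    """Hypothetical toy classifier: call a gene essential if its insertion
    density is below `ratio` times the genome-wide insertion density."""
    genome_density = len(sites) / genome_len
    sorted_sites = sorted(sites)
    essential = set()
    for name, start, end in genes:
        length = end - start + 1
        # number of insertion sites falling inside [start, end]
        hits = bisect_right(sorted_sites, end) - bisect_left(sorted_sites, start)
        if hits / length < ratio * genome_density:
            essential.add(name)
    return essential


def labeling_instability(sites: List[int], genes: Sequence[Gene], genome_len: int,
                         ratio: float, n_sub: int = 50, frac: float = 0.5,
                         seed: int = 0) -> float:
    """Mean over genes of 2*pi*(1 - pi), where pi is the share of random
    subsamples (drawn without replacement) in which the gene is labelled essential."""
    rng = random.Random(seed)
    m = int(frac * len(sites))
    counts = {name: 0 for name, _, _ in genes}
    for _ in range(n_sub):
        subsample = rng.sample(sites, m)
        for name in label_by_density(subsample, genes, genome_len, ratio):
            counts[name] += 1
    return sum(2 * (c / n_sub) * (1 - c / n_sub) for c in counts.values()) / len(genes)


if __name__ == "__main__":
    rng = random.Random(1)
    genome_len = 20_000
    genes = [(f"gene{i}", i * 500 + 1, i * 500 + 400) for i in range(40)]
    toy_essential = {f"gene{i}" for i in range(0, 40, 8)}  # toy ground truth
    # Random insertions at ~2% of positions, then removed from the toy essential genes.
    sites = [pos for pos in range(1, genome_len + 1) if rng.random() < 0.02]
    sites = [s for s in sites
             if not any(a <= s <= b and n in toy_essential for n, a, b in genes)]
    for ratio in (0.1, 0.2, 0.4, 0.6):
        print(ratio, round(labeling_instability(sites, genes, genome_len, ratio), 3))
```

A caveat for any rule of this kind: degenerate settings (here a ratio of 0, which never calls any gene essential) are perfectly stable, so candidate values have to be restricted to a sensible range or the criterion adjusted. The paper's own rule should be consulted for how it handles this.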

## Abstract

The identification of essential genes in Transposon Directed Insertion Site Sequencing (TraDIS) data relies on the assumption that transposon insertions occur randomly in non-essential regions, leaving essential genes largely insertion-free. While intragenic insertion-free sequences have been considered a reliable indicator of gene essentiality, no exact probability distribution for these sequences has been proposed so far. Further, many methods require setting thresholds or parameter values a priori without providing any statistical basis, limiting the comparability of results. Here, we introduce Consecutive Non-Insertion Sites (ConNIS), a novel method for gene essentiality determination. ConNIS provides an analytic solution for the probability of observing insertion-free sequences within genes of a given length and considers variation in insertion density across the genome. Based on an extensive simulation study and different real-world scenarios, ConNIS was found to be superior to prevalent state-of-the-art methods, particularly when libraries had only a low or medium insertion density. In addition, our results showed that the precision of existing methods can be improved by incorporating a simple weighting factor for the genome-wide insertion density. To set the parameter and threshold values embedded in TraDIS methods, a subsample-based instability criterion was developed. Application of this criterion in real and synthetic data settings demonstrated its effectiveness in selecting well-suited parameter/threshold values across methods. An R package and an interactive web application are provided to facilitate application and reproducibility.
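The abstract states that ConNIS gives an analytic probability for observing insertion-free sequences within a gene. The exact ConNIS formula is given in the full paper and is not reproduced here. As a minimal sketch of the same kind of calculation, assume each of the n candidate positions in a gene independently receives an insertion with probability p (for example, the number of unique insertion sites divided by genome length, or a locally estimated density). The function below then computes the exact probability, under that assumption, that the longest insertion-free run is at least k positions long, using a standard Markov-chain recursion over the current run length. The function name and the independence assumption are illustrative, not the ConNIS implementation.

```python
def prob_longest_gap_at_least(n: int, k: int, p: float) -> float:
    """Exact P(some run of >= k consecutive insertion-free positions among n),
    assuming each position independently receives an insertion with
    probability p (a simplifying assumption, not the ConNIS model itself)."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    q = 1.0 - p
    # state[j]: probability that no run of length k has occurred yet and the
    # current trailing insertion-free run has length j (0 <= j < k).
    state = [0.0] * k
    state[0] = 1.0
    for _ in range(n):
        new_state = [0.0] * k
        new_state[0] = sum(state) * p  # an insertion resets the run to 0
        for j in range(k - 1):
            new_state[j + 1] = state[j] * q  # the run grows but stays below k
        # Mass in state[k - 1] that sees no insertion reaches length k and is
        # absorbed; it is no longer tracked.
        state = new_state
    return 1.0 - sum(state)


if __name__ == "__main__":
    # Hypothetical gene of 1,000 positions whose longest insertion-free stretch is 300.
    for p in (0.01, 0.02, 0.05):  # illustrative genome-wide or local densities
        print(f"p={p}: P(longest gap >= 300) = {prob_longest_gap_at_least(1000, 300, p):.3g}")
```

A small probability means such a long gap is unlikely under random insertion, which points toward essentiality. Passing a local rather than genome-wide density is one simple way to reflect the variation in insertion density across the genome that ConNIS takes into account.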

Identifying essential genes in bacteria is key to understanding their ability to survive, which can, for example, inform the development of new treatments. One way to identify these genes is to create libraries in which small DNA fragments (“insertions”) are randomly placed in the genome: essential genes tend to remain insertion-free because insertions disrupt their function. The challenge is to determine whether a (long) uninterrupted sequence arises by chance or because the gene is truly essential. Here, we present Consecutive Non-Insertion Sites (ConNIS), a statistical method that calculates the probability of such insertion-free sequences. Extensive comparisons on simulated and real datasets show that ConNIS outperforms existing methods, especially when a library is rather sparse in terms of the total number of insertion sites. Since many analysis methods rely on parameter values that have to be set before the analysis and can heavily influence the final results, we also propose a data-driven approach to set these values, making results more comparable across studies. Our methods are freely available as an R package, and all results are presented in a web app.
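To see why sparse libraries make this decision hard, consider the simplest case: a gene that receives no insertion at all. Assuming insertions land independently at each base with probability equal to the library's insertion density (a simplification; real libraries show positional biases), a non-essential gene of n bases is insertion-free by chance with probability `(1 - density)^n`. The helper `prob_insertion_free` below is a hypothetical name, and the densities and gene lengths are illustrative values, not figures from the paper.

```python
def prob_insertion_free(n_sites: int, density: float) -> float:
    """P(a non-essential gene with n_sites candidate positions gets no insertion),
    assuming each position independently receives an insertion with
    probability `density` (a simplifying assumption)."""
    return (1.0 - density) ** n_sites


if __name__ == "__main__":
    for density in (0.005, 0.02, 0.1):   # illustrative sparse, medium, dense libraries
        for length in (150, 300, 1000):  # illustrative gene lengths in bp
            p0 = prob_insertion_free(length, density)
            print(f"density={density:<6} length={length:<5} P(insertion-free)={p0:.3g}")
```

For example, at a density of 0.005 insertions per base, a 150 bp non-essential gene is insertion-free by chance with probability of about 0.47, so the mere absence of insertions says little about short genes in such a library.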

## Full-text entities

- **Chemicals:** agar (MESH:D000362)
- **Species:** Escherichia coli BW25113 (no rank) [taxon 679895], Homo sapiens (human, species) [taxon 9606], Escherichia coli str. K-12 substr. MG1655 (no rank) [taxon 511145], Escherichia coli (E. coli, species) [taxon 562], Salmonella enterica subsp. enterica serovar Typhimurium (no rank) [taxon 90371]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12991369/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12991369/full.md

## References

60 references — full list in the complete paper: https://tomesphere.com/paper/PMC12991369/full.md

---
Source: https://tomesphere.com/paper/PMC12991369