# Improved LINE-1 Detection through Pattern Matching by Increasing Probe Length

**Authors:** Juan O. López, Javier L. Quiñones, Emanuel D. Martínez

PMC · DOI: 10.3390/biology13040236 · 2024-04-02

## TL;DR

This paper improves L1PD, a tool for detecting LINE-1 transposable elements in genomes, by using longer k-mer probes (75 or 100 bases instead of 50), which yields higher precision and recall in humans and in other species.

## Contribution

The novelty lies in increasing k-mer probe length from 50 to 75 or 100, improving L1 detection precision and recall across species.

## Key findings

- Longer k-mer probes (75 or 100) improved L1 detection precision and recall in human genomes.
- The improved L1PD method also showed better performance in detecting L1s in dog, horse, and cow genomes.
- The updated software allows users to generate probes for other reference genomes.
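The precision and recall referred to in these findings can be computed from counts of correct and incorrect detections. The following is a minimal sketch, assuming detections and annotations are given as half-open `(start, end)` intervals and that a detection counts as correct when it covers at least a fraction `min_frac` of an annotated L1. The overlap criterion, the threshold, and the function names are illustrative assumptions, not the paper's evaluation protocol.

```python
def overlap(a, b):
    """Length of the overlap between two half-open intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def precision_recall(detected, annotated, min_frac=0.5):
    """Score detections against annotations (hypothetical criterion).

    A detection is a true positive if it covers at least min_frac of the
    length of some annotated interval; the first such annotation is used.
    """
    matched = set()  # indices of annotated intervals that were found
    tp = 0           # detections that hit an annotation
    for d in detected:
        for i, a in enumerate(annotated):
            if overlap(d, a) >= min_frac * (a[1] - a[0]):
                tp += 1
                matched.add(i)
                break
    precision = tp / len(detected) if detected else 0.0
    recall = len(matched) / len(annotated) if annotated else 0.0
    return precision, recall


detected = [(0, 100), (500, 600), (1000, 1100)]
annotated = [(10, 110), (1000, 1100), (2000, 2100)]
p, r = precision_recall(detected, annotated)
print(f"precision={p:.3f} recall={r:.3f}")
```

In this toy input, two of three detections match an annotation and two of three annotations are found, so both metrics come out to 2/3.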

## Abstract

Long Interspersed Element-1 (LINE-1 or L1) is an autonomous transposable element, meaning that its DNA sequences are able to replicate themselves throughout the human genome. This activity may lead to genomic instability and is associated with several different diseases. L1s are also capable of replicating other non-autonomous sequences, thereby increasing their disruptive impact. Although there are different tools available that may be used for L1 detection, the heuristics involved affect their accuracy. L1PD (LINE-1 Pattern Detection) uses a novel pattern-matching approach to detect L1s in human genomes, using a fixed set of k-mer probes of length 50 that were generated using the human reference genome GRCh38. This research aims to improve L1PD by using longer probes and testing whether this leads to better results. Additionally, experiments were performed to test the effectiveness of L1PD in detecting L1s in other species, such as dogs, horses, and cows. The results showed that longer probes did improve precision and recall of L1 detection, not only in humans but in the other species as well.

Long Interspersed Element-1 (LINE-1 or L1) is an autonomous transposable element that accounts for 17% of the human genome. Strong correlations between abnormal L1 expression and diseases, particularly cancer, have been documented by numerous studies. L1PD (LINE-1 Pattern Detection) was previously created to detect L1s by using a fixed, pre-determined set of 50-mer probes and a pattern-matching algorithm. L1PD uses a novel seed-and-pattern-match strategy as opposed to the well-known seed-and-extend strategy employed by other tools. This study discusses an improved version of L1PD and shows that increasing the size of the k-mer probes from 50 to 75 or to 100 yields better results, as evidenced by experiments showing higher precision and recall when compared to the 50-mers. The probe-generation process was updated and the corresponding software is now shared so that users may generate probes for other reference genomes (with certain limitations). Additionally, L1PD was applied to non-human genomes, such as those of dogs, horses, and cows, to further validate the pattern-matching strategy. The improved version of L1PD proves to be an efficient and promising approach for L1 detection.
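The seed-and-pattern-match idea described above can be illustrated with a toy sketch. It assumes exact string matching of probes and a simple pattern rule: probes must appear in increasing index order, with consecutive start positions no more than `max_gap` apart. The function names, parameters, and thresholds are hypothetical and are not taken from L1PD itself, whose actual pattern definition is given in the full paper.

```python
def find_probe_hits(genome, probes):
    """Return sorted (position, probe_index) pairs for every exact probe match."""
    hits = []
    for idx, probe in enumerate(probes):
        pos = genome.find(probe)
        while pos != -1:
            hits.append((pos, idx))
            pos = genome.find(probe, pos + 1)
    return sorted(hits)


def candidate_regions(hits, probe_len, min_probes=3, max_gap=50):
    """Chain hits whose probe indices strictly increase and whose start
    positions are at most max_gap apart (assumed rule); report each chain
    with at least min_probes hits as a half-open (start, end) region."""
    regions = []
    chain = []
    for pos, idx in hits:
        if chain and idx > chain[-1][1] and pos - chain[-1][0] <= max_gap:
            chain.append((pos, idx))
        else:
            if len(chain) >= min_probes:
                regions.append((chain[0][0], chain[-1][0] + probe_len))
            chain = [(pos, idx)]
    if len(chain) >= min_probes:
        regions.append((chain[0][0], chain[-1][0] + probe_len))
    return regions


# Toy data: 5-mers stand in for the 50-, 75-, or 100-mer probes.
probes = ["ACGTA", "GGCCT", "TTAGC"]
genome = "NNNNACGTANNGGCCTNNNTTAGCNNNNGGCCT"
hits = find_probe_hits(genome, probes)
print(candidate_regions(hits, probe_len=5))
```

Here the three probes occur in order at positions 4, 11, and 19, so one candidate region spanning positions 4 to 24 is reported; the isolated second copy of `GGCCT` at position 28 does not form a pattern on its own.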

## Linked entities

- **Diseases:** cancer (MONDO:0004992)
- **Species:** Homo sapiens (taxon 9606)

## Full-text entities

- **Diseases:** cancer (MESH:D009369)
- **Species:** Homo sapiens (human, species) [taxon 9606], Equus caballus (domestic horse, species) [taxon 9796], Bos taurus (bovine, species) [taxon 9913], Canis lupus familiaris (dog, subspecies) [taxon 9615]

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11047891/full.md

---
Source: https://tomesphere.com/paper/PMC11047891