Improved LINE-1 Detection through Pattern Matching by Increasing Probe Length
Juan O. López, Javier L. Quiñones, Emanuel D. Martínez

TL;DR
This paper improves a tool for detecting LINE-1 transposable elements in genomes by using longer DNA sequence probes, resulting in better accuracy in humans and other species.
Contribution
The novelty lies in increasing k-mer probe length from 50 to 75 or 100, improving L1 detection precision and recall across species.
Findings
Longer k-mer probes (75 or 100) improved L1 detection precision and recall in human genomes.
The improved L1PD method also showed better performance in detecting L1s in dog, horse, and cow genomes.
The updated software allows users to generate probes for other reference genomes.
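As a rough illustration of how precision and recall might be computed for a detector like this, the sketch below compares predicted L1 intervals with annotated ones. The evaluation protocol is not described in this summary, so everything here is an assumption: the `overlaps` and `precision_recall` helpers, the half-open interval representation, and the 50% overlap threshold are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def overlaps(a: Interval, b: Interval, min_frac: float = 0.5) -> bool:
    """Return True if the overlap covers at least min_frac of the shorter interval.

    The 0.5 default is an arbitrary illustrative threshold, not taken from the paper.
    """
    inter = min(a[1], b[1]) - max(a[0], b[0])
    shorter = min(a[1] - a[0], b[1] - b[0])
    return shorter > 0 and inter >= min_frac * shorter


def precision_recall(
    predicted: List[Interval],
    annotated: List[Interval],
    min_frac: float = 0.5,
) -> Tuple[float, float]:
    """Precision: fraction of predictions matching some annotation.
    Recall: fraction of annotations matched by some prediction."""
    matched_pred = sum(
        any(overlaps(p, a, min_frac) for a in annotated) for p in predicted
    )
    matched_ann = sum(
        any(overlaps(a, p, min_frac) for p in predicted) for a in annotated
    )
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_ann / len(annotated) if annotated else 0.0
    return precision, recall


# Toy coordinates, invented for the example.
predicted = [(100, 6100), (20000, 26000), (50000, 56000)]
annotated = [(150, 6150), (20500, 26500), (90000, 96000), (120000, 126000)]
precision, recall = precision_recall(predicted, annotated)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In this toy case two of three predictions match an annotation and two of four annotations are recovered. A stricter or looser overlap threshold would change both numbers, which is why the matching rule needs to be stated alongside any reported figures.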
Abstract
Long Interspersed Element-1 (LINE-1 or L1) is an autonomous transposable element, meaning that its DNA sequences can replicate themselves throughout the human genome. This activity may lead to genomic instability and is associated with several diseases. L1s can also replicate non-autonomous sequences, thereby increasing their disruptive impact. Although several tools can be used for L1 detection, the heuristics involved affect their accuracy. L1PD (LINE-1 Pattern Detection) uses a novel pattern-matching approach to detect L1s in human genomes, relying on a fixed set of k-mer probes of length 50 generated from the human reference genome GRCh38. This research aims to improve L1PD by using longer probes and testing whether this leads to better results. Additionally, experiments were performed to test the…
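To make the idea of k-mer probes concrete, here is a minimal sketch of exact probe matching on toy sequences. It is not L1PD's actual algorithm, which this summary does not describe in detail: the helper names `make_probes` and `find_probe_hits`, the fixed stride, and the toy sequences are all assumptions. The example also illustrates one plausible intuition for why longer probes could reduce false hits: a short probe can match a truncated fragment that a longer probe does not.

```python
from typing import Dict, List


def make_probes(reference: str, k: int, step: int) -> List[str]:
    """Cut length-k probes from a reference sequence at a fixed stride (assumed scheme)."""
    return [reference[i:i + k] for i in range(0, len(reference) - k + 1, step)]


def find_probe_hits(genome: str, probes: List[str]) -> Dict[int, List[int]]:
    """Map each probe index to every exact-match start position in the genome."""
    hits: Dict[int, List[int]] = {}
    for idx, probe in enumerate(probes):
        positions = []
        start = genome.find(probe)
        while start != -1:
            positions.append(start)
            start = genome.find(probe, start + 1)  # allow overlapping matches
        hits[idx] = positions
    return hits


# Toy data: one full copy of the "element" plus a truncated 8-base fragment.
reference = "ACGTTGCAAGGCTTAACCGGATCG"
genome = "TTTT" + reference + "GGGG" + reference[:8] + "CCCC"

short_hits = find_probe_hits(genome, make_probes(reference, k=8, step=8))
long_hits = find_probe_hits(genome, make_probes(reference, k=12, step=12))
print(short_hits)  # {0: [4, 32], 1: [12], 2: [20]}
print(long_hits)   # {0: [4], 1: [16]}
```

With 8-base probes, the first probe also matches the truncated fragment at position 32. With 12-base probes, only the full copy is hit. Real probes of length 50 to 100 on a whole genome behave differently in detail, so this should be read only as intuition.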
Taxonomy
Topics: Chromosomal and Genetic Variations · Genomics and Phylogenetic Studies · RNA and protein synthesis mechanisms
