A word recurrence based algorithm to extract genomic dictionaries
Vincenzo Bonnici, Giuditta Franco, Vincenzo Manca

TL;DR
This paper introduces a novel information theory-based algorithm that extracts significant variable-length genomic word dictionaries, revealing inter-chromosomal similarities in human genomes.
Contribution
The paper presents a method that combines conceptual and empirical analyses to extract variable-length genomic dictionaries, selecting words by the information content of factors (substrings) of the genome.
Findings
Identifies significant genomic word dictionaries of variable length
Reveals inter-chromosomal similarities in human genomes
Demonstrates effectiveness of the information theory approach
Abstract
Genomes may be analyzed from an information viewpoint as very long strings, containing functional elements of variable length, which have been assembled by evolution. In this work an innovative information theory based algorithm is proposed, to extract significant (relatively small) dictionaries of genomic words. Namely, conceptual analyses are here combined with empirical studies, to open up a methodology for the extraction of variable length dictionaries from genomic sequences, based on the information content of some factors. Its application to human chromosomes highlights an original inter-chromosomal similarity in terms of factor distributions.
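As a rough illustration of the kind of word-recurrence statistics the abstract refers to, the sketch below counts overlapping words of a fixed length k in a sequence, separates words that recur from words that occur once, and computes the empirical entropy of the word distribution. It is not the authors' algorithm, which is not described here; the function names, the toy sequence, and the choice of k values are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def recurrence_summary(seq, k):
    """Summarize word recurrence for one word length k (illustrative only)."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    # Words seen more than once versus words seen exactly once.
    repeats = {w for w, c in counts.items() if c > 1}
    singletons = {w for w, c in counts.items() if c == 1}
    # Empirical Shannon entropy (in bits) of the k-word distribution.
    entropy = -sum((c / total) * log2(c / total) for c in counts.values())
    return {"k": k, "repeats": repeats, "singletons": singletons, "entropy": entropy}


if __name__ == "__main__":
    toy = "ACGTACGTTTGA"  # hypothetical toy sequence, not real genomic data
    for k in (2, 3, 4):
        s = recurrence_summary(toy, k)
        print(k, sorted(s["repeats"]), len(s["singletons"]), round(s["entropy"], 3))
```

A variable-length dictionary would need some rule for choosing among word lengths; the sketch deliberately leaves that selection step out, since the abstract does not specify it.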
Taxonomy
Topics: Genomics and Phylogenetic Studies · RNA and protein synthesis mechanisms · Fractal and DNA sequence analysis
