Information weights of nucleotides in DNA sequences
M.R. Dudek, S. Cebrat, M. Kowalczuk, P. Mackiewicz, A. Nowicka, D. Mackiewicz, M. Dudkiewicz

TL;DR
This paper introduces a method to quantify the information content of nucleotides in DNA sequences. It estimates a lower bound on the information carried by nucleotides that is not subject to mutation, and uses the resulting weights to reconstruct k-oligomers of genes.
Contribution
It presents a novel approach to calculate nucleotide information weights using substitution matrices and demonstrates its application to genome analysis.
Findings
Estimated lower bounds of non-mutational information in nucleotides.
Reconstructed k-oligomers of genes from the Borrelia burgdorferi genome using the proposed information weights and a simple threshold rule on the number of bits carried.
The method is general and applicable to any genome.
Abstract
The coding sequence in a DNA molecule is considered as a message to be transferred to a receiver, the proteins, through a noisy information channel, and each nucleotide is assigned a respective information weight. With the help of the nucleotide substitution matrix, we estimated the lower bound of the amount of information carried by nucleotides that is not subject to mutations. We used the calculated weights to reconstruct k-oligomers of genes from the Borrelia burgdorferi genome. We showed that a simple rule suffices for this purpose: the number of bits of carried information cannot exceed some threshold value. The method we introduce is general and applies to every genome.
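The threshold rule mentioned in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not taken from the paper: the weight values are placeholders, the information of an oligomer is assumed to be the sum of its nucleotide weights, and the function names `oligomer_bits` and `oligomers_under_threshold` are hypothetical. The paper derives its weights from a nucleotide substitution matrix, which this sketch does not attempt to reproduce.

```python
from itertools import product

# Hypothetical per-nucleotide information weights in bits.
# Placeholder values for illustration only, NOT the paper's weights.
WEIGHTS = {"A": 0.5, "T": 0.75, "G": 1.25, "C": 1.5}


def oligomer_bits(oligomer, weights=WEIGHTS):
    """Total bits carried by an oligomer.

    Assumption: the total is the plain sum of the nucleotide weights.
    """
    return sum(weights[n] for n in oligomer)


def oligomers_under_threshold(k, threshold, weights=WEIGHTS):
    """List all k-oligomers whose total bits do not exceed the threshold."""
    alphabet = sorted(weights)  # fixed order gives a reproducible result
    return [
        "".join(p)
        for p in product(alphabet, repeat=k)
        if oligomer_bits(p, weights) <= threshold
    ]


if __name__ == "__main__":
    # With the placeholder weights, only dimers built from A and at most
    # one T stay within 1.25 bits.
    print(oligomers_under_threshold(2, 1.25))
```

Enumerating all 4^k candidates is only practical for small k; it is used here because it keeps the rule itself visible.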
Taxonomy
Topics: Fractal and DNA sequence analysis · Machine Learning in Bioinformatics · RNA and protein synthesis mechanisms
