Google matrix analysis of DNA sequences

Vivek Kandiah; Dima L. Shepelyansky

arXiv:1301.1626·q-bio.GN·May 23, 2013

TL;DR

This paper applies Google matrix analysis to DNA sequences, revealing scale-free network properties and spectral characteristics that highlight similarities and differences with web and linguistic networks.

Contribution

The paper applies Google matrix analysis to DNA sequences, treating words of several letters as network nodes, and characterizes the scale-free features and spectral properties of the resulting networks.

Findings

01

The elements of the Google matrix built from DNA sequences follow a power-law distribution.

02

The PageRank probability decays slowly and algebraically, while the spectrum of G has a large gap that leads to rapid relaxation on the network.

03

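DNA sequence networks share scale-free features with WWW and linguistic networks, although the exponent for the sum of ingoing matrix elements is significantly larger than is typical for the WWW.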

Abstract

For DNA sequences of various species, we construct the Google matrix G of Markov transitions between nearby words composed of several letters. The statistical distribution of the elements of this matrix is shown to be described by a power law, with an exponent close to that of outgoing links in scale-free networks such as the World Wide Web (WWW). At the same time, the sum of ingoing matrix elements is characterized by an exponent significantly larger than those typical for WWW networks. This results in a slow algebraic decay of the PageRank probability, which is determined by the distribution of ingoing elements. The spectrum of G is characterized by a large gap, leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species, which determines their statistical similarity from the viewpoint of Markov…
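The construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the word length `m`, the split into consecutive non-overlapping words, the damping factor `alpha = 0.85`, the uniform replacement of empty columns, and the names `google_matrix` and `pagerank` are all assumptions made for illustration, and the input string is a toy sequence rather than real genomic data.

```python
from itertools import product


def google_matrix(seq, m=2, alpha=0.85):
    """Column-stochastic Google matrix over all 4**m words of length m.

    Assumptions (not taken from the paper): non-overlapping words,
    damping factor alpha, uniform columns for words with no outgoing links.
    """
    words = ["".join(p) for p in product("ACGT", repeat=m)]
    index = {w: i for i, w in enumerate(words)}
    n = len(words)

    # Cut the sequence into consecutive words of m letters.
    chunks = [seq[i:i + m] for i in range(0, len(seq) - m + 1, m)]
    # Drop words with letters outside ACGT (a simplification: this
    # joins the neighbours of a dropped word).
    chunks = [c for c in chunks if c in index]

    # counts[i][j] = number of transitions from word j to the next word i.
    counts = [[0.0] * n for _ in range(n)]
    for src, dst in zip(chunks, chunks[1:]):
        counts[index[dst]][index[src]] += 1.0

    G = [[0.0] * n for _ in range(n)]
    for j in range(n):
        col_sum = sum(counts[i][j] for i in range(n))
        for i in range(n):
            s = counts[i][j] / col_sum if col_sum > 0 else 1.0 / n
            G[i][j] = alpha * s + (1.0 - alpha) / n
    return words, G


def pagerank(G, tol=1e-12, max_iter=1000):
    """Stationary vector of G (G p = p) by power iteration."""
    n = len(G)
    p = [1.0 / n] * n
    for _ in range(max_iter):
        q = [sum(G[i][j] * p[j] for j in range(n)) for i in range(n)]
        if sum(abs(a - b) for a, b in zip(p, q)) < tol:
            return q
        p = q
    return p


# Toy input, not a real genome.
words, G = google_matrix("ACGTTGCAACGTAGCTAGGCTA" * 20, m=2)
p = pagerank(G)
ranking = sorted(zip(words, p), key=lambda t: -t[1])
```

Here `ranking` lists the words in order of decreasing PageRank probability. Pure Python lists keep the sketch dependency-free; for longer words the matrix has 4**m rows and columns, so a numerical array library would be the more practical choice.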
