N-tuple Zipf Analysis and Modeling for Language, Computer Program and DNA

Xiaocong Gan; Dahui Wang; Zhangang Han

arXiv:0908.0500 · physics.data-an · August 5, 2009 · 2 cites



TL;DR

This paper introduces a simple preferential selection model based on random copy-paste processes to explain the n-tuple power law observed in language, DNA, and computer code, supported by empirical data and simulations.
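In an n-tuple Zipf analysis, every run of n consecutive symbols (characters, words, code tokens or nucleotides) is counted and the distinct tuples are ranked by frequency; a power law appears as a roughly straight line when frequency is plotted against rank on log-log axes. The sketch below assumes overlapping windows and ties broken alphabetically. The helper name `ntuple_rank_frequency` is hypothetical and not taken from the paper.

```python
from collections import Counter


def ntuple_rank_frequency(seq, n):
    """Count every overlapping length-n window of seq (a string or a list
    of tokens) and return (tuple, count) pairs from most to least frequent."""
    if n <= 0:
        raise ValueError("n must be positive")
    # tuple(...) makes windows hashable for both strings and token lists.
    counts = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    # Descending count; ties broken by the tuple itself for a stable ranking.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    dna = "ACGTACGTTACG"
    for rank, (tup, freq) in enumerate(ntuple_rank_frequency(dna, 3), start=1):
        print(rank, "".join(tup), freq)
```

The same function works on word sequences, e.g. `ntuple_rank_frequency(text.split(), 2)` for word bigrams.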

Contribution

It proposes a novel, simple model, inspired by Simon's model, that reproduces n-tuple Zipf laws and DNA symmetry breaking; the model is validated against empirical data and simulations.

Findings

01. The model reproduces the n-tuple power law in simulated data.

02. The estimation equations match empirical Zipf exponents.

03. The model captures the DNA symmetry-breaking process.
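Finding 02 compares estimated Zipf exponents with empirical ones. The paper's own estimation equations are not reproduced on this page, so the sketch below shows only a generic, commonly used way to read an empirical exponent off rank-frequency data: an ordinary least-squares fit of log frequency against log rank, assuming $f(r) \propto r^{-\alpha}$. It is not the authors' estimator, and `fit_zipf_exponent` is a hypothetical name.

```python
import math


def fit_zipf_exponent(frequencies):
    """Estimate alpha in f(r) ~ C * r**(-alpha) by ordinary least squares
    on (log rank, log frequency). All frequencies must be positive."""
    freqs = sorted(frequencies, reverse=True)  # rank 1 = most frequent
    if len(freqs) < 2:
        raise ValueError("need at least two ranks")
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # log f = log C - alpha * log r, so the fitted slope is -alpha.
    return -sxy / sxx


if __name__ == "__main__":
    # Synthetic frequencies that follow f(r) = 1000 / r exactly (alpha = 1).
    synthetic = [1000 / r for r in range(1, 51)]
    print(round(fit_zipf_exponent(synthetic), 6))
```

On real n-tuple counts the fit is usually restricted to the range of ranks where the log-log curve is close to linear, since the head and the tail of the distribution often deviate.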

Abstract

The n-tuple power law is widespread in language, computer program code, DNA and music. After extensive Zipf analyses of the n-tuple power law in empirical data, we propose a model to explain the n-tuple power law observed in these information carriers. Our model is a preferential selection approach inspired by Simon's model, which explained the scaling law of single symbols in sequence Zipf analysis. The core mechanism of our model is simple: it is a random copy-and-paste process that repeatedly selects a random segment of the current sequence and appends it to the end. Simulations of our model show that the n-tuple power law holds in model-generated data. Furthermore, two estimation equations, one for the Zipf exponent and one for the minimal n-tuple length at which the power law appears, both agree well with empirical data. Our model can…
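The abstract describes the core mechanism as repeatedly selecting a random segment of the current sequence and appending it to the end. The sketch below implements that loop under assumptions the abstract leaves open: a short random seed sequence over a fixed alphabet, a uniformly chosen start position, and a segment length drawn uniformly from 1 to a hypothetical `max_segment` parameter (clipped at the end of the sequence). The paper's actual seed and segment-length distribution may differ, and `copy_paste_sequence` is a hypothetical name.

```python
import random
from collections import Counter


def copy_paste_sequence(seed, steps, max_segment, rng=None):
    """Grow a sequence by random copy and paste (assumed parameters).

    Each step picks a uniform start position and a uniform length in
    1..max_segment, copies that segment (clipped at the sequence end)
    and appends it, so every step adds at least one symbol.
    """
    if not seed:
        raise ValueError("seed must be non-empty")
    rng = rng or random.Random()
    seq = list(seed)
    for _ in range(steps):
        length = rng.randint(1, max_segment)   # inclusive bounds
        start = rng.randrange(len(seq))
        seq.extend(seq[start:start + length])  # slice is copied before extend
    return seq


if __name__ == "__main__":
    rng = random.Random(42)
    seed = [rng.choice("ACGT") for _ in range(20)]
    seq = copy_paste_sequence(seed, steps=20000, max_segment=10, rng=rng)
    n = 3
    counts = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    for rank, (tup, freq) in enumerate(counts.most_common(5), start=1):
        print(rank, "".join(tup), freq)
```

Because only existing material is copied, the alphabet never grows beyond the seed's symbols; the rank-frequency table of the generated n-tuples can then be inspected on log-log axes.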


Taxonomy

Topics: RNA and protein synthesis mechanisms · Fractal and DNA sequence analysis · DNA and Biological Computing