A sensible proof connecting the scale-free feature with the Zipf-law

Fei Ma

arXiv:1904.08065·physics.soc-ph·August 8, 2023


TL;DR

This paper establishes a rigorous theoretical link between the scale-free property of complex networks and the Zipf-law, providing a solid mathematical foundation for their connection and enabling the use of network analysis methods to study Zipf-law phenomena.

Contribution

It presents a formal mathematical derivation connecting scale-free networks with Zipf-law through vertex rank, bridging empirical observations with theoretical proof.

Findings

01. Derived an equivalent relation between scale-free features and Zipf-law.

02. Eliminated the lack of theoretical foundation for Zipf-law.

03. Validated the use of complex network analysis methods for Zipf-law studies.


Equations

$$f_{r}\sim\frac{C}{r^{\alpha}}$$

$$P(k)\sim k^{-\gamma},\qquad N_{k_{i}}=NP(k_{i})=\frac{N}{k_{i}^{\gamma}}.$$

$$\sum_{k_{i}+\delta^{i}_{+}}^{k_{max}}N_{k_{i}}\leq r_{k_{i}}\leq\sum_{k_{i}-\delta^{i}_{-}}^{k_{max}}N_{k_{i}}$$

$$N\int_{k_{i}+\delta^{i}_{+}}^{k_{max}}k^{-\gamma}\,dk\leq r_{k_{i}}\leq N\int_{k_{i}-\delta^{i}_{-}}^{k_{max}}k^{-\gamma}\,dk.$$

$$\frac{N}{1-\gamma}\left(k_{max}^{-\gamma+1}-(k_{i}+\delta^{i}_{+})^{-\gamma+1}\right)\leq r_{k_{i}}\leq\frac{N}{1-\gamma}\left(k_{max}^{-\gamma+1}-(k_{i}-\delta^{i}_{-})^{-\gamma+1}\right).$$

$$\frac{N}{\gamma-1}(k_{i}+\delta^{i}_{+})^{-\gamma+1}\preceq r_{k_{i}}\preceq\frac{N}{\gamma-1}(k_{i}-\delta^{i}_{-})^{-\gamma+1}.$$

$$k_{i}-\delta^{i}_{-}\preceq\left(\frac{N}{\gamma-1}\right)^{\frac{1}{\gamma-1}}r_{k_{i}}^{\frac{1}{1-\gamma}}\preceq k_{i}+\delta^{i}_{+}.$$

$$k_{i}\preceq\left(\frac{N}{\gamma-1}\right)^{\frac{1}{\gamma-1}}r_{k_{i}}^{\frac{1}{1-\gamma}}\preceq k_{i}.$$

$$f_{r_{k_{i}}}=k_{i}\sim\frac{C}{r_{k_{i}}^{\alpha}}.$$

$$P_{cum}(k)\sim k^{1-\gamma}\ \xleftarrow{\ \text{first-order integral}\ }\ \text{degree distribution } P(k)\sim k^{-\gamma}\ \xleftarrow{\ \text{first-order differential}\ }\ P_{cum}(k)\sim k^{1-\gamma}.$$

$$\text{The Zipf-law } f_{r}\sim\frac{C}{r^{\alpha}}\ \overset{\text{exchanging}}{\Longleftrightarrow}\ \text{degree distribution } P(k)\sim k^{-\gamma}.$$

$$P(k)\sim k^{-\gamma},$$

$$f_{v}\sim r_{v}^{-1}.$$

$$\alpha(k)=\frac{A_{k}}{|E|}=O(k^{-\lambda}),\qquad\lambda=\gamma-2.$$

$$A_{k}=\sum_{j\geq k}n_{j}=|V|\sum_{j\geq k}jP(j),$$

$$A_{k}\sim\frac{|V|}{\gamma-2}k^{-\gamma+2}.$$

$$k_{max}=O\left(|V|^{\frac{1}{\gamma-1}}\right).$$

$$\alpha(k)\sim\frac{|V|}{\gamma-2}k^{-\gamma+2}\Big/O(|V|)=O(k^{-\lambda}).$$

$$|E|=O(|V|).$$

$$\beta(k)=\frac{A_{k}}{|V_{k}|}=O(k^{-\xi}),\qquad\xi=1.$$


Taxonomy

Topics: Complex Network Analysis Techniques · Opinion Dynamics and Social Influence · Bioinformatics and Genomic Networks

Full text

**A sensible proof connecting the scale-free feature with the Zipf-law**

Fei Ma^a. The author's e-mail: [email protected].

^a School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China

Abstract: Many large complex systems in nature and society can be described as complex networks (graphs), which helps in understanding the evolutionary mechanisms and dynamical functions behind them. Some of these networks exhibit scale-free behavior, that is, the ratio of the number of vertices with degree greater than or equal to $k$ to the order of the whole network obeys $P_{cum}(k)\sim k^{1-\gamma}$ ( $2<\gamma<3$ ). Meanwhile, the Zipf-law, $f_{r}\sim r^{-\alpha}$ ( $\alpha$ close to unity), is also prevalent in many complex systems, such as word frequencies in text and city sizes. Both have the same form, namely the well-known power law. Whereas the scale-free feature has been proved analytically by continuum theory, the Zipf-law is in most cases still regarded as an empirical principle in many scientific communities, particularly in social science. There is therefore a need either to point out the inner connection between the two or to distinguish them from one another. Here, for an arbitrary scale-free network model of order $N$ , we report an equivalence relation between the scale-free feature and the Zipf-law based on vertex rank. By rigorous mathematical derivation, we close the gap left by the lack of a theoretical foundation for the Zipf-law. One can therefore be convinced that it is reasonable to apply methods already used to study complex networks to the Zipf-law.

Keywords: Complex systems, Power-law, Scale-free feature, Zipf-law, Rank.

1 The first proof

In most real-life instances, vertices of the same degree in a complex network play different roles and perform different functions. One example is the rank of a word in a long text: each word is assigned a unique rank number based on its frequency. George Kingsley Zipf [2] observed that the frequency $f_{r}$ with which the word of rank $r$ is used appears to follow a power law

$$f_{r}\sim\frac{C}{r^{\alpha}},$$

in which $C$ is a constant and $\alpha$ is approximately equal to unity.
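As a minimal illustration of the rank hypothesis, the sketch below assigns each distinct word a unique rank by decreasing frequency. The function name `rank_frequency` and the toy token list are assumptions made for illustration only; they do not come from the paper.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, word, frequency) triples, most frequent word first."""
    counts = Counter(tokens)
    # Sort by decreasing frequency; break ties alphabetically so ranks are unique.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq) for rank, (word, freq) in enumerate(ordered, start=1)]


# Toy corpus (hypothetical data, only to show the shape of the output).
toy_tokens = "the cat and the dog and the bird".split()
for rank, word, freq in rank_frequency(toy_tokens):
    print(rank, word, freq)
```

On a real corpus, one would typically plot the logarithm of the frequency against the logarithm of the rank and read off the slope as an estimate of $-\alpha$.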

Under this rank hypothesis, the Zipf-law appears widely in daily life, for example in word ranks. However, no known proof clearly explains why the Zipf-law emerges, so it is still regarded as an empirical principle formulated through mathematical statistics. Although many published works have sought reasonable explanations, a rigorous and accessible mathematical proof has not been obtained. Giving a precise account of the Zipf-law is therefore a challenging and much-needed task. In the following, we put forward a compact proof. To do this, we first recall the concept of the scale-free model [3].

For a given scale-free model of order $N$ , we immediately see

$$P(k)\sim k^{-\gamma},\qquad N_{k_{i}}=NP(k_{i})=\frac{N}{k_{i}^{\gamma}}. \tag{1}$$

Each vertex has a unique rank number $r$ and a corresponding frequency $f_{r}$ , so we can select a vertex at random from the rank list composed of all vertices. Rank numbers form a consecutive sequence of natural numbers from 1 to the maximum value $N$ , whereas the degree (frequency) sequence generally does not; hence an interval of consecutive ranks may consist of many vertices with the same degree (frequency). Without loss of generality, list all vertices in decreasing order of degree. When a vertex $i$ of degree $k_{i}$ is chosen at random, its rank number $r_{k_{i}}$ must fall between $\sum_{k_{i}+\delta^{i}_{+}}^{k_{max}}N_{k_{i}}$ and $\sum_{k_{i}-\delta^{i}_{-}}^{k_{max}}N_{k_{i}}$ , namely

$$\sum_{k_{i}+\delta^{i}_{+}}^{k_{max}}N_{k_{i}}\leq r_{k_{i}}\leq\sum_{k_{i}-\delta^{i}_{-}}^{k_{max}}N_{k_{i}}, \tag{2}$$

where $N_{k_{i}}$ denotes the number of vertices of degree $k_{i}$ , $k_{max}$ is the maximum degree value, $\delta^{i}_{+}$ is the gap between $k_{i}$ and the smallest degree value in the degree sequence that is larger than $k_{i}$ , and $\delta^{i}_{-}$ is the gap between $k_{i}$ and the largest degree value in the degree sequence that is smaller than $k_{i}$ . Treating all degree values as continuous variables and combining Eq.1 and Eq.2 yields

$$N\int_{k_{i}+\delta^{i}_{+}}^{k_{max}}k^{-\gamma}\,dk\leq r_{k_{i}}\leq N\int_{k_{i}-\delta^{i}_{-}}^{k_{max}}k^{-\gamma}\,dk. \tag{3}$$
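Before the continuum step, the discrete rank bounds can be checked directly on a small degree sequence. The sketch below is an illustration under stated assumptions: the helper name `rank_bounds` and the example degree sequence are hypothetical, and the convention for the smallest degree (upper bound equal to the number of vertices) is a choice made here, since no smaller degree value exists in that case.

```python
def rank_bounds(degrees, k):
    """Bounds on the rank of a vertex of degree k (ranks in decreasing degree order).

    lower: number of vertices whose degree is at least k + delta_plus,
           i.e. strictly larger than k.
    upper: number of vertices whose degree is at least k - delta_minus,
           i.e. at least the next smaller degree value in the sequence.
    """
    if k not in degrees:
        raise ValueError("k must occur in the degree sequence")
    smaller = [d for d in degrees if d < k]
    lower = sum(1 for d in degrees if d > k)
    # If no smaller degree exists, k is the minimum and every vertex is counted.
    threshold = max(smaller) if smaller else k
    upper = sum(1 for d in degrees if d >= threshold)
    return lower, upper


degrees = [5, 3, 3, 2, 1, 1, 1]  # hypothetical degree sequence
print(rank_bounds(degrees, 3))
```

In this toy sequence the two vertices of degree 3 occupy ranks 2 and 3, which lie between the bounds 1 and 4 returned by the function.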

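Using elementary integral calculations, we have

$$\frac{N}{1-\gamma}\left(k_{max}^{-\gamma+1}-(k_{i}+\delta^{i}_{+})^{-\gamma+1}\right)\leq r_{k_{i}}\leq\frac{N}{1-\gamma}\left(k_{max}^{-\gamma+1}-(k_{i}-\delta^{i}_{-})^{-\gamma+1}\right). \tag{4}$$

In any large-scale network the maximal degree value $k_{max}$ is several orders of magnitude larger than the other degree values, so that $k_{max}^{-\gamma+1}$ is negligible for $\gamma>1$ . Omitting the terms in $k_{max}$ on both sides of Eq.4, we obtain asymptotically

$$\frac{N}{\gamma-1}(k_{i}+\delta^{i}_{+})^{-\gamma+1}\preceq r_{k_{i}}\preceq\frac{N}{\gamma-1}(k_{i}-\delta^{i}_{-})^{-\gamma+1}.$$

Generally, the smaller both $\delta^{i}_{+}$ and $\delta^{i}_{-}$ are, the closer the rank positions are to $|V|$ . In regions where degree values are dense, either $\delta^{i}_{+}$ or $\delta^{i}_{-}$ can attain the minimal value $1$ . Hence we may keep the approximation and obtain

$$k_{i}-\delta^{i}_{-}\preceq\left(\frac{N}{\gamma-1}\right)^{\frac{1}{\gamma-1}}r_{k_{i}}^{\frac{1}{1-\gamma}}\preceq k_{i}+\delta^{i}_{+}.$$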
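The passage from bounds on the rank $r_{k_{i}}$ to bounds on the degree $k_{i}$ can be spelled out as follows. This is a sketch of the omitted algebra, assuming $\gamma>1$ so that $\gamma-1$ is positive and taking the $(\gamma-1)$-th root preserves the inequality; the auxiliary symbols $x$ and $y$ are introduced here only for illustration.

```latex
% Upper bound on the rank, with x = k_i - \delta_-^i > 0:
r_{k_i} \preceq \frac{N}{\gamma-1}\, x^{1-\gamma}
\;\Longrightarrow\;
x^{\gamma-1} \preceq \frac{N}{(\gamma-1)\, r_{k_i}}
\;\Longrightarrow\;
x \preceq \left(\frac{N}{\gamma-1}\right)^{\frac{1}{\gamma-1}} r_{k_i}^{\frac{1}{1-\gamma}} .

% Lower bound on the rank, with y = k_i + \delta_+^i, treated in the same way:
\frac{N}{\gamma-1}\, y^{1-\gamma} \preceq r_{k_i}
\;\Longrightarrow\;
\left(\frac{N}{\gamma-1}\right)^{\frac{1}{\gamma-1}} r_{k_i}^{\frac{1}{1-\gamma}} \preceq y .
```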

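Making a further approximation yields

$$k_{i}\preceq\left(\frac{N}{\gamma-1}\right)^{\frac{1}{\gamma-1}}r_{k_{i}}^{\frac{1}{1-\gamma}}\preceq k_{i}.$$

Thus, we obtain

$$f_{r_{k_{i}}}=k_{i}\sim\frac{C}{r_{k_{i}}^{\alpha}},$$

where $C=\left(\frac{N}{\gamma-1}\right)^{\frac{1}{\gamma-1}}$ and $\alpha=\frac{1}{\gamma-1}$ . This completes the proof.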
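The relation $\alpha=\frac{1}{\gamma-1}$ can be checked numerically. The sketch below is an illustration under stated assumptions, not the author's procedure: it takes idealized class sizes $N_{k}\propto k^{-\gamma}$ without rounding, assigns each degree class its midpoint rank, and fits the slope of $\log k$ against $\log r$ by ordinary least squares. The function name `estimate_zipf_exponent`, the fitting window `k_lo` to `k_hi` and the cutoff `k_max` are all hypothetical choices.

```python
import math


def estimate_zipf_exponent(gamma, k_lo=10, k_hi=1000, k_max=100_000):
    """Fit alpha in k ~ r^(-alpha) for class sizes N_k proportional to k^(-gamma)."""
    tail = 0.0  # running sum of j^(-gamma) over degrees j > k
    log_rank, log_degree = [], []
    for k in range(k_max, k_lo - 1, -1):
        weight = k ** (-gamma)
        if k <= k_hi:
            # Midpoint rank of the degree-k class, up to a constant factor
            # that only shifts log(rank) and leaves the slope unchanged.
            log_rank.append(math.log(tail + 0.5 * weight))
            log_degree.append(math.log(k))
        tail += weight
    # Ordinary least-squares slope of log(degree) against log(rank).
    n = len(log_rank)
    mean_r = sum(log_rank) / n
    mean_k = sum(log_degree) / n
    cov = sum((r - mean_r) * (d - mean_k) for r, d in zip(log_rank, log_degree))
    var = sum((r - mean_r) ** 2 for r in log_rank)
    return -cov / var


for gamma in (2.2, 2.5, 2.8):
    # Prints gamma, the fitted exponent, and the predicted value 1 / (gamma - 1).
    print(gamma, round(estimate_zipf_exponent(gamma), 3), round(1 / (gamma - 1), 3))
```

The fit is restricted to intermediate degrees on purpose: very small degrees deviate from the continuum approximation, and degrees close to `k_max` are affected by the cutoff.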

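With our initial assumption $2<\gamma<3$ for the parameter $\gamma$ , the index $\alpha$ naturally falls into the range $\frac{1}{2}<\alpha<1$ . The closer the degree exponent $\gamma$ is to 2, the closer the frequency index $\alpha$ is to unity, which shows directly that our result is in considerable agreement with the description of the Zipf-law. In addition, we provide a way to compute asymptotically a value for the parameter $C$ of the Zipf-law. To close this discussion and highlight our main result, the relations among the Pareto distribution $P_{cum}(k)$ , the scale-free feature $P(k)$ and the Zipf-law $f_{r}$ are illustrated as follows

$$P_{cum}(k)\sim k^{1-\gamma}\ \xleftarrow{\ \text{first-order integral}\ }\ \text{degree distribution } P(k)\sim k^{-\gamma}\ \xleftarrow{\ \text{first-order differential}\ }\ P_{cum}(k)\sim k^{1-\gamma},$$

$$\text{the Zipf-law } f_{r}\sim\frac{C}{r^{\alpha}}\ \overset{\text{exchanging}}{\Longleftrightarrow}\ \text{degree distribution } P(k)\sim k^{-\gamma}.$$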
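As a worked illustration of the exponent relation, take the hypothetical values $\gamma=2.5$ and $N=10^{6}$ (chosen here only as an example, not taken from the paper). The formulas $\alpha=\frac{1}{\gamma-1}$ and $C=\left(\frac{N}{\gamma-1}\right)^{\frac{1}{\gamma-1}}$ then give:

```latex
\alpha = \frac{1}{\gamma-1} = \frac{1}{1.5} = \frac{2}{3} \approx 0.667,
\qquad
C = \left(\frac{10^{6}}{1.5}\right)^{2/3} = \frac{10^{4}}{1.5^{2/3}} \approx 7.6\times 10^{3}.

% Limiting cases of the admissible range 2 < \gamma < 3:
\gamma \to 2^{+} \;\Rightarrow\; \alpha \to 1,
\qquad
\gamma \to 3^{-} \;\Rightarrow\; \alpha \to \tfrac{1}{2}.
```

Under these assumptions the predicted degree of the top-ranked vertex ( $r=1$ ) is of order $C$ , which is the same order of magnitude as $N^{\frac{1}{\gamma-1}}=10^{4}$ .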

2 The second proof

In a scale-free network $G=(V,E)$ [3], the degree distribution $P(k)$ follows

$$P(k)\sim k^{-\gamma},$$

where the power-law exponent $\gamma$ is no less than $1$ . The network is considered sparse if the exponent $\gamma$ is strictly greater than $2$ . This is the power-law in the field of complex networks.

It is widely known that, given a corpus [2], the frequency $f_{v}$ of a word $v$ and its rank $r_{v}$ asymptotically obey

$$f_{v}\sim r_{v}^{-1}.$$

This is the famous Zipf-law due to G.K. Zipf.

**Theorem** Power-law is equivalent to Zipf-law.

To prove the theorem above, we introduce two lemmas.

Lemma 1 Given a sparse scale-free network $G=(V,E)$ , let $A_{k}$ denote the sum of the degrees of all vertices whose degree is no less than $k$ . The ratio $\alpha(k)$ of $A_{k}$ to the number of edges $|E|$ is given by

$$\alpha(k)=\frac{A_{k}}{|E|}=O(k^{-\lambda}),\qquad\lambda=\gamma-2.$$

Proof By definition, the quantity $A_{k}$ is given by

$$A_{k}=\sum_{j\geq k}n_{j}=|V|\sum_{j\geq k}jP(j),$$

in which $n_{j}$ denotes the sum of the degrees of the vertices whose degree is exactly equal to $j$ . Then, we have

$$A_{k}\sim\frac{|V|}{\gamma-2}k^{-\gamma+2}.$$

Note that we have used

$$k_{max}=O\left(|V|^{\frac{1}{\gamma-1}}\right).$$

Next, we come to

$$\alpha(k)\sim\frac{|V|}{\gamma-2}k^{-\gamma+2}\Big/O(|V|)=O(k^{-\lambda}).$$

Notice also that we have used the fact

$$|E|=O(|V|).$$

This completes the proof. ∎
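The scaling $A_{k}\sim k^{-\lambda}$ with $\lambda=\gamma-2$ can be checked numerically. The sketch below is an illustration under stated assumptions: it uses the idealized weights $jP(j)\propto j^{1-\gamma}$ with a finite cutoff, and fits the slope of $\log A_{k}$ against $\log k$ by ordinary least squares. The function name `estimate_lambda`, the fitting window and the cutoff `k_max` are hypothetical choices, and the common factor $|V|$ is dropped because it does not affect the slope.

```python
import math


def estimate_lambda(gamma, k_lo=20, k_hi=200, k_max=1_000_000):
    """Fit lambda in A_k ~ k^(-lambda), with A_k proportional to sum_{j>=k} j * j^(-gamma)."""
    tail = 0.0  # running sum of j^(1-gamma) over degrees j >= k
    log_k, log_a = [], []
    for k in range(k_max, k_lo - 1, -1):
        tail += k ** (1.0 - gamma)
        if k <= k_hi:
            log_k.append(math.log(k))
            log_a.append(math.log(tail))
    # Ordinary least-squares slope of log(A_k) against log(k).
    n = len(log_k)
    mean_k = sum(log_k) / n
    mean_a = sum(log_a) / n
    cov = sum((x - mean_k) * (y - mean_a) for x, y in zip(log_k, log_a))
    var = sum((x - mean_k) ** 2 for x in log_k)
    return -cov / var


# Prints the fitted exponent next to the predicted value gamma - 2 for gamma = 2.5.
print(round(estimate_lambda(2.5), 3), 2.5 - 2)
```

The sum over $j^{1-\gamma}$ converges slowly when $\gamma$ is close to 2, so in that regime the finite cutoff `k_max` biases the fitted exponent noticeably; the example therefore uses $\gamma=2.5$ .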

Lemma 2 Given a sparse scale-free network $G=(V,E)$ , the ratio $\beta(k)$ of $A_{k}$ , the sum of the degrees of all vertices whose degree is no less than $k$ , to the number $|V_{k}|$ of such vertices is given by

$$\beta(k)=\frac{A_{k}}{|V_{k}|}=O(k^{-\xi}),\qquad\xi=1.$$

Proof Lemma 2 follows from Lemma 1 and the definition of $|V_{k}|$ . This completes the proof. ∎

From Lemmas 1 and 2, it is clear that the Theorem holds.

3 Discussion and conclusion

Our results are, to some extent, a significant extension of earlier achievements and can also be viewed as a theoretical integration of them. Based on rigorous mathematical derivation, one can be convinced that the scale-free feature and the Zipf-law are connected with one another in complex systems. The work reported here is only the tip of the iceberg. We firmly believe that more challenges and difficulties remain to be overcome before the considerable potential of the power-law is better understood, both experimentally and theoretically.

Bibliography

[1]
[2] G.K. Zipf. Cambridge, Mass: Addison-Wesley Press (1949)
[3] A.-L. Barabási, R. Albert. Science. 5439 (1999): 509-512