Distribution in the Geometrically Growing System and Its Evolution
Kim Chol-jun

TL;DR
This paper presents a theory for geometrically growing systems that explains power-law distributions and their evolution, highlighting unique convexity features and the system's tendency to flatten over time, without relying on complex economic models.
Contribution
The paper introduces a new statistical theory for geometrically growing systems that accounts for power-law phenomena and their evolution, including a convexity feature absent in traditional models.
Findings
Explains power-law distributions in demographic, economic, and pandemic data.
Identifies convexity in the low-size distribution part.
Shows the distribution tends to flatten as the system evolves.
Abstract
Recently, we developed a theory of a geometrically growing system. Here we show that the theory can explain several instances of power-law distributions, including classical demographic and economic cases and the novel case of a pandemic, on purely statistical grounds and without introducing elaborate economic models. A convexity in the low-size part of the distribution is one peculiarity of the theory that is absent from the pure power-law distribution. We find that the distribution of a geometrically growing system can tend to flatten as the system evolves, so that the relative ratio of sizes within the system increases. The system can act as a reverse machine that converts diffusion in parametric space into concentration in the size distribution.
Distribution in the Geometrically Growing System and Its Evolution
Kim Chol-jun
Department of Astronomy, Faculty of Physics, Kim Il Sung University, DPR Korea
postal code: +850
email address: [email protected]
keywords:
power-law; firm size distribution; the COVID-19 pandemic; population in city; spectral hardening
JEL code:
C11; O1
Significance: Most economic systems that appear to follow a power-law distribution are analyzed with Gibrat’s model, i.e. a geometrically growing system, which is usually taken to give the log-normal distribution. We previously showed that the system can give an asymptotic power law if the correlation between its parameters is considered. In this paper we show that the system can lead to spectral hardening, provided that the parameters diffuse, i.e. their variances increase, as the system grows.
1 Introduction
First, we explain the problem and some terminology. A system is composed of members, and we call each member an item. Each item has a measurable property, which we call its size. For example, the population of a city and the size of a firm can be regarded as sizes, while the city and the firm are items within a country, which is in turn the system. The power law, also known as Zipf’s law or the Pareto distribution, states that the frequency of an item is inversely proportional to a power of its size: $f(x) = C x^{-\alpha}$, where $f(x)$ stands for the frequency of items of size $x$, $\alpha$ for the exponent of the power and $C$ for the normalization constant.
Historically, Pareto (1896) showed that the distribution of income follows the power law. Estoup (1916) and Zipf (1932) observed the power law in the frequency of words in a novel, and Auerbach (1913) and Zipf (1949) pointed out the law for the population size of cities. Many diverse phenomena show the power-law distribution; for reviews see, e.g., Mitzenmacher (2004) and Newman (2005). In fact, the author was originally interested in the cosmic ray spectrum, which appears to be a typical power law. Salpeter (1955) found that the mass distribution of stars follows the power law.
Several generative models for the power-law distribution have been proposed, which we can categorize into groups. The first group is based on preferential attachment, or a “rich-get-richer” process (Yule, 1924; Simon, 1955; Barabasi & Albert, 1999). The second pursues scale invariance, which is a peculiarity of the power-law distribution (Bak, Tang & Wiesenfeld, 1987; Sneppen et al., 1995). The third starts from a required optimization (Mandelbrot, 1953). Other, composite models derive the power-law distribution from specially assumed elementary distributions of the parameters (Gibrat, 1931; Miller, 1957; Gabaix, 1999; Reed & Jorgensen, 2004). These models show that there are many ways to generate the power-law distribution. However, all of them rest on special assumptions. Although a postulate is the starting point of any logic, it had better be general, and the logic had better cover a wider range of sizes in the data.
2 The formalism for the distribution in the geometrically growing system
Recently, we developed a theory of a geometrically growing system (GGS) (Chol-jun, 2022) on the basis of the statistically most plausible assumption, i.e. the normality of the distributions of the parameters. If the size of each item in a system grows geometrically, or proportionately, we call the system geometrically growing. A GGS can be modeled by
[TABLE]
where is the size of an item in the system, is the growth rate (hereafter simply growth), stands for the age of growth (Footnote 1: In Chol-jun (2022) the age was denoted by .) and is the initial size of the item.
We can assume a normal distribution not only for but also for , which is statistically the most plausible choice. Here we can introduce a correlation between and without loss of generality, because a correlation can be present even in a completely arbitrary configuration of and . (Footnote 2: The instance of the COVID-19 pandemic in Chol-jun (2022) showed a systematic correlation: the countries that had a later outbreak of the pandemic show relatively lower growth, i.e. a positive correlation is obtained, which might be because they had a warning or time for preparation. Gabaix & Ioannides (2004) indicated that in some decades large cities grow faster, but in other decades small cities grow faster. This implies that the sign of the correlation between the age (assuming that older cities are larger) and the growth flips from time to time.) If the correlation is positive (), then the log-size () at the upper limit can be approximated by
[TABLE]
while if the correlation is zero or negative (), it is approximated by (Footnote 3: and are interchanged in comparison with Chol-jun (2022).)
[TABLE]
where are variables following the standard normal distribution and, if and stand for the means and standard deviations of and , the parameters are given as follows:
[TABLE]
[TABLE]
where
[TABLE]
and stands for the sign of and for the absolute value.
Thus, if , the log-size behaves as a variable, while for the case of it behaves as a normal variable. We can derive the probability density function (PDF) of the size for both cases:
[TABLE]
We call the log-completely squared chi () distribution with 1 degree of freedom (for short, log-CS or log-CS), while is the well-known log-normal distribution. What is interesting is that the asymptotic exponent of the log-CS, i.e. the asymptotic slope of the PDF in a log-log diagram, tends toward a constant:
[TABLE]
which says that the log-CS has asymptotic power-law behavior. In particular, the asymptotic exponent depends only on the variances of the age and the growth. Note that the exponent is negative, so we usually consider only its absolute value.
For the log-normal distribution, the local slope is determined by the variance (more exactly, the standard deviation) , which in turn depends on the means and variances of the parameters.
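The "completely squared" form of the log-size can be checked numerically. The sketch below is only an illustration under assumptions that are not taken from the text: it assumes an exponential growth law of the form x = x0 * exp(growth * age), perfect correlation between growth and age (the extreme case of a positive correlation), and arbitrary parameter values. The names `completed_square_params`, `mu_g`, `sd_g`, `mu_t` and `sd_t` are hypothetical. Under these assumptions the product growth * age is exactly a scaled and shifted square of one standard normal variable.

```python
import numpy as np

def completed_square_params(mu_g, sd_g, mu_t, sd_t):
    """Return (a, b, c) such that g * t == a * (xi + b)**2 + c
    when g = mu_g + sd_g * xi and t = mu_t + sd_t * xi share the same xi."""
    a = sd_g * sd_t
    b = 0.5 * (mu_g / sd_g + mu_t / sd_t)
    c = mu_g * mu_t - a * b**2
    return a, b, c

rng = np.random.default_rng(0)
xi = rng.standard_normal(10_000)      # one standard normal variable per item

# Arbitrary illustrative values (assumptions, not data from the paper).
mu_g, sd_g = 0.05, 0.02               # mean and std of the growth
mu_t, sd_t = 100.0, 30.0              # mean and std of the age

growth = mu_g + sd_g * xi
age = mu_t + sd_t * xi                # perfectly correlated with growth
log_size_ratio = growth * age         # ln(x / x0) under the assumed growth law

a, b, c = completed_square_params(mu_g, sd_g, mu_t, sd_t)
squared_form = a * (xi + b) ** 2 + c
max_abs_diff = float(np.max(np.abs(log_size_ratio - squared_form)))
```

In this sketch the scale factor `a` is the product of the two standard deviations, which is in line with the statement that the asymptotic behavior is controlled by the variances of the age and the growth.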
3 Statistics of the COVID-19 pandemic and its evolution
The propagation of the pandemic can be considered a typical example of the geometrically growing system. In spite of seasonal rises and falls, lockdown measures, the administration of vaccines and the appearance of variants, the propagation of the pandemic kept accelerating for over 2 years after the outbreak. A power-law distribution of the number of infected across countries was reported in the early stage, when the pandemic was still propagating between countries (Blasius, 2020; Beare & Toda, 2020). However, once the propagation between countries had saturated, the distribution should deviate from the power law.
Chol-jun (2022) showed that the distribution of the accumulated number of infected across countries in May 2021 could be approximated excellently by the log-CS. Note that the approximation is not a best fit but is derived from the history of the pandemic. In fact, the distributions of the age and the growth are close to normal, and their correlation turned out to be positive. Figure 1 shows the consistency between the observation and the log-CS approximation at several stages of the pandemic: as examples, in late February 2020 (the early stage), late July 2020 (the saturation of propagation between countries) and early February 2022 (the propagation of the Omicron variant). (Footnote 4: Data for COVID-19 propagation are available, for example, at the website of Our World in Data, https://ourworldindata.org/coronavirus/.) For July 2020 in particular, the observed curve weaves around the log-CS approximation. The distribution in early February 2022 seems to be getting distorted by the unprecedentedly quick propagation of the Omicron variant.
In the above count and probability histograms we can hardly sense a change in the slope of the probability curve. If we try to use the maximum likelihood (ML) estimate of the power-law exponent (Newman, 2005)
$$\alpha = 1 + n \left[ \sum_{i=1}^{n} \ln \frac{x_i}{x_{\min}} \right]^{-1}, \qquad (10)$$
where $x_i$ stands for the sizes of the data items and $n$ for the number of items, we should set $x_{\min}$, i.e. the lowest allowable size or truncation size. However, the distribution is not a power law over the whole domain of size but only in the tail at large sizes. We can set $x_{\min}$ to the modal (most probable) size in the probability histogram (Fig. 1) and determine the tail exponent. Figure 1 shows that the tail exponent appears to decrease over the whole period, in spite of local rises.
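The tail-exponent estimate described above can be sketched in a few lines. This is a minimal illustration, not the author's code: it implements the continuous maximum-likelihood estimator of Newman (2005) for items at or above a truncation size, and it adds a rough helper that picks the modal size from a histogram with log-spaced bins. The function names, the choice of log-spaced bins and the synthetic Pareto sample are all assumptions made for the example.

```python
import numpy as np

def ml_tail_exponent(sizes, x_min):
    """Continuous ML estimate of the power-law exponent (Newman, 2005),
    using only items with size >= x_min.
    Returns (alpha_hat, standard_error, n_tail)."""
    x = np.asarray(sizes, dtype=float)
    tail = x[x >= x_min]
    n = tail.size
    if n < 2:
        raise ValueError("need at least two items at or above x_min")
    log_sum = np.sum(np.log(tail / x_min))
    if log_sum <= 0.0:
        raise ValueError("all tail items equal x_min; exponent is undefined")
    alpha = 1.0 + n / log_sum
    stderr = (alpha - 1.0) / np.sqrt(n)
    return float(alpha), float(stderr), int(n)

def modal_size(sizes, bins=30):
    """Rough modal size: geometric centre of the fullest log-spaced bin.
    Log-spaced binning is an assumption of this sketch."""
    x = np.asarray(sizes, dtype=float)
    x = x[x > 0]
    edges = np.logspace(np.log10(x.min()), np.log10(x.max()), bins + 1)
    counts, _ = np.histogram(x, bins=edges)
    k = int(np.argmax(counts))
    return float(np.sqrt(edges[k] * edges[k + 1]))

# Synthetic check on a pure Pareto sample with a known exponent.
rng = np.random.default_rng(1)
true_alpha = 2.5
synthetic = (1.0 - rng.random(50_000)) ** (-1.0 / (true_alpha - 1.0))
alpha_hat, alpha_err, n_tail = ml_tail_exponent(synthetic, x_min=1.0)
mode_guess = modal_size(synthetic)
```

For real data one would pass `modal_size(sizes)` as `x_min`, keeping in mind that the estimate still depends on that choice whenever the distribution is not a pure power law.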
We can propose another proxy for the tail exponent: the variance of the log-normal distribution. In fact, the distribution can be approximated by the log-normal (Eq. 8) as well. This distribution has no asymptotic exponent, and its local slope is determined by the variance (Eq. 5). We can also estimate this variance by the maximum likelihood approach:
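$$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} \left( \ln x_i - \hat{\mu} \right)^2, \qquad \hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} \ln x_i. \qquad (11)$$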
This approach has an advantage over the above evaluation of the tail exponent, because the selection of an optimal truncation size does not matter. In both the log-CS and the log-normal distributions the slope steepens toward larger sizes, so the tail exponent would be evaluated as greater at greater sizes and vice versa, even though the distribution remains the same. A greater variance corresponds to a smaller local slope, i.e. a smaller tail exponent. In the evolution of the COVID-19 pandemic the variance seems to increase (Fig. 1), which agrees with the decrease of the tail exponent mentioned above.
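The link between the log-normal variance and the local slope can be made concrete. The sketch below estimates the variance of the log-size by maximum likelihood and evaluates the local log-log slope of the log-normal PDF, d ln f / d ln x = -1 - (ln x - mu) / sigma^2, which follows directly from the standard log-normal density. The function names and the synthetic sample are assumptions made for the example.

```python
import numpy as np

def lognormal_ml(sizes):
    """ML estimates (mean, variance) of the log-size for positive sizes.
    The variance divides by n, as the ML estimator does."""
    x = np.asarray(sizes, dtype=float)
    y = np.log(x[x > 0])
    mu = y.mean()
    var = np.mean((y - mu) ** 2)
    return float(mu), float(var)

def lognormal_local_slope(x, mu, var):
    """Local slope d ln f / d ln x of the log-normal PDF at size x."""
    return -1.0 - (np.log(x) - mu) / var

# Synthetic sample with known parameters (illustrative values only).
rng = np.random.default_rng(3)
sample = rng.lognormal(mean=1.0, sigma=0.5, size=100_000)
mu_hat, var_hat = lognormal_ml(sample)

# Same distance above the mean log-size, two different variances:
# the larger variance gives the flatter (less negative) local slope.
slope_narrow = float(lognormal_local_slope(np.exp(2.0), 0.0, 1.0))
slope_wide = float(lognormal_local_slope(np.exp(2.0), 0.0, 4.0))
```

Comparing `slope_narrow` and `slope_wide` shows why a growing variance reads as a decreasing tail exponent at a fixed size.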
4 Statistics for population in city and in country
The distribution of the population of cities is a classical instance of the power law. The growth of population over centuries shows an exponential, or geometric, profile, though it sometimes saturates so strongly that it is better expressed by the logistic function. Therefore, we can describe the evolution of population by the geometrically growing system (GGS).
First, we analyzed the population of cities within a country using the data in stellarium-0.21.1. (Footnote 5: We use the dataset of cities compiled as observation locations on the globe in stellarium-0.21.1, which is open-source astronomical software. The software is available at the Stellarium GitHub webpage, https://github.com/stellarium/stellarium/releases/.) The dataset covers 24,000 cities with their location, population and other information gathered between 2006 and 2019. Because of a discontinuity at lower populations, we limit the cities to those with a population over 20,000. Analyzing the population of cities in the U.S., Gabaix (1999) indicated that the populations of the biggest cities follow the power-law distribution and that the Zipf exponent is almost unity. We obtained a Zipf exponent of 1.29 for the U.S. (Fig. 2), which differs from the value of 1 according to Gabaix (1999), and 0.639 for Iraq, for example. We perform a best-fitting analysis for the population of cities in the U.S. with various approximations: the log-CS, the log-normal and the power law (Fig. 2). We infer the best-fit parameters by a Markov Chain Monte Carlo (MCMC) method, specifically the Metropolis-Hastings algorithm (Metropolis et al., 1953; Hastings, 1970). with best-fit parameters is evaluated: for the log-CS, for the power-law and for the log-normal approximation. Therefore, we can prefer the log-CS as the closest approximation for the population of cities within a country. Ioannides & Skouras (2013) claimed that most cities in the U.S. obey a log-normal, but that the upper tail, and therefore most of the population, obeys a power law. In fact, the log-CS and log-normal distributions are almost indistinguishable except at infinity (Chol-jun, 2022).
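For readers unfamiliar with the Metropolis-Hastings algorithm mentioned above, a minimal random-walk version is sketched below. It is not the author's fitting code and does not reproduce the goodness-of-fit statistic reported in the text: as an assumption for illustration, it samples the two parameters of a log-normal model (the mean and the logarithm of the standard deviation of the log-size) from a Gaussian likelihood with a flat prior, on synthetic data. All function names, the step size and the chain length are hypothetical choices.

```python
import numpy as np

def log_likelihood(params, log_sizes):
    """Gaussian log-likelihood of the log-sizes, up to an additive constant.
    params = (mu, log_sigma); a flat prior on both is assumed."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    z = (log_sizes - mu) / sigma
    return -log_sizes.size * log_sigma - 0.5 * np.sum(z**2)

def metropolis_hastings(log_target, start, step, n_steps, rng):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal."""
    current = np.asarray(start, dtype=float)
    current_lp = log_target(current)
    chain = np.empty((n_steps, current.size))
    accepted = 0
    for i in range(n_steps):
        proposal = current + step * rng.standard_normal(current.size)
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        if np.log(rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        chain[i] = current
    return chain, accepted / n_steps

# Synthetic log-sizes standing in for log-populations (illustrative only).
rng = np.random.default_rng(2)
log_sizes = rng.normal(loc=11.0, scale=1.2, size=500)

chain, acceptance = metropolis_hastings(
    lambda p: log_likelihood(p, log_sizes),
    start=[log_sizes.mean(), 0.0],
    step=0.05,
    n_steps=6000,
    rng=rng,
)
burned = chain[1000:]                      # discard burn-in
mu_hat = float(burned[:, 0].mean())
sigma_hat = float(np.exp(burned[:, 1]).mean())
```

The same sampler could be pointed at a different `log_target`, for example one built from a binned goodness-of-fit measure, without changing the acceptance rule.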
Next, we perform the best-fitting analysis for the distribution of population across countries and areas of the world, making use of the World Population Prospects (WPP) 2015 dataset (Footnote 6: The data are available at the website of World Population Prospects, https://population.un.org/wpp/.) with the same approximations (Fig. 2). With the best-fit parameters inferred by the MCMC method, is evaluated: for the log-CS, for the power-law and for the log-normal approximation. Therefore, in this case we can prefer the log-normal as the closest approximation.
On the other hand, we try to apply our GGS approach. We expect a correlation between the age and the growth. For that, we have looked up the age of countries in the Korean Great Encyclopedia. (Footnote 7: The Korean Great Encyclopedia, Pyongyang, Encyclopedia Press, 2001.) We date the starting epoch of a country by the appearance of its first administration, e.g. the first dynasty or city-state, independence and so on. However, for many African and American countries, we should consider that the establishment of colonies greatly changed the composition of the population. We are afraid that, in extracting typical dates, we may have relied on missing or distorted official records of the real history of many countries and areas. The growth rate is evaluated for each country on the assumption that the population originated from a single couple, an Adam and Eve; the same approximation was applied to the COVID-19 pandemic in Chol-jun (2022). Surprisingly, the growth rate and the age show an exactly inverse relation, and moreover its exponent is almost unity (Fig. 3). This gives a negative correlation between the age and the growth and, of course, we can then expect that the world population should follow the log-normal distribution.
Also, from Britannica (Footnote 8: Encyclopædia Britannica Ultimate Reference Suite. Chicago: Encyclopædia Britannica, 2014.) we extracted the date of the first habitation by a tribe or of immigration. This kind of age, which could be called the “habitation age,” is much older than the previous “administration age.” But a negative correlation is still obtained (Fig. 3). We also considered another kind of age, which is extrapolated from the current growth of population: the origin of the age is set so that the initial population was also a couple. We call such an age the “extrapolation age.” Figure 3 shows the relation between the age and the growth obtained by extrapolation from the period 1950-2015 of the WPP dataset. Again a negative correlation is found. And, for any kind of age, the inverse relation between the age and the growth still holds and its exponent is near unity.
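The "extrapolation age" can be illustrated with a short calculation. The sketch below assumes an exponential growth law, x = x0 * exp(growth * age), which is an assumption of this example rather than a formula quoted from the text, and uses a hypothetical country whose population doubles over the 65 years from 1950 to 2015. The function name and the numbers are invented for illustration.

```python
import math

def growth_and_extrapolation_age(pop_start, pop_end, years_between,
                                 initial_population=2.0):
    """Growth rate from two population counts, and the age at which an
    initial couple would reach pop_end at that constant rate.
    Assumes exponential growth (an assumption of this sketch)."""
    if pop_start <= 0 or pop_end <= pop_start or years_between <= 0:
        raise ValueError("expects a positive, growing population")
    growth = math.log(pop_end / pop_start) / years_between
    age = math.log(pop_end / initial_population) / growth
    return growth, age

# Hypothetical country: 1 million people in 1950, 2 million in 2015.
growth, age = growth_and_extrapolation_age(1.0e6, 2.0e6, 65.0)

# By construction, growth * age equals ln(pop_end / initial_population).
product = growth * age
```

Because the product of growth and age is fixed by the logarithm of the current population, which varies far less between countries than the population itself, an inverse relation with an exponent near unity is what this construction would be expected to give.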
We analyze this fact. From Eq. (1) we can derive
[TABLE]
If and are determined to within a factor of , then or is also determined to within a factor of , and in the log-log diagram of vs. there appears a stripe of width (Fig. 3). This stripe necessarily has a slope of , so a negative correlation between and is obtained. For a positive correlation to arise, this stripe must cover the whole range of in the dataset. Therefore, it must hold that , which we can rewrite from Eq. (12):
[TABLE]
where and could be appropriate extremes of the dataset, e.g. of region. This might be a necessary condition for a positive correlation between and . To summarize, we can state a theorem.
Theorem 1.
The geometrically growing system can have a positive correlation between the growth and the age only if Eq. (13) is satisfied.
From the theorem we could derive another conclusion:
Corollary 1.
If new items of the lowest size are continuously born within the geometrically growing system, the system should be approximated by the log-normal. On the other hand, the system can be approximated by the log-CS after the creation of new items of the lowest size has stopped.
For the countries of the world, if we take an initial condition (a couple) and consider that the maximum and minimum populations of countries are now and and , then we obtain on the r.h.s. and only on the l.h.s. of Eq. (13). However, if we considered a non-constant initial condition (as in real circumstances), we might gain such a greater value of the l.h.s. that we could perform the log-CS approximation. For the case of the COVID-19 pandemic, and , so we can apply the log-CS soon after the saturation of propagation between countries. Note that Eq. (13) is not a sufficient condition.
We inspect the evolution of the population distribution over the world. The WPP 2015 dataset provides the population of countries and areas for the period 1950-2015. As mentioned above, the upper slope of the distribution can be evaluated by both methods: the tail exponent and the variance in the log-normal approximation. Figures 2 and 2 show a flattening trend of the slope similar to that in the case of the COVID-19 pandemic. This means that the variances of the parameters, such as the age and the growth, are growing, so that the variance of the size distribution grows and the tail exponent decreases.
The tail exponent for the population of cities within a country also evolves. Citing previous works, Gabaix & Ioannides (2004) indicated that the tail exponent for the U.S. decreased in the period from 1900 to 1990, implying a greater concentration. Gonzáles-Val (2010) confirmed a monotonic decrease of the tail exponent with time, provided that the truncation number of cities is kept at 10,000. Interestingly, in the data on the dynamics of cities in the central and eastern Europe (CEE) countries during 1970-2007 (Necula et al., 2010), we find that in most countries the exponent has an almost negative relation with the population itself: if the population increases, the exponent decreases, and vice versa (Fig. 4). The exponent for European cities in the Middle Ages seems to decrease after 1500 (Bairoch, Batou & Chèvre, 1987; Gonzáles-Val, 2019), and only then was Zipf’s law reported to emerge for cities in Europe (Dittmar, 2011).
Considering countries that differ greatly in wealth, size and geography, Pinto, Lopes & Machado (2012) claimed that the countries with higher wealth levels reveal higher values of the exponent, while most African countries show smaller values of the exponent. In our simulation, the oldest Asian countries, such as Iraq and China, appear to have the smallest exponents. However, their reasoning is not so obvious: if it were right, the exponent over the world should increase as the world economy progresses, but the exponent is clearly decreasing. Our approach can give a clear reasoning: younger countries might have a greater exponent and vice versa. In fact, most of the more developed countries are younger, while the older countries are underdeveloped, so that wealthier countries could appear to have a higher exponent; this, however, is incidental rather than inevitable.
Gabaix & Ioannides (2004) related urbanization to economic factors, e.g. economic integration and international trade. Then why is the exponent increasing in some countries in spite of economic progress? Necula et al. (2010) proposed political factors as determinants of urbanization. We can give a statistical analysis that precedes or includes all the economic, political or any other factors. For example, the exponent depends on the variances of the parameters.
5 Statistics of firm size
The power-law distribution appears widely in economic and financial phenomena (e.g. see Farmer & Geanakoplos, 2008). The power-law distribution of firm size, which can be measured by diverse properties, was reported long ago (Zipf, 1949; Ijiri & Simon, 1977). Making use of the Economic Census 1997, Axtell (2001) showed the power-law distribution of firm size measured by employees and by revenue. Firm size is a quantity that is apt to grow geometrically. In fact, we commonly evaluate the growth of a firm in proportional rather than additive terms.
We analyze the data in Axtell (2001), where numerical data for the size of firms expressed by the number of employees (the employment size) are shown explicitly. In fact, the distribution has a convex form in its low-size part, which favors the log-CS over pure power-law modeling. Giovanni, Levchenko & Rancière (2010) analyzed French firms and obtained a similar convex profile of the distribution. We neglect 0-size firms, as Axtell did. We perform the best fitting with the various approximations by the MCMC method (Fig. 5). For the power-law fitting is obtained the same as Axtell: . The log-CS fitting gives a greater value: . We should expect that the dataset could be approximated by the log-normal: . This says that the log-CS could be the closest to the real dataset.
As mentioned above, the geometrically growing system can be modeled alternatively by either the log-CS or the log-normal, depending on the correlation between the growth and the age. Data on employment dynamics by firm age, 1987-2005, from the Census Bureau Business Dynamics Statistics and Longitudinal Business Database showed that young firms, if they survive, have higher employment growth rates than older firms (Haltiwanger, Jarmin & Miranda, 2009, 2010). This might be because the growth of older or larger firms saturates owing to market limitations, while this limitation does not affect younger firms, so the latter appear to have higher growth in spite of a higher establishment exit rate. Analyzing data from the EFIGE survey, which sampled French, Italian and Spanish firms in the period from 2001 to 2008, Navaretti, Castellani & Pieri (2012) showed that younger firms have a higher probability of experiencing high growth rates both in the short run (e.g. over 1 year) and in the long run (i.e. over their existing age). Therefore, we can expect a negative correlation between firm age and firm growth. This should lead to a log-normal fit to the distribution, though the real dataset seems closer to the log-CS. This might originate from a non-normal distribution of the age and the growth.
We trace the evolution of the distribution. We inspect data from the Census Bureau Business Dynamics Statistics (BDS), 1977-2014. (Footnote 9: The data are available at the website of the Small Business Administration, https://www.sba.gov/sites/default/files/advocacy/%bds_firm_size.xlsx) Though the data show a lowering of the exponent in the lower-size part in contrast to the higher-size part, which could stand for a convexity of the distribution (Fig. 5), we can evaluate the tail exponent by linear regression, excluding both the lowest- and the highest-size bins; the highest bin is excluded because its upper limit is open-ended (infinite). Figure 5 clearly shows that the exponent decreases as time goes on. Therefore, we can see that the tail exponent is evolving toward lower values, i.e. the distribution is flattening. As mentioned above, our approach gives a reason: the variances of the age or the growth increase, so the variance of the size increases and the distribution of size flattens.
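The regression on binned data described above can be sketched as follows. This is an illustration under stated assumptions, not the author's procedure: each bin is represented by its geometric centre, the density is taken as count divided by bin width, and a straight line is fitted in log-log space after dropping the lowest and the highest (open-ended) bins. The bin edges and counts are synthetic and hypothetical; they are not the BDS size classes.

```python
import numpy as np

def binned_tail_exponent(bin_lower, bin_upper, counts):
    """Tail exponent from binned counts by log-log linear regression,
    excluding the lowest bin and the highest (open-ended) bin."""
    lower = np.asarray(bin_lower, dtype=float)[1:-1]
    upper = np.asarray(bin_upper, dtype=float)[1:-1]
    n = np.asarray(counts, dtype=float)[1:-1]
    centre = np.sqrt(lower * upper)          # geometric bin centre
    density = n / (upper - lower)            # count per unit size
    slope, _intercept = np.polyfit(np.log(centre), np.log(density), 1)
    return float(-slope)

# Synthetic bins doubling in width: [1, 2), [2, 4), ..., [512, inf).
edges = 2.0 ** np.arange(0, 10)
lower = edges
upper = np.append(edges[1:], np.inf)

# Exact bin counts for a pure power-law density with exponent 2.
alpha_true = 2.0
counts = 1.0e6 * (lower ** (1.0 - alpha_true) - upper ** (1.0 - alpha_true))

alpha_fit = binned_tail_exponent(lower, upper, counts)
```

With bins of constant ratio the regression recovers the exponent exactly for a pure power law; with irregular bins the choice of bin centre introduces a small bias, so the trend over time is more reliable than any single value.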
6 Conclusion and discussion
In this paper, we considered some special properties of the distribution for the geometrically growing system (GGS) in pandemic, demographic and economic phenomena.
First, the distribution has a convexity in the lower-size part. This is not surprising: it only reflects the modal (most probable) size, which is common to almost all distributions but absent from the power law. In fact, the log-CS has an additional concavity and a singularity. In the above approximations the log-CS or the log-normal, both or either, outperform the power law. This means that the demographic, pandemic and economic distributions can be properly explained by the GGS. In fact, the difference between the log-CS and the log-normal is not great in most cases and is not important. What matters is that both of them represent the distribution of the GGS and that the convexity appears in both. However, the profile of those distributions may change if the distribution of the parameters deviates from the normal. For the log-CS or the log-normal, the divergence problem that arises for the pure power law with certain exponents never appears.
If new items of low size are continuously born and flourish, this convexity will get fainter and the distribution will seem closer to the power law. However, once the number of items in the system has saturated, the number of low-size items decreases in the course of evolution, unless they are isolated from the ensemble and never grow, and a kind of roll-over grows in the lower-size part. Until the early stage of such a period, the distribution of the GGS should be represented only by the log-normal, while long after the saturation of the number of items the log-CS fitting can become possible.
Second, while the parameters such as the age and the growth diffuse, the tail exponent decreases and the distribution flattens in the evolution of the system, which is often called spectral hardening. As mentioned above, the slope of the distribution depends, fully or partly, on the variances of the parameters. In many systems, such as Brownian motion, the variances of the parameters grow with time. The diversity of economic actions and the variance of economic growth are accelerating with time. The second law of thermodynamics dictates only that matter should spread out by diffusion. However, matter is collecting and agglomerating all over the universe. Although, from the physical point of view, this can be explained by gravitation and so on, from the statistical point of view the geometrically growing system can act as a reverse machine that converts diffusion in parametric space into concentration in the distribution of size.
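How diffusion of the parameters widens the size distribution can be seen in a simple case. The sketch below assumes, for illustration only, that the log-size is the product of growth and age (an exponential growth law) and that growth and age are independent; for independent variables the variance of a product is exactly mu_g^2 sd_t^2 + mu_t^2 sd_g^2 + sd_g^2 sd_t^2. The function name, the parameter values and the common widening factor are hypothetical.

```python
def log_size_variance(mu_g, sd_g, mu_t, sd_t):
    """Variance of growth * age for independent growth and age
    (exact formula for the product of two independent variables)."""
    return (mu_g * sd_t) ** 2 + (mu_t * sd_g) ** 2 + (sd_g * sd_t) ** 2

# Widen both standard deviations by a common factor, mimicking diffusion
# of the parameters while their means stay fixed (illustrative values).
spreads = [0.5, 1.0, 2.0]
variances = [
    log_size_variance(0.05, 0.02 * s, 100.0, 30.0 * s) for s in spreads
]
```

The variance of the log-size grows monotonically with the spread of the parameters, which corresponds to a wider size distribution and a larger ratio between the largest and smallest items.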
The flattening distribution in turn implies an enlargement of the relative ratio in size between the highest- and lowest-size items within the system. This could explain urbanization, monopolization and so on. Urbanization proceeded even in ancient times, such as in ancient Rome. Urbanization can occur not only by migration for economic and political reasons, but also by the stochastic nature of the growth itself, e.g. by different birth (or death) rates, or by all the former factors together. If we applied this property to the wealth distribution, which has a geometrically growing trend and follows a power law, we could expect an aggravation of the “rich-get-richer” process and of monopolization in the economic regime by nature, if money begets money.
It is interesting that the concentration might be compatible with, or even driven by, the diffusion. The growth in the GGS could give rise to diffusion in parametric space, which in turn leads to centralization in matter space. The “rich-get-richer” phenomenon never implies that the rank within the system should be fixed, that is, that the richest or biggest item would naturally keep its first rank. The rank is determined by the growth rate, the diversity of which can change with time. The richest job and the biggest city have alternated from era to era, as we have seen.
We can find properties of the geometrically growing system in many other phenomena. We hope that our approach can contribute to the analysis of such problems.
Conflict of interest
The author has no conflicts to disclose.
Data availability
Data used in this paper are available at the website addresses indicated or by corresponding with the author.
References
- Auerbach, F. (1913). Das Gesetz der Bevölkerungskonzentration. Petermanns Geographische Mitteilungen, LIX, 73-76.
- Axtell, R. (2001). Zipf Distribution of U.S. Firm Sizes. Science, 293, 1818-1820.
- Bairoch, P., Batou, J., & Chèvre, P. (1988). La Population Des Villes Européenes. Geneva: Droz.
- Bak, P., Tang, C., & Wiesenfeld, K. (1987). Self-organized criticality: An explanation of the 1/f noise. Physical Review Letters, 59(4), 381.
- Barabasi, A., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286, 509-512.
- Beare, B. K., & Toda, A. A. (2020). On the emergence of a power law in the distribution of COVID-19 cases. Physica D, 412, 132649. arXiv:2004.12772v2 [physics.soc-ph].
- Blasius, B. (2020). Power-law distribution in the number of confirmed COVID-19 cases. Chaos, 30, 093123. https://doi.org/10.1063/5.0013031. arXiv:2004.00940v2 [q-bio.PE].
- Chol-jun, K. (2022). The power-law distribution in the geometrically growing system: Statistic of the COVID-19 pandemic. Chaos, 32, 013111. doi: 10.1063/5.0068220.
