Allometric Scaling in Scientific Fields
Hongguang Dong, Menghui Li, Ru Liu, Chensheng Wu, Jinshan Wu

TL;DR
This study uncovers stable allometric scaling laws in scientific fields, linking outputs like publications and citations to the number of authors, and shows deviations can effectively rank subfields.
Contribution
It is the first comprehensive analysis revealing allometric scaling laws in scientific disciplines and their stability over time, providing a new metric for subfield ranking.
Findings
Scaling laws relate outputs and inputs to field size across disciplines.
Exponents of scaling laws are stable over years.
Deviations from scaling laws can rank subfields independently of size.
Abstract
Allometric scaling can reflect underlying mechanisms, dynamics and structures in complex systems; examples include typical scaling laws in biology, ecology and urban development. In this work, we study allometric scaling in scientific fields. By performing an analysis of the outputs/inputs of various scientific fields, including the numbers of publications, citations, and references, with respect to the number of authors, we find that in all fields that we have studied thus far, including physics, mathematics and economics, there are allometric scaling laws relating the outputs/inputs and the sizes of scientific fields. Furthermore, the exponents of the scaling relations have remained quite stable over the years. We also find that the deviations of individual subfields from the overall scaling laws are good indicators for ranking subfields independently of their sizes.
Hongguang Dong 1
Menghui Li 2 (H. Dong and M. Li contributed equally)
Ru Liu 2
Chensheng Wu 2
Jinshan Wu 3 (corresponding author: [email protected])
1 Higher Education Press, Beijing, 10000, P. R. China
2 Beijing Institute of Science and Technology Intelligence, Beijing, 100044, P. R. China
3 School of Systems Science, Beijing Normal University, Beijing, 100875, P. R. China
Key Words: Allometric scaling law, Subject classification code, PACS, MSC, JEL
Some scientific fields have many scientists and generate many publications, while others have a small number of researchers but a disproportionately large number of publications. What is the relationship between the number of scientists and the number of papers (and also the numbers of references and received citations) in scientific fields, and can such a relation be used to indicate the developmental stages of scientific fields? This question has attracted considerable attention (country1 ; country2 ; Innovation ; Chinese). In general, scaling laws are helpful for answering such questions. For example, scaling laws have been found to be a common phenomenon in science, including power-law correlations between the numbers of papers and received citations (country1 ; country2 ; Innovation) as well as between economic indicators and bibliometric measures Chinese . In addition, a scale-independent indicator has been proposed to evaluate research performance country2 . These studies help us understand the performance of research units such as universities, cities and countries. In this study, we present a scaling analysis of subfields at various levels, relating size, measured by the number of authors, to input, represented by the number of references, and to output, represented by the numbers of papers and received citations. In a sense, we treat subfields in the same way that universities, cities and countries were treated in previous studies country1 ; country2 ; Innovation ; Chinese .
The formation, flourishing and decline of scientific fields, like the formation, rise and fall of cities, countries and industrial sectors Growth ; interaction , certainly comprise a question worth studying, although of a more abstract nature since the boundaries between scientific fields are less well defined than those of, e.g., cities. Several researchers have begun to study the dynamic evolution of scientific fields evolution ; modeling or to evaluate the scientific performances of universities university1 ; university2 , research groups group1 ; group2 ; group3 and metropolitan areas Metropolitan .
We note that in studies of cities, countries and many other systems, analyses of the scaling between their outputs/inputs and their sizes play an important and unique role. It has been found that patterns described by scaling laws commonly arise in complex systems; examples include typical scaling laws in biology, ecology and urban development allometry ; theory . These patterns facilitate the understanding of underlying mechanisms, dynamics and structures common to all cities and creatures theory . In biology, Kleiber’s law describes the scaling relation between metabolism and body size Biology1 ; Biology2 ; Biology3 . The quarter-power exponents governing this relation originate from the physical and geometric properties of the underlying resources and information distribution network structures Biology5 ; Biology6 . In addition to metabolism, the relations between important biological variables (e.g., heart beat frequency, life span, and fertility rate) and body size also follow power laws Biology4 . In social systems, scaling relations are observed between urban indicators and city size Invention ; Growth ; Settlement ; Supply ; urban ; road ; countries . Bettencourt et al. systematically studied the scaling relations between various properties of cities, including inventors, total wages, GDP, total housing, knowledge production and population theory ; Invention . All of these properties can be classified into three groups according to their scaling exponents theory ; Invention ; Growth ; Settlement ; Supply ; urban ; road ; countries ; Metropolitan : economic, knowledge and innovation properties follow superlinear relations (increasing returns); household properties follow linear relations (individual human needs); and energy consumption and infrastructure properties follow sublinear relations (economies of scale) Growth . These results reveal a universal social dynamic underlying all these phenomena. 
For example, in the growth or stable stage of a city, it is important to have a superlinear relation between output and size and a linear or sublinear relation between input and size. One can imagine that if this is the case, then there must be some kind of mechanism to guarantee that such scaling laws are achieved. Therefore, scaling laws for the quantities of such systems imply the existence of such a mechanism and, consequently, the existence of essential interactions among the entities of the systems interaction . Thus, such scaling laws require further investigation. In addition, scaling laws for cities provide a baseline for assessing the stages of development of specific cities independently of their population sizes. Therefore, an understanding of the scaling laws for cities can help policymakers to enhance the performance of their city relative to this baseline interaction . In addition to studies treating cities as the targets of interest, studies have also been conducted on similar scaling relations between the outputs/inputs and sizes of metropolitan areas Metropolitan , universities university1 ; university2 and countries country1 ; country2 . In principle, the values of the exponents in such scaling laws for various cities at various times might also be informative regarding, for example, whether more or less increasing returns are achieved with the scientific, technological and political development of civilization. However, such an analysis would require an enormous amount of historical data.
In this work, we perform an allometric scaling analysis of scientific fields. In other words, we treat scientific fields as the targets of interest, analogously to cities or biological/ecological systems, and study the scaling relations between the outputs/inputs and sizes of the investigated scientific fields. First, we have decades of reliable data that we can use to investigate the above question of co-evolution. Second, since scientific progress is less well known to the general public compared with the expansion or recession of cities, and also because science is more abstract than physical/financial measures of cities, we believe that such a study on scaling in scientific fields should be of particular interest. Compared with the degree of awareness regarding the driving forces and mechanisms of urban development, we believe that the driving forces and mechanisms of the development of scientific fields are even less well understood.
Similarly to the way in which the quantities related to individual cities are studied in allometric scaling analyses of cities, here, we study the quantities related to individual subfields of several disciplines, namely physics, mathematics and economics. In these three disciplines, a large number of papers have been classified into various subfields according to well-established subject classification schemes for each discipline. By using data for each paper, including its subject labels, authors and references as well as the other papers that cite it (its received citations), we first wish to examine whether such allometric scaling relations exist between the outputs/inputs and sizes of these scientific fields. Here, for a given subfield, we regard the number of authors who have produced papers in our datasets as its size, the number of references as its input, and number of papers or received citations as its output. Of course, the implied goal and logic are the same as those in studies of such relations for, e.g., cities, metropolitan areas, countries and universities: if allometric scaling relations exist, then mechanisms that give rise to such scaling relations also exist, which require further investigation. Second, if such scaling relations exist, then we also wish to infer the relative position or developmental stage of each subfield by examining the deviations of the subfields from the overall scaling relations and by investigating the evolution of the scaling exponents.
In this work, we show that the scientific organization and dynamics that relate the division of labor to scientific development and knowledge creation are very general and manifest as nontrivial quantitative patterns common to all subfields. We present an extensive body of empirical evidence showing that the outputs and inputs are scaling functions of subfield size that are quantitatively consistent across different disciplines and times. As shown later, we find a weak superlinear relation between output and size for these subfields. This indicates that the three disciplines in fact show similar productivity. In addition, we show that to a certain degree, the developmental stages of individual subfields as inferred from their deviations from the values expected according to scaling law are reasonable. We also find a weak superlinear relation between input and size. Furthermore, we find that during the few decades for which we have data, the values of the exponents have remained quite stable. Although it is commonly believed that science, as measured simply in terms of the number of papers or other indicators, has been developing much more rapidly in recent decades and possibly even in recent years, it seems that the underlying mechanism has remained the same, and consequently, the exponents relating output/input and size have also stayed the same.
I Data and method
Datasets.
In our datasets, each paper is represented by a data entry that includes the year of publication, the subject classification code, and the numbers of author(s), reference(s) and citations as recorded in the Web of Science. We use the established subject classification scheme for each discipline to identify the subfields to which each paper belongs. The classification schemes used in this work are the Physics and Astronomy Classification Scheme (PACS) for physics, the Mathematics Subject Classification (MSC) for mathematics and the Journal of Economic Literature (JEL) codes for economics. These schemes are all hierarchical, and in this work, we use the fourth level of the physics classification scheme (e.g., 03.67), the third level of the mathematics classification scheme (e.g., 92B) and the second level of the economics classification scheme (e.g., N3).
The physics dataset is a collection of all papers published by the American Physical Society (APS) Physical Review journals from to . Here, we consider only those research papers, e.g., articles, brief reports and rapid communications, with PACS numbers. In total, the dataset includes papers, PACS numbers and classification labels.
The mathematics dataset is a collection of papers published from to and classified using the 2010 Mathematics Subject Classification. Here, we consider only those journal papers that have entries in both Mathematical Reviews and Web of Science. The MSC codes were obtained from the Mathematical Reviews records, and the other information was obtained from Web of Science. This dataset includes papers, MSC codes and classification labels.
The economics dataset is a collection of all economics papers collected by the American Economic Association Journal of Economic Literature from to . Here, we consider only those papers that have records with both the JEL and Web of Science. The JEL Classification Codes were obtained from the Journal of Economic Literature, and the other information was obtained from Web of Science. This dataset includes papers, JEL codes and classification labels.
Scaling laws.
An allometric scaling-law relation between one quantity and another quantity is assumed to have the form
$$Y_i = Y_0 N_i^{\beta}, \qquad (1)$$
where $Y_i$ denotes an output (such as the number of papers) or input (such as the number of references) of a subfield $i$, and $N_i$ denotes the size of that subfield (such as the number of authors). $Y_0$ is a normalization constant. $\beta$ is the exponent, which we obtain through an ordinary least-squares (OLS) regression in log-log coordinates. The goodness of the regression is measured in terms of the coefficient of determination $R^2$, where $R$ is calculated as the correlation coefficient between $\log N_i$ and $\log Y_i$. This OLS analysis can be applied to the all-year data, in which case the values of $Y_i$ and $N_i$ are taken to be the cumulative values up through the last year covered by each dataset, or to single-year data. In the latter case, the exponent for year $t$ is denoted by $\beta_t$.
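The log-log OLS fit described above can be illustrated with a minimal sketch. It assumes NumPy is available; the function name `fit_scaling_law` and the synthetic subfield counts are hypothetical and are not taken from the paper's datasets.

```python
import numpy as np

def fit_scaling_law(sizes, values):
    """OLS fit of log(value) = log(Y0) + beta * log(size).

    Returns (beta, Y0, R^2), where R is the correlation coefficient
    between the logged sizes and the logged values.
    """
    log_n = np.log(np.asarray(sizes, dtype=float))
    log_y = np.log(np.asarray(values, dtype=float))
    beta, log_y0 = np.polyfit(log_n, log_y, 1)  # slope, intercept
    r = np.corrcoef(log_n, log_y)[0, 1]
    return beta, np.exp(log_y0), r ** 2

# Synthetic, noise-free subfields (made-up numbers): Y = 2 * N**1.1
authors = np.array([50, 200, 1000, 5000, 20000])
papers = 2.0 * authors ** 1.1
beta, y0, r2 = fit_scaling_law(authors, papers)
print(beta, y0, r2)
```

On real data the points scatter around the fitted line, so the coefficient of determination falls below one and the exponent carries a standard error that this sketch does not compute.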
Scaling laws describe the relation between two variables by a power-law function, and their exponents are generally obtained by OLS regression Growth ; Invention ; interaction ; road ; Metropolitan ; innovation1 . They should be distinguished from the power laws that have recently attracted interest, which generally describe probability distributions, e.g., the distribution of citations powerlaw ; Innovation . For a power-law distribution, the power law usually holds only in the tail, since otherwise the distribution could not be normalized, and rare events at the very end of the tail are noisy, so that a cut-off sometimes has to be introduced. For these reasons, the exponent of a power-law distribution is better estimated by the maximum likelihood method maxhood ; maxhood1 ; maxhood2 than by OLS regression. When there is a scaling law between two variables and one of the two variables follows a power-law distribution, then clearly so does the other. Therefore, scaling laws and power laws often appear together. However, this is not the case in our analysis.
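The relative stage of development of a subfield is measured in terms of the following deviation of the empirical value for that subfield with respect to the value predicted according to the allometric scaling relation:
$$\xi_i = \log \frac{Y_i}{Y_0 N_i^{\beta}}, \qquad (2)$$
where $Y_i$ is the empirical output or input of subfield $i$, $N_i$ is its size, and $Y_0 N_i^{\beta}$ is the value predicted by the scaling law. This deviation is independent of the absolute size of the subfield.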
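As a rough illustration of how such deviations can be turned into a ranking, the sketch below takes the deviation to be the residual of the log-log OLS fit, i.e., the logarithm of the ratio between the observed and the predicted value. That form is an assumption here, as are the function name `rank_by_deviation` and the four made-up subfields.

```python
import numpy as np

def rank_by_deviation(names, sizes, values):
    """Rank subfields by their log-residual from the fitted scaling law."""
    log_n = np.log(np.asarray(sizes, dtype=float))
    log_y = np.log(np.asarray(values, dtype=float))
    beta, log_y0 = np.polyfit(log_n, log_y, 1)
    xi = log_y - (log_y0 + beta * log_n)   # observed minus predicted, in logs
    order = np.argsort(-xi)                # largest positive deviation first
    return [(names[i], float(xi[i])) for i in order]

# Hypothetical subfields with made-up author and paper counts
names = ["A", "B", "C", "D"]
authors = [100, 1000, 10000, 100000]
papers = [300, 1500, 40000, 150000]
ranking = rank_by_deviation(names, authors, papers)
for name, xi in ranking:
    print(name, round(xi, 3))
```

In this toy example the smallest subfield (A) ranks second and the largest (D) ranks third, which shows how the ordering by deviation can differ from the ordering by size.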
Author name disambiguation.
In the following, we investigate the possible scaling relationships between the number of papers and both the number of author instances and the number of authors. The former simply counts the authors of all papers in a field, regardless of whether some papers have the same or overlapping authors, whereas the latter counts only unique authors. For the latter, we must address the problem of author name disambiguation. In this paper, we adopt the simple "full last name and all initials" method to identify author names Authorname , in which authors who have the same last name and the same set of initials are considered to be the same author. For example, A Smith, AB Smith and AC Smith would be identified as distinct authors, but Alice Smith and Alysia Smith would be regarded as the same author.
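The matching rule can be sketched as a key function. This is only an illustration, not the authors' implementation: the helper `author_key` is hypothetical, it assumes non-empty names written as given names followed by the last name, and it treats any all-uppercase token as a run of initials.

```python
import re

def author_key(name):
    """Return (last name, all initials) for a name like 'Alice B. Smith'."""
    parts = [p for p in re.split(r"[\s.]+", name.strip()) if p]
    last = parts[-1].lower()
    initials = ""
    for token in parts[:-1]:
        if token.isupper():
            initials += token             # "AB" is already a run of initials
        else:
            initials += token[0].upper()  # "Alice" contributes "A"
    return last, initials

for name in ["A Smith", "AB Smith", "AC Smith", "Alice Smith", "Alysia Smith"]:
    print(name, "->", author_key(name))
```

Two records are merged when their keys are equal, so Alice Smith and Alysia Smith collapse into one author while AB Smith and AC Smith stay distinct.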
The all initials method has been claimed to have relatively low “contamination” rates in certain disciplines, such as in mathematics and in economics Authorname . We also performed our own small-scale validation of this approach. In the physics dataset, the subfield 42.50.Dv (Nonclassical states of the electromagnetic field) contains author-paper pairs. A total of distinct scientists were found after the disambiguation process. To validate the all initials method, we randomly selected 200 pairs of authors with similar names, each consisting of two papers considered to be from the same author. We then verified whether they were indeed the same person by performing a search on the APS website and the authors’ research homepages. We found the false positive rate (i.e., the number of authors considered to be the same person whereas, in reality, they are not) to be . We also performed a manual examination of the false negative rate (i.e., the number of identical authors incorrectly identified as different individuals using the all initials method) and found it to be approximately .
II Results
First, let us consider the relation between the number of papers and the number of authors, which, in a sense, is similar to the relation between the output and size of cities. Here, we use the cumulative data up through the last year covered by the dataset for all three disciplines. We see that the values of the exponent are () for physics (Fig. 1(a)), () for mathematics (Fig. 1(c)) and () for economics (Fig. 1(d)). This means that the number of papers per author very weakly increases as the number of authors increases. This, in turn, indicates that there are marginal increasing returns in physics, mathematics and economics.
When we consider Fig. 1(a) in further detail, we note that some subfields (29.20.xx) of physics are much less productive than predicted by the scaling law. For example, subfield 29.20.xx (Storage rings and colliders) is related to high-energy experiments and has approximately authors per article on average. Such experimental subfields in physics generally require many scientists to work together. This might make the scaling exponent systemically smaller. To exclude these subfields, we restrict the analysis only to papers with at most ten authors (denoted by physics (subsets)). As shown in Fig. 1(b), with this approach, the scaling exponent becomes (). For cities, the scaling exponent between the number of new patents and the urban population is , and that between the number of inventors and the urban population is Growth . Therefore, we can roughly estimate the scaling exponent between the numbers of new patents and inventors to be approximately . This rough estimation shows that our results are qualitatively consistent with the relation between the numbers of patents and inventors, which, in a sense, is similar to the relation between the numbers of papers and authors, as deduced from studies of scaling relations in cities Growth . However, the exponent values of () in physics, in mathematics and in economics for the development of science/patents are quite different from the exponent relating the output and size of cities, which is roughly . This means that the effect of increasing returns in science/patents is only marginal and not as high as the effect seen for production processes in cities. We do not know the reason for this difference. We can only speculate that it may be more difficult to increase scientific output than it is to increase industrial production by simply expanding in size.
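One way to obtain such a rough estimate is to combine the two urban scaling relations. The symbols $P$ (new patents), $I$ (inventors), $N$ (urban population), $\beta_P$ and $\beta_I$ are introduced here for illustration only and are not notation from the source.

```latex
\begin{align}
P &\propto N^{\beta_P}, \qquad I \propto N^{\beta_I}
  \;\Longrightarrow\; N \propto I^{1/\beta_I}, \\
P &\propto \left(I^{1/\beta_I}\right)^{\beta_P} = I^{\beta_P/\beta_I}.
\end{align}
```

Under this reading, the exponent relating patents to inventors is the ratio $\beta_P/\beta_I$. The step treats both relations as exact, so with scattered data the ratio is only an approximation.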
Next, let us check whether the scaling exponents have remained stable during all investigated years of development of the fields by performing a scaling analysis on the single-year data for each year. We know that the average numbers of authors and references in papers today are much larger than those in earlier times. However, we find that except for physics, for which the value is smaller than for the other fields and slightly decreasing, the values of the scaling exponent have remained quite stable, as shown in Fig. 2, especially for physics (subsets). The fact that similar scaling laws are observed in various disciplines implies that there might be a common mechanism governing the scientific progress of these disciplines, and the fact that the exponent values have remained similar and stable over time indicates that the underlying mechanism, if there is such a mechanism, tends to be preserved over time. The fact that physics as a whole shows a smaller and slightly decreasing exponent and the fact that we know that this phenomenon is due to papers with more than authors, which are often related to high-energy experimental physics, suggest that physics might have developed to a stage in which it often requires large teams to solve certain difficult problems and thus is less productive. It should be noted that the exponents in the yearly data analysis are smaller than the exponents for the cumulative data for reasons that we do not yet know.
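The single-year analysis amounts to repeating the log-log fit once per year. A minimal sketch follows, assuming NumPy, at least two subfields per year, and a hypothetical record layout of (year, number of authors, number of papers); the synthetic records are made up.

```python
from collections import defaultdict

import numpy as np

def yearly_exponents(records):
    """Fit one scaling exponent per year.

    records: iterable of (year, n_authors, n_papers), one row per
    subfield and year. Returns {year: exponent}.
    """
    by_year = defaultdict(list)
    for year, n, y in records:
        by_year[year].append((n, y))
    result = {}
    for year, rows in sorted(by_year.items()):
        log_n, log_y = np.log(np.array(rows, dtype=float)).T
        result[year] = float(np.polyfit(log_n, log_y, 1)[0])  # slope only
    return result

# Synthetic records: exponent 1.0 in 2000 and 1.05 in 2001
records = [(2000, n, 2.0 * n ** 1.0) for n in (10, 100, 1000)]
records += [(2001, n, 3.0 * n ** 1.05) for n in (20, 200, 2000)]
betas = yearly_exponents(records)
print(betas)
```

Plotting the returned exponents against the year gives the kind of stability check described in the text.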
Let us also compare this relation with the relation between the number of papers and the number of author instances (each appearance of an author, including duplicate authors, increments the total number of author instances by ). Interestingly, in this case, it is found that all exponents are smaller than , with () for physics (Fig. 1(e)), () for physics (subsets) (Fig. 1(f)), () for mathematics (Fig. 1(g)) and () for economics (Fig. 1(h)). This means that the marginal effect of increasing returns previously observed in Fig. 1(a-d) disappears when the number of author instances is considered, and the number of papers per author instance decreases as the number of author instances increases.
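The difference between the two size measures can be made concrete with a small sketch. The function `count_authors` and the toy subfield are hypothetical, and the names are assumed to have been disambiguated already.

```python
def count_authors(papers):
    """papers: list of author-name lists, one list per paper in a subfield.

    Returns (author instances, unique authors).
    """
    instances = sum(len(authors) for authors in papers)
    unique = len({name for authors in papers for name in authors})
    return instances, unique

# Toy subfield with three papers and overlapping author lists
subfield = [["A Smith", "B Jones"], ["A Smith"], ["A Smith", "C Wu", "B Jones"]]
instances, unique = count_authors(subfield)
print(instances, unique)
```

Prolific authors are counted once per paper in the first measure, so the number of author instances is never smaller than the number of unique authors.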
Although the goodness of fit of the fitted curves is very high overall, there are some outliers that are relatively far from the fitted curves in the above figures, and the relative positions of the subfields often change from year to year. The residual is a measure of the deviation of a true value from the corresponding value predicted by the scaling law (Eq. (2)). Such deviations have provided a meaningful way to rank cities interaction and universities university2 . In Fig. 3, we show the ranking of the deviations by magnitude and sign for physics and economics in 2013 as well as for mathematics in 2010. Let us focus on a few subfields that deviate strongly and positively from the scaling law. For example, the output of classical general relativity (04.20) is ranked 3rd in physics in Fig. 3(a) according to its deviation, although it is a relatively small subfield (ranked 200th by size). Quantum information (03.67) ranks 4th in physics in Fig. 3(a) but 49th by size. Ranked by size, the two subfields are far apart; ranked by deviation, both are among the top subfields in physics. These findings are broadly consistent with our intuition regarding these subfields: one is small and one is big, but both are very active. This implies that, at least in part, the deviation from the fitted scaling law provides a reasonable indicator for ranking subfields that is independent of their sizes.
We also rank the subfields of mathematics and economics. It is found that topological geometry (51H) is ranked 1st in mathematics according to its deviation, whereas it is ranked 632nd according to its size. In addition, it is found that game theory and bargaining theory (C7) is the top subfield in economics according to its deviation but is ranked 44th according to its size. Judging from our limited knowledge of economics, we believe that it is reasonable for game theory to be considered among the top subfields: it is not large but is a core subfield of economics, which can partially be seen from the fact that (according to Wikipedia) there have been game theorists among the Nobel laureates in economics, and some economists even believe that it is the core of the whole of economic theory Levin:GameTheory .
Let us now look at other outputs of the investigated scientific fields vs. the numbers of authors in their subfields. It is found that the exponents relating the numbers of citations and authors are larger than , with () for physics, () for physics (subsets), () for mathematics and () for economics (Fig. 4). This means that authors working in larger subfields receive, on average, more citations than those in smaller subfields. In addition, the exponents relating the numbers of citations and papers are () for physics, () for physics (subsets), () for mathematics and () for economics. These findings are similar to the scaling laws between the numbers of citations and papers when universities university1 ; university2 and research groups group1 ; group2 ; group3 are treated as the relevant units. However, the exponent values for the latter cases are approximately , larger than those found here. This means that whereas authors are more likely to cite papers from the same university, the same research group and the same subfield, the degrees of affinity for universities and research groups are even stronger than those for subfields.
Next, we find that the scaling-law exponent relating the numbers of references and authors is smaller than the exponent relating the numbers of citations and authors. We have () for physics, () for physics (subsets), () for mathematics and () for economics (Fig. 5). Similarly, in the scaling laws of cities, the exponent relating supplies and population is smaller than the exponent relating outputs and population Growth . We might expect these exponents in the case of scientific publications to be higher since, intentionally or unintentionally, people may cite references more carelessly than they would use living supplies because there is no cost for citing more references, whereas there is a cost associated with the use of living supplies. However, the fact that these exponents are close to, although clearly slightly higher than, that relating the housing/water/energy supplies and populations in cities implies that perhaps researchers do not cite many unnecessary references.
III Conclusions and Discussion
In this paper, we first examined and confirmed the allometric scaling relations between the numbers of papers, citations, and references and numbers of authors in subfields of three disciplines, namely, physics, mathematics and economics, which are analogous to the relation between the numbers of patents and inventors of patents in cities Growth and the relations between various outputs/inputs and population size for cities Growth and countries countries . One of the reasons for the development of cities is that there is an effect of increasing returns between the output and size of a city, which results in a lower effective cost for intra-city transactions than for inter-city transactions. Perhaps there are similar factors driving the formation of scientific subfields, which cause the observed allometric scaling relations to arise in the development of research subfields. Furthermore, the values of the exponents for all three disciplines were found to be similar and to have remained stable over time. We do not yet know why the various disciplines display similar exponents and temporal stability. However, we believe that this common allometric law across disciplines and time requires further investigation: certain common underlying mechanisms may exist that drive the development of scientific fields in various disciplines.
We found that the exponents relating the numbers of papers and authors are much smaller than those relating the various outputs of cities to their size Growth . This means that the effect of increasing returns observed in scientific production is much lower than the corresponding effect on production in cities. However, the exponents relating the numbers of citations and authors are more similar to those relating the various outputs of cities to their size, indicating that there is a stronger effect of increasing returns between the numbers of citations and authors. This suggests that on average, as the number of authors increases, there is only a marginal effect of increasing returns on the number of papers but a much larger effect of increasing returns on the number of citations. In addition, through several examples, we showed that deviations of individual subfields from the predictions of the allometric scaling relations can provide a size-independent but still meaningful ranking of those subfields.
The current study has several limitations. Our datasets contained only the portions of WOS that overlap with the relevant subject classification schemes (PACS, MSC and JEL), which restricted our ability to study the scaling relations governing the properties of all publications. In particular, the results for physics consider only those papers published in Physical Review journals. Moreover, the method used for author name disambiguation could be further improved.
References
- (1) J. S. Katz, The self-similar science system. Res Policy 28: 501-517 (1999).
- (2) Scale-independent indicators and research evaluation. Sci Public Policy 27: 23-36 (2000).
- (3) J. S. Katz, What is a complex innovation system? PLoS ONE 11(6): e0156150 (2016).
- (4) X. Gao, J. Guan, A scale-independent analysis of the performance of the Chinese innovation system. Journal of Informetrics 3: 321-331 (2009).
- (5) L. M. A. Bettencourt, J. Lobo, D. Helbing, C. Kuhnert, G. West, Growth, innovation, scaling, and the pace of life in cities. Proceedings of the National Academy of Sciences of the United States of America 104: 7301-7306 (2007).
- (6) L. M. A. Bettencourt, J. Lobo, D. Strumsky, G. B. West, Urban scaling and its deviations: revealing the structure of wealth, innovation and crime across cities. PLoS ONE 5: e13541 (2010).
- (7) M. Herrera, D. C. Roberts, N. Gulbahce, Mapping the evolution of scientific fields. PLoS ONE 5: e10355 (2010).
- (8) L. M. A. Bettencourt, D. I. Kaiser, J. Kaur, C. Castillo-Chavez, D. E. Wojick, Population modeling of the emergence and development of scientific fields. Scientometrics 75(3): 495-518 (2008).
