The development of nations conditions the disease space

Antonios Garas; Sophie Guthmuller; Athanasios Lapatinas

arXiv:1903.09703·physics.soc-ph·March 26, 2019

TL;DR

This paper introduces new metrics to analyze the relationship between economic development and disease complexity across countries, revealing that higher income levels are associated with more complex disease profiles.

Contribution

It develops two novel metrics for disease relatedness, linking economic development to disease complexity and providing a disease-level index related to GDP per capita.

Findings

01. Higher income per capita correlates with increased disease complexity.

02. The disease-level index shows more complex diseases are prevalent in wealthier countries.

03. Economic development influences the diversity and complexity of diseases in nations.

Abstract

Using the economic complexity methodology on data for disease prevalence in 195 countries during the period of 1990-2016, we propose two new metrics for quantifying the relatedness between diseases, or the ‘disease space’ of countries. With these metrics, we analyze the geography of diseases and empirically investigate the effect of economic development on the health complexity of countries. We show that a higher income per capita increases the complexity of countries’ diseases. Furthermore, we build a disease-level index that links a disease to the average level of GDP per capita of the countries that have prevalent cases of the disease. With this index, we highlight the link between economic development and the complexity of diseases and illustrate, at the disease-level, how increases in income per capita are associated with more complex diseases.

Tables

Table 8: Disease-income complexity index (DICI) and the complexity of diseases (DCI)

	(1)	(2)
	DCI, within estimation	DCI, between estimation
DICI	0.479***	2.041***
	(0.149)	(0.045)
Observations	5,211	5,211
Diseases	193	193
R-squared	0.90	0.88

Equations

\Phi_{i,j} = \min\left\{ \Pr(\mathrm{RDD}_{i} \geq 1 \mid \mathrm{RDD}_{j} \geq 1),\ \Pr(\mathrm{RDD}_{j} \geq 1 \mid \mathrm{RDD}_{i} \geq 1) \right\},

\mathrm{RDD}_{cd} = \frac{X_{cd} / \sum_{d'} X_{cd'}}{\sum_{c'} X_{c'd} / \sum_{c'd'} X_{c'd'}},

\tilde{M}_{cc'} = \frac{1}{k_{c,0}} \sum_{d} \frac{M_{cd} M_{c'd}}{k_{d,0}},

\mathrm{HCI} = \frac{K - \langle K \rangle}{\mathrm{std}(K)}.

\tilde{M}_{dd'} = \frac{1}{k_{d,0}} \sum_{c} \frac{M_{cd} M_{cd'}}{k_{c,0}},

\mathrm{DCI} = \frac{Q - \langle Q \rangle}{\mathrm{std}(Q)}.

\mathrm{HCI}_{i,t} = \rho\, \mathrm{HCI}_{i,t-1} + \beta_{1}\, \mathrm{GDPpc}_{i,t} + \beta_{k}\, \mathrm{controls}_{i,t} + \gamma_{i} + \delta_{t} + u_{i,t}.

s_{cd} = \frac{X_{cd}}{\sum_{d'} X_{cd'}},

\mathrm{DICI}_{d} = \frac{1}{N_{d}} \sum_{c} M_{cd}\, s_{cd}\, \mathrm{GDP}_{c},

Full text

The development of nations conditions the disease space

Antonios Garas (1), Sophie Guthmuller (2), and Athanasios Lapatinas (2,*)

(1) ETH Zurich, Chair of Systems Design, Weinbergstrasse 56/58, 8092 Zurich, Switzerland

(2) European Commission, Joint Research Centre, Via E. Fermi 2749, TP 361, Ispra (VA), I-21027, Italy

(*) Corresponding author ([email protected]). This research was conducted while Sophie Guthmuller and Athanasios Lapatinas were in service at the European Commission’s Joint Research Centre. The scientific output expressed does not imply a European Commission policy position. Neither the European Commission nor any person acting on behalf of the Commission is responsible for any use that might be made of this publication.

Keywords: health complexity, disease complexity, economic development

1 Introduction

Popular belief holds that the European conquest of America was accomplished with guns and soldiers. However, Bianchine and Russo [13] show that new illnesses brought from the Old World by European conquistadors, which resulted in devastating epidemics throughout the New World, were the major forces behind the aboriginal depopulation of the Americas. Our history, geography, culture, religion, and language have often been influenced by infections that have plagued humankind and shaped important events. Examples include the plague in fourteenth-century Europe; yellow fever, which increased the importation of African slaves in the sixteenth century because of a shortage of indigenous workers and the relative resistance of Africans to the disease; the typhus deaths in Napoleon’s army during the 1812 Russian campaign; and Franklin D. Roosevelt’s hypertension and heart failure, which worsened during his February 1945 dealings with Joseph Stalin in Yalta [86, 95, 103].

Furthermore, there is strong historical evidence that the wealth of nations is positively linked to the health of their populations. Since the eighteenth century, economic development, together with the associated improvements in nutrition, access to sanitation, public health interventions, and medical innovations such as vaccination, has contributed to the reduction of major infectious diseases, the decline of premature death rates, and a longer life expectancy for children and adults in both developed and developing countries [39, 40, 14].

Nevertheless, many significant health problems have emerged in concert with economic development and technological modernization. Among them are mental disorders such as stress, anxiety, sleep deprivation, and depression, which are more prevalent in high-income countries: they account for only 9% of the disease burden in low-income countries, compared with 18% in middle-income and 27% in high-income countries [88]. In OECD countries, a longer life expectancy is coupled with a higher rate of chronic and long-term illnesses in older populations [25]. Industrialization has expanded the reach of existing food-related diseases and created new disorders and addictions [26]. Industrialization also stimulates urbanization, the process of population migration from rural areas to cities. This makes urban areas focal points for many emerging environmental and health hazards. According to the World Health Organization (WHO), “as urban populations grow, the quality of global and local ecosystems, and the urban environment, will play an increasingly important role in public health with respect to issues ranging from solid waste disposal, provision of safe water and sanitation, and injury prevention, to the interface between urban poverty, environment and health” (WHO, ‘Urbanization and health’, https://www.who.int/globalchange/ecosystems/urbanization/en/). Industrialization is also linked to occupational accidents and work-related diseases (e.g., work-related cancers, musculoskeletal disorders, respiratory diseases, psycho-social problems, and circulatory diseases), which are worldwide problems resulting in important losses for individuals, organizations and societies [32, 80, 99, 85, 110, 54, 64, 89, 65].

From the above discussion, it becomes clear that economic development can affect population health in a number of ways, both positive and negative. To disentangle the net impact of economic development on countries’ health status, we develop a new metric called the Health Complexity Index (HCI), which quantifies the disease space of countries, i.e., the network representation of the relatedness and proximity between diseases with prevalent cases worldwide. To compute the HCI, we follow the economic complexity methodology, which was initially applied to trade micro-data, measuring the amount of knowledge materialized in a country’s productive structure.

More specifically, the Economic Complexity Index (ECI) is a metric that quantifies a country’s product space, i.e., the network of products traded internationally. When a country produces a good that is located in the core of the product space, many other related goods can also be produced with the given capabilities. However, this does not hold for goods lying in the network’s periphery, because they require different capabilities. The ECI methodology encapsulates this information by assigning lower values to countries that export products located at the periphery of the product space and higher values to countries that export commodities located in the center of the product space [62].

Based on the ECI methodology, a number of recent contributions explain economic development and growth as a process of information development and of learning how to produce and export more diversified products [1, 20, 21, 28, 38, 58, 57, 61, 62, 91, 102, 4, 93, 29, 56]. Furthermore, Hartmann et al. [55] have recently shown that countries exporting complex products tend to be more inclusive and have lower levels of income inequality than countries exporting simpler products. In addition, Lapatinas and Litina [71] find that countries with high intelligence quotient (IQ) populations produce and export more sophisticated/complex products, while Lapatinas [70] shows that the Internet has a positive effect on economic complexity. Adopting the economic complexity methodology, Balland and Rigby [10] compute a knowledge complexity index with more than two million patent records for US metropolitan areas between 1975 and 2010. They analyze the geography and evolution of knowledge complexity in US cities and show that the most complex cities in terms of patents are not always those with the highest rates of patenting. In addition, using citation data, they show that more complex patents are less likely to be cited than simpler patents when the citing and cited patents are located in different metropolitan areas.

In this paper, we build a complexity index that measures the composition of a country’s pool of prevalent cases of diseases by combining information on the diversity of diseases in the country and the ubiquity of its diseases (the number of other countries that also have prevalent cases of that disease). The intuition is that relatively high scores on the health complexity index indicate countries whose pool of diseases is diverse and whose diseases have, on average, low ubiquity, i.e., these diseases have prevalent cases in only a few other countries.

In this view, the health complexity index does not refer to a complex treatment or to complex causes of a disease, but measures instead whether a disease is located in the densely connected core of the disease space, i.e., whether many other related diseases have prevalent cases in many other countries. The country-disease network and the disease space reveal information about the health-related habits of populations, such as lifestyle and dietary habits. There are also multiple reasons to expect the disease structures of countries to be associated with their ‘structural transformations’ (i.e., the industrialization process by which economies diversify from agriculture to manufacturing and services [59, 81, 47, 60, 68]), with their environmental performance [67, 63, 24, 66], or with their adopted health-related policies [46, 106, 76, 33, 41], as these contribute to their health status and living standards [75, 42].

The aim of this paper is fourfold: (i) to build two new metrics that quantify the disease space, following the economic complexity methodology; (ii) to estimate the effect of economic development on countries’ health complexity using the new metrics and dynamic panel data econometric techniques; (iii) to develop a disease-level index that links a disease to the average level of GDP per capita of the countries in which the disease has prevalent cases; (iv) to illustrate how a country’s economic development is associated with changes in its disease composition and verify the relationship between economic development and health complexity at the disease level.

The remainder of the paper is structured as follows. Section 2 describes the data on disease prevalence and the construction of the country-disease network and the disease space, which form the analytical backbone of our study. Section 3 presents the methodology for developing the Health Complexity Index (HCI) and the Disease Complexity Index (DCI). Section 4 presents the results of the structural analysis of the disease space and the country-disease network, with a particular focus on countries and regions. Section 5 empirically investigates the effect of economic development on health complexity using the HCI, data on GDP per capita and potential covariates. Section 6 introduces an index that decomposes economic performance at the disease level. Using this index, we highlight the link between disease complexity and economic development. We demonstrate, at the disease level, that better economic performance is associated with more complex diseases. Finally, in Section 7, we offer some concluding remarks.

2 The country-disease network

2.1 Data on prevalent cases of diseases

Information on diseases comes from the 2016 Global Burden of Diseases (GBD) study by the Institute for Health Metrics and Evaluation (IHME), an independent population health research center at UW Medicine (University of Washington) [45] that collects data from various sources to examine, among other things, the prevalence of diseases and injuries across the world (http://www.healthdata.org/).

Diseases and injuries are grouped by causes. The broader classification of causes (level 1) includes: (a) communicable, maternal, neonatal, and nutritional diseases such as HIV/AIDS and sexually transmitted infections, respiratory infections and tuberculosis, enteric infections (e.g., diarrheal diseases, typhoid fever), neglected tropical diseases (e.g., malaria, Chagas disease) and other infectious diseases (e.g., meningitis and acute hepatitis), maternal and neonatal disorders (e.g., maternal abortion and miscarriage, ectopic pregnancy, maternal obstructed labor and uterine rupture), nutritional deficiencies (e.g., protein-energy malnutrition, vitamin A, iron, iodine deficiencies); (b) non-communicable diseases such as cancers, cardiovascular diseases, chronic respiratory diseases, digestive diseases (e.g., cirrhosis, gastritis, pancreatitis), neurological disorders (e.g., multiple sclerosis, epilepsy, Parkinson’s and Alzheimer’s diseases, migraine), mental disorders (e.g., schizophrenia, anorexia nervosa and bulimia nervosa, conduct and hyperactivity disorders), substance use disorders (e.g., alcohol and drug use disorders), diabetes, kidney diseases, skin diseases (e.g., dermatitis, bacterial skin diseases), sense organ diseases (e.g., glaucoma, cataract, vision loss), musculoskeletal disorders (e.g., osteoarthritis, rheumatoid arthritis); (c) injuries such as transport injuries (e.g., pedestrian road injuries, cyclist and motorcyclist road injuries), unintentional injuries (e.g., falls, poisonings, exposure to mechanical forces), self-harm and interpersonal violence (e.g., sexual violence, conflict and terrorism, executions). In the remainder of the paper, we use the word ‘disease’ to refer to all diseases and injuries classified in the GBD study.

We use information for the most detailed level of causes in the GBD taxonomy (i.e., level 4, and when there is no level 4 classification, we use level 3). For example, among the non-communicable diseases (level 1), neoplasms (level 2) include the following level 3 categories: lip and oral cavity cancer, nasopharynx cancer, other pharynx cancer, esophageal cancer, stomach cancer, colon and rectal cancer, liver cancer, gallbladder and biliary tract cancer, pancreatic cancer, larynx cancer, etc. Then, liver cancer includes the following level 4 subcategories: liver cancer due to hepatitis B, liver cancer due to hepatitis C, liver cancer due to alcohol use, liver cancer due to non-alcoholic steatohepatitis (NASH), liver cancer due to other causes. In this case, as level 4 categories are available, we consider the information for these categories.

Two measures of disease prevalence are exploited: the rate of prevalence (number of cases per 100,000 population) for all ages, and the age-standardized rate of prevalence to account for the differences in age structures across countries. This is useful because relative over- or under-representation of different age groups can obscure comparisons of age-dependent diseases (e.g., ischemic heart disease or malaria) across populations.
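To make the difference between the two measures concrete, the following minimal Python sketch contrasts an all-ages rate with an age-standardized rate. It assumes direct standardization, and the age groups, case counts, and reference weights are hypothetical values chosen for illustration; none of them are taken from the GBD data.

```python
def crude_rate(cases_by_age, population_by_age):
    """All-ages prevalence per 100,000 population."""
    return 100_000 * sum(cases_by_age) / sum(population_by_age)


def age_standardized_rate(cases_by_age, population_by_age, standard_weights):
    """Direct standardization: weight each age-specific rate by a fixed
    reference age structure (the weights must sum to 1)."""
    if abs(sum(standard_weights) - 1.0) > 1e-9:
        raise ValueError("standard_weights must sum to 1")
    return sum(
        w * 100_000 * cases / pop
        for w, cases, pop in zip(standard_weights, cases_by_age, population_by_age)
    )


# Two hypothetical countries with identical age-specific rates
# (1% in the young group, 10% in the old group) but different age structures.
young_country = dict(cases_by_age=[800, 2_000], population_by_age=[80_000, 20_000])
old_country = dict(cases_by_age=[400, 6_000], population_by_age=[40_000, 60_000])
weights = [0.6, 0.4]  # hypothetical reference population shares

crude_young = crude_rate(**young_country)
crude_old = crude_rate(**old_country)
std_young = age_standardized_rate(standard_weights=weights, **young_country)
std_old = age_standardized_rate(standard_weights=weights, **old_country)
```

In this toy case the all-ages rates differ only because the second population is older, whereas the standardized rates coincide, which is the kind of distortion the age-standardized measure is meant to remove.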

2.2 The country-disease bipartite network

Instrumental to our analysis is the bipartite network mapping of countries and diseases. Bipartite, or bi-modal, networks are abundant in the scientific literature, with examples including the city-tech knowledge network [10], the city-firm network [44], firm-projects networks [9], predator-prey networks [5], plant-pollinator networks [12], etc. Here, we use data from the 2016 Global Burden of Diseases study that assessed the disease burden of countries in the period of 1990 to 2016, and we generate an $l\times k$ country-disease matrix $\bf E$, where the matrix element $E_{cd}$ represents the prevalent cases for disease $d$ per 100,000 population in country $c$.

The aforementioned matrix allows for the construction of an undirected, weighted country-disease network by linking each disease to the countries that have prevalent cases. These networks are very dense, and in order to visually explore their structure, we apply the Dijkstra algorithm [30] to extract a Maximum Spanning Tree (MST) that summarizes their structures. More precisely, the MST, which is usually considered as the backbone of the network, is a connected subgraph having $l+k-1$ edges with the maximum total weight and without forming any loops.
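The MST extraction can be illustrated with a short Python sketch. It uses a standard greedy tree-growing (Prim-type) procedure on a small, hypothetical prevalence matrix; the function name, node numbering, and data are illustrative assumptions and not the authors' implementation.

```python
import heapq


def maximum_spanning_tree(E):
    """Greedy (Prim-type) maximum spanning tree of the bipartite graph
    defined by an l x k matrix E.

    Nodes 0..l-1 are countries and nodes l..l+k-1 are diseases; a positive
    entry E[c][d] is an undirected edge of that weight. Returns a list of
    (country, disease, weight) edges.
    """
    l, k = len(E), len(E[0])
    n = l + k

    def neighbours(u):
        if u < l:  # country -> diseases
            return [(E[u][d], l + d) for d in range(k) if E[u][d] > 0]
        d = u - l  # disease -> countries
        return [(E[c][d], c) for c in range(l) if E[c][d] > 0]

    visited = {0}
    # heapq is a min-heap, so weights are negated to pop the heaviest edge.
    heap = [(-w, 0, v) for w, v in neighbours(0)]
    heapq.heapify(heap)
    tree = []
    while heap and len(visited) < n:
        neg_w, u, v = heapq.heappop(heap)
        if v in visited:
            continue
        visited.add(v)
        country, disease = (u, v - l) if u < l else (v, u - l)
        tree.append((country, disease, -neg_w))
        for w, nxt in neighbours(v):
            if nxt not in visited:
                heapq.heappush(heap, (-w, v, nxt))
    if len(visited) < n:
        raise ValueError("network is not connected")
    return tree


# Hypothetical prevalence rates: 3 countries x 2 diseases.
E = [
    [5.0, 1.0],
    [4.0, 2.0],
    [0.0, 3.0],
]
mst = maximum_spanning_tree(E)
```

For this example the tree has $l+k-1=4$ edges and drops only the weakest link (country 0, disease 1), which would otherwise close a loop.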

In Figure 1 we illustrate the country-disease MST for 2016. From this MST, we can easily identify clusters of countries that are linked to specific types of diseases. The main node of the network is caries in permanent teeth (disease cause number 682). This is the most common disease across the world, as it is present in the majority of countries. It is also the disease with the highest number of prevalent cases worldwide (2.44 billion cases in 2016 [45]).

2.3 The disease space

The clustering of countries and diseases in the MST of the country-disease network already points towards relations in the prevalence of different diseases. To explore this further, we construct the disease space, similar to the product space introduced by Hidalgo et al. [62]. More precisely, from the country-disease matrix $\bf E$, we calculate the ‘relative disease disadvantage’ (RDD) matrix, as described in the methods section (Section 3). In short, a country $c$ has a relative disease disadvantage in a particular disease $d$ if the proportion of prevalent cases of disease $d$ in the country’s total pool of prevalent disease cases is higher than the proportion of prevalent cases of disease $d$ in the pool of prevalent disease cases in the rest of the world. In this case, $\mathrm{RDD}_{cd}\geq 1$.

Calculating the RDD for all country-disease pairs allows us to derive a matrix $\bf\Phi$, whose elements $\Phi_{i,j}$ define a proximity measure between all pairs of diseases. This proximity measure reveals diseases that are prevalent in tandem; in other words, with $\bf\Phi$ we measure the probability that a country $c$ that has a relative disease disadvantage in disease $i$ also has a relative disease disadvantage in disease $j$. The proximity measure is defined as:

$$\Phi_{i,j}=\min\left\{\mathrm{Pr}(\mathrm{RDD}_{i}\geq 1\ |\ \mathrm{RDD}_{j}\geq 1),\ \mathrm{Pr}(\mathrm{RDD}_{j}\geq 1\ |\ \mathrm{RDD}_{i}\geq 1)\right\},$$

where $\mathrm{Pr}(\mathrm{RDD}_{i}\geq 1\ |\ \mathrm{RDD}_{j}\geq 1)$ is the conditional probability of having a relative disease disadvantage in disease $i$ given a relative disease disadvantage in disease $j$. Taking the minimum of the two conditional probabilities avoids inflating the proximity of a rare disease that has prevalent cases in only one country, for which one of the two conditional probabilities is trivially equal to one. The minimum also makes the resulting matrix $\bf\Phi$ symmetric (see Figure 2). The proximity matrix is highly modular and its block structure reveals the presence of ‘communities’, i.e., groups of diseases that are expected to occur together.
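As an illustration of the proximity measure, the Python sketch below estimates the conditional probabilities as co-occurrence frequencies across countries. The binary input matrix (1 where $\mathrm{RDD}\geq 1$), the function name, and the choice to set the diagonal to zero are assumptions made for this example.

```python
def proximity(M):
    """Phi[i][j] = min(Pr(i | j), Pr(j | i)), estimated from a binary
    country x disease matrix M (M[c][d] = 1 when RDD_cd >= 1)."""
    n_countries, n_diseases = len(M), len(M[0])
    ubiquity = [sum(M[c][d] for c in range(n_countries)) for d in range(n_diseases)]
    phi = [[0.0] * n_diseases for _ in range(n_diseases)]
    for i in range(n_diseases):
        for j in range(n_diseases):
            if i == j:
                continue  # diagonal left at zero by convention in this sketch
            both = sum(M[c][i] * M[c][j] for c in range(n_countries))
            larger = max(ubiquity[i], ubiquity[j])
            # Dividing the co-occurrence count by the larger ubiquity gives
            # the smaller of the two conditional probabilities.
            phi[i][j] = both / larger if larger else 0.0
    return phi


# Hypothetical binary matrix: 4 countries x 3 diseases.
M = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
]
phi = proximity(M)
```

Here diseases 0 and 1 co-occur in two countries; since disease 0 is present in three countries and disease 1 in two, the two conditional probabilities are 1 and 2/3, and the proximity is the smaller value, 2/3.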

Next, we map this matrix onto a network, where each disease is represented by a node and every matrix element represents a weighted and undirected link. Similar to the previous section, we start by applying Dijkstra’s algorithm on matrix $\bf\Phi$ which calculates the MST of the network. Following the rationale of Hidalgo et al. [62], we start from the strongest links that are not part of the MST and keep adding links to the network until the average degree is four. The resulting network is a visual representation of the disease space, which is shown in Figure 2.
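The construction of the network can be sketched in Python as follows. For brevity this sketch builds the maximum spanning tree with Kruskal's algorithm, which is a swapped-in technique rather than the procedure cited above, and then adds the strongest remaining links until a target average degree is reached. The function name and the toy proximity matrix are hypothetical.

```python
def disease_space_edges(phi, target_avg_degree=4.0):
    """Maximum spanning tree of a symmetric proximity matrix (Kruskal's
    algorithm), followed by the strongest remaining links until the average
    degree reaches the target (or no links are left)."""
    n = len(phi)
    edges = sorted(
        ((phi[i][j], i, j) for i in range(n) for j in range(i + 1, n) if phi[i][j] > 0),
        reverse=True,
    )
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree, rest = [], []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two components: part of the tree
            parent[ri] = rj
            tree.append((i, j, w))
        else:
            rest.append((i, j, w))

    network = list(tree)
    for edge in rest:  # already ordered from strongest to weakest
        if 2 * len(network) / n >= target_avg_degree:
            break
        network.append(edge)
    return tree, network


# Hypothetical symmetric proximity matrix for 4 diseases.
phi = [
    [0.0, 0.9, 0.5, 0.1],
    [0.9, 0.0, 0.6, 0.2],
    [0.5, 0.6, 0.0, 0.7],
    [0.1, 0.2, 0.7, 0.0],
]
# A 4-node graph cannot reach an average degree of four, so the toy
# example uses a target of 2 instead.
tree, network = disease_space_edges(phi, target_avg_degree=2.0)
```

Starting from the tree guarantees that every disease remains connected to the network, while the added links reveal the denser parts of the disease space.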

From Figure 2, it is evident that in the disease space network, different disease categories are clustered together and, similar to the product space network of Hidalgo et al. [62], the network is heterogeneous and follows a core-periphery structure. The external part of the network (the periphery) is mostly dominated by ‘communicable, maternal, neonatal, and nutritional diseases’. In Section 4, we show that these diseases are mostly prevalent in low-income countries. On the other hand, the core of the network is dominated by ‘non-communicable diseases’, which have more prevalent cases in high-income countries (see Figure 3).

3 Methods

3.1 Health complexity index

To calculate health complexity and disease complexity, we combine information on prevalent cases of diseases and how common these diseases are across countries, following the economic complexity methodology, i.e., the formulas in the pioneering work of Hidalgo and Hausmann [61]. In short, let us assume that we have disease information for $l$ countries and $k$ diseases. With this information, we can fill an $l\times k$ country-disease matrix E, so that matrix element $E_{cd}$ is country $c$’s information for disease $d$. If there is no information for disease $d$ in country $c$, then $E_{cd}=0$. From this matrix, it is easy to calculate the following ratio:

$$\mathrm{RDD}_{cd}=\frac{X_{cd}/\sum_{d^{\prime}}X_{cd^{\prime}}}{\sum_{c^{\prime}}X_{c^{\prime}d}/\sum_{c^{\prime}d^{\prime}}X_{c^{\prime}d^{\prime}}},$$

where $X_{cd}$ is the number of prevalent cases of disease $d$ per 100,000 population in country $c$, i.e., the matrix element $E_{cd}$.

Similar to the economic complexity methodology and the discussion in [61, 21, 55], we claim that a country has a relative disease disadvantage in a disease when $\mathrm{RDD}_{cd}\geq 1$. In other words, a country $c$ has an RDD in disease $d$ if the proportion of prevalent cases of disease $d$ in the country’s pool of all prevalent cases of disease is higher than the proportion of prevalent cases of disease $d$ in the world’s pool of all prevalent cases of disease.
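The RDD ratio and its threshold can be sketched in a few lines of Python. The function names and the two-by-two prevalence matrix are hypothetical, and rows or columns without any information are assigned an RDD of zero as a simplifying assumption.

```python
def relative_disease_disadvantage(X):
    """RDD[c][d] = (X_cd / sum_d' X_cd') / (sum_c' X_c'd / sum_c'd' X_c'd')."""
    row_totals = [sum(row) for row in X]
    col_totals = [sum(col) for col in zip(*X)]
    grand_total = sum(row_totals)
    rdd = []
    for c, row in enumerate(X):
        rdd_row = []
        for d, x in enumerate(row):
            if row_totals[c] == 0 or col_totals[d] == 0:
                rdd_row.append(0.0)  # no information: treated as no disadvantage
            else:
                country_share = x / row_totals[c]
                world_share = col_totals[d] / grand_total
                rdd_row.append(country_share / world_share)
        rdd.append(rdd_row)
    return rdd


def binarize(rdd, threshold=1.0):
    """Indicator matrix: 1 if RDD_cd >= threshold, else 0."""
    return [[1 if value >= threshold else 0 for value in row] for row in rdd]


# Hypothetical prevalence rates per 100,000: 2 countries x 2 diseases.
X = [
    [300.0, 100.0],
    [100.0, 300.0],
]
RDD = relative_disease_disadvantage(X)
M = binarize(RDD)
```

In this example disease 0 makes up 75% of country 0's cases but only 50% of the world's, so $\mathrm{RDD}_{00}=1.5$ and country 0 has an RDD in disease 0 but not in disease 1.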

Using this threshold value, we obtain the $l\times k$ matrix M, with matrix elements $M_{cd}=1$ if country $c$ has an RDD in disease $d$, and zero otherwise. A visualization of the matrix M that is used to calculate the HCI and the DCI for this dataset is shown in Figure 8. From this matrix, similar to Hidalgo and Hausmann [61], we introduce the HCI as a measure of countries’ disease structures. To obtain the HCI, we first calculate the $l\times l$ square matrix M̃. In short, matrix M̃ provides information about links connecting two countries $c$ and $c^{\prime}$, based on the prevalent cases of diseases in both. The matrix elements ${\tilde{M}_{cc^{\prime}}}$ are computed as

$$\tilde{M}_{cc^{\prime}}=\frac{1}{k_{c,0}}\sum_{d}\frac{M_{cd}M_{c^{\prime}d}}{k_{d,0}},$$

where $k_{c,0}=\sum_{d}M_{cd}$ measures the diversification of country $c$ in terms of its different diseases, and $k_{d,0}=\sum_{c}M_{cd}$ measures the ubiquity of disease $d$, i.e., the number of countries that have an RDD in disease $d$. If $K$ is the eigenvector of M̃ associated with the second largest eigenvalue, then, following Hausmann et al. [57], the HCI is calculated as

$$\mathrm{HCI}=\frac{K-\langle K\rangle}{\mathrm{std}(K)}.$$

The HCI reflects the disease composition of a country’s pool of diseases, taking into account the composition of the pools of all other countries. Countries whose prevalent diseases also occur in many other countries have relatively low health complexity scores, while more health-complex countries have a high prevalence of non-ubiquitous diseases. In other words, a country has a complex disease composition, i.e., it is health-complex, if its diseases have high prevalence in only a few other countries. The HCI is higher for countries with diseases located at the core of the ‘disease-space’ and lower for countries with diseases located at the periphery of the ‘disease-space’.
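The HCI computation can be sketched in pure Python as follows. The country-country matrix follows the definition above; the eigenvector is obtained by power iteration with deflation, which is one convenient numerical choice and not necessarily the authors' procedure. All function names and the toy matrix are hypothetical, and the sketch assumes every country and every disease has at least one nonzero entry.

```python
import math


def country_similarity(M):
    """M_tilde[c][c2] = (1 / k_c) * sum_d M[c][d] * M[c2][d] / k_d."""
    l, k = len(M), len(M[0])
    diversity = [sum(row) for row in M]       # k_{c,0}
    ubiquity = [sum(col) for col in zip(*M)]  # k_{d,0}
    return [
        [
            sum(M[c][d] * M[c2][d] / ubiquity[d] for d in range(k) if ubiquity[d])
            / diversity[c]
            for c2 in range(l)
        ]
        for c in range(l)
    ]


def second_eigenvector(M, iterations=500):
    """Eigenvector of M_tilde for its second largest eigenvalue.

    M_tilde is similar to the symmetric matrix
    S[a][b] = sqrt(k_a) * M_tilde[a][b] / sqrt(k_b), whose leading
    eigenvector is proportional to sqrt(diversity). Power iteration on S,
    with that leading direction projected out at every step, converges to
    the second eigenvector.
    """
    l = len(M)
    diversity = [sum(row) for row in M]
    m_tilde = country_similarity(M)
    root = [math.sqrt(x) for x in diversity]
    S = [[root[a] * m_tilde[a][b] / root[b] for b in range(l)] for a in range(l)]
    norm = math.sqrt(sum(diversity))
    v1 = [r / norm for r in root]
    v = [float(i + 1) for i in range(l)]  # arbitrary starting vector
    for _ in range(iterations):
        v = [sum(S[a][b] * v[b] for b in range(l)) for a in range(l)]
        dot = sum(x * y for x, y in zip(v, v1))
        v = [x - dot * y for x, y in zip(v, v1)]
        size = math.sqrt(sum(x * x for x in v))
        v = [x / size for x in v]
    return [x / r for x, r in zip(v, root)]  # map back to M_tilde


def standardize(values):
    """(x - mean) / std, using the population standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))
    return [(x - mean) / std for x in values]


# Hypothetical binary matrix: 3 countries x 3 diseases.
M = [
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
]
K = second_eigenvector(M)
HCI = standardize(K)
```

The sign of an eigenvector is arbitrary, so the orientation of the resulting index has to be fixed by convention (in economic complexity applications it is commonly chosen so that the index correlates positively with diversity; the convention used by the authors is not stated in this section). Passing the transpose of the matrix to the same functions gives the corresponding disease-level quantity.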

3.2 Disease complexity index

In a similar manner, but placing the spotlight on diseases rather than countries, we can calculate the Disease Complexity Index (DCI). In this case, the $k\times k$ matrix M̃ provides information about links connecting two diseases $d$ and $d^{\prime}$, based on the number of countries in which both diseases have prevalent cases. Therefore, the matrix elements ${\tilde{M}_{dd^{\prime}}}$ are computed as

$$\tilde{M}_{dd^{\prime}}=\frac{1}{k_{d,0}}\sum_{c}\frac{M_{cd}M_{cd^{\prime}}}{k_{c,0}},$$

and if $Q$ is the eigenvector of M̃ associated with the second largest eigenvalue, then the DCI is calculated as

$$\mathrm{DCI}=\frac{Q-\langle Q\rangle}{\mathrm{std}(Q)}.$$

As discussed above, the HCI and DCI are computed using, for $X_{cd}$, the number of prevalent cases of a disease (according to cause levels 3 or 4) per 100,000 population for 195 countries and for 196 diseases. The time period covered is from 1990 to 2016. With the age-standardized data (see the discussion in Section 2.1), we also calculate the age-standardized health complexity index (AHCI) and the age-standardized disease complexity index (ADCI) by following the same formulas. We use the two indices as alternative measures when checking the robustness of our results. It should be noted here that the computation of the indices is based only on diseases for which a country has an RDD in terms of disease prevalence (the incidence matrix of the bipartite network linking countries to diseases, M, reflects whether or not a country has an RDD in a specific disease; see Figure 8). Table 1 lists the five diseases with the highest and lowest DCI scores averaged over the period of 1990-2016.

4 The geography of complex diseases

Figure 3 shows the patterns of disease specialization for the world’s economies, classified by the World Bank into four income groups: ‘high’, ‘upper-middle’, ‘lower-middle’, and ‘low’. For each group, diseases for which more than half of the group’s countries have ${\textrm{RDD}}>1$ are shown with black nodes. It seems that high-income countries occupy the core, composed of ‘non-communicable diseases’ such as ‘pancreatic cancer’, ‘Parkinson disease’, ‘ischemic stroke’ and injuries such as ‘falls’, ‘poisonings’ and ‘other exposure to mechanical forces’. On the other side of the spectrum, low-income countries tend to have an RDD in ‘communicable, neonatal, maternal and nutritional diseases’ that lie in the periphery of the disease space, such as ‘diarrheal diseases’, ‘encephalitis’ and ‘malaria’. Most of the non-communicable diseases for which low-income countries have ${\textrm{RDD}}>1$ also appear in the periphery (for example, ‘Turner syndrome’, ‘neural tube defects’ and ‘pyoderma’). Examples of injuries for which low-income countries have an RDD include ‘venomous animal contact’ and ‘sexual violence’, which again appear in the periphery of the disease space.

The above descriptive findings are also observable in Figure 4, where we map the spatial variation in complex diseases. This figure shows the distribution of the HCI across countries, using average values for the period 1990-2016. Disease complexity is clearly unevenly distributed in the world, and the most complex countries in terms of diseases seem to be located in Europe, North America, and Australia: European countries, Australia, the US, and Canada belong to the set of countries with the highest HCI (>80%). In contrast, most countries in Africa have much lower HCIs on average.

Taking a closer look at differences between particular countries, Figure 5 displays the disease maps of Portugal and the Democratic Republic of the Congo (DRC), which have the highest and lowest HCI scores (in 2016) respectively. The two countries strongly differ in the composition of their diseases. Portugal has a disease structure in which, out of all prevalent cases of disease, the proportion of non-communicable diseases (97.8%) is more than double that of communicable diseases (42.4%). The non-communicable diseases with relatively high proportions of prevalent cases include ‘tension headache’ (37.5%), ‘permanent caries’ (36.2%), ‘migraine’ (23.8%) and ‘age-related hearing loss’ (22.8%). In the communicable, neonatal, maternal and nutritional diseases category (red), ‘genital herpes’ (13.2%), ‘latent TB infection’ (10.4%) and ‘dietary iron deficiency’ (7.9%) are the diseases with the highest proportion of prevalent cases. Regarding prevalent cases of injuries (green), Portugal’s proportion of total cases in the country is 29%, while the respective rate for the DRC is only 13.6%.

The distribution of diseases in the DRC is more uniform: communicable, neonatal, maternal and nutritional diseases comprise 89.5% of total prevalent cases of diseases, and this figure is comparable to the proportion of non-communicable diseases (93.1%). For the DRC, ‘vitamin A deficiency’ (23.5%), ‘malaria’ (22.9%) and ‘schistosomiasis’ (21.2%) are the diseases with the highest proportion of prevalent cases out of all cases of diseases in the country.

The above ‘structural’ differences are captured by the HCI. Communicable, neonatal, maternal and nutritional diseases are, on average, less complex than non-communicable diseases, because the former lie in the periphery of the disease space, while the latter constitute its core. Hence the DRC receives a lower HCI value compared to Portugal, because in its population, the proportion of communicable, neonatal, maternal and nutritional diseases is higher.

Regarding the evolution of HCI scores over time, Figures 6 and 7 depict how health complexity in our sample of countries has changed from 1990-1996 to 2010-2016. Cambodia, Myanmar, Nepal, Vietnam, Saint Lucia and Cameroon have registered significant increases in the complexity of their diseases. On the other hand, the diseases of countries like Vanuatu, Kiribati, Palestine, Tajikistan and Gabon are now less complex than in the early 1990s. Figure 7 depicts the same information on a world map. Blue and light blue colors depict a decrease in HCI score, while orange and red colors denote countries with an increase in HCI score from 1990-1996 to 2010-2016. From these figures, it can be observed that changes over time are rather small. Hence, it seems that a country’s HCI score tends to persist through time, which is to be expected for a metric of prevalent cases of diseases aggregated at the country level. This motivates the inclusion of the lagged value of HCI in the set of explanatory variables when estimating the effect of economic development on health complexity in the next section.

5 The effect of economic development on health complexity

We study the effect of economic development on health complexity using data on GDP per capita (from the World Bank’s World Development Indicators) and the HCI (see Section 3). Given the availability of controls, the sample covers a minimum of 168 developed and developing countries over the period of 1992-2015. (Footnote 4: Afghanistan, Albania, Algeria, Angola, Antigua and Barbuda, Argentina, Armenia, Australia, Austria, Azerbaijan, Bahamas, Bahrain, Bangladesh, Barbados, Belarus, Belgium, Belize, Benin, Bhutan, Bolivia, Bosnia and Herzegovina, Botswana, Brazil, Brunei Darussalam, Bulgaria, Burkina Faso, Burundi, Cabo Verde, Cambodia, Cameroon, Canada, Central African Republic, Chad, Chile, China, Colombia, Comoros, Rep. of the Congo, Costa Rica, Cote d’Ivoire, Croatia, Cyprus, Czech Republic, Denmark, Dominican Republic, Ecuador, Arab Rep. of Egypt, El Salvador, Equatorial Guinea, Estonia, Eswatini, Ethiopia, Fiji, Finland, France, Gabon, Gambia, Georgia, Germany, Ghana, Greece, Grenada, Guatemala, Guinea, Guinea-Bissau, Guyana, Haiti, Honduras, Hungary, Iceland, India, Indonesia, Islamic Rep. of Iran, Iraq, Ireland, Israel, Italy, Jamaica, Japan, Jordan, Kazakhstan, Kenya, Kiribati, Rep. of Korea, Kuwait, Kyrgyz Republic, Lao PDR, Latvia, Lebanon, Lesotho, Liberia, Libya, Lithuania, Luxembourg, Macedonia FYR, Madagascar, Malawi, Malaysia, Maldives, Malta, Mauritania, Mauritius, Mexico, Moldova, Mongolia, Montenegro, Morocco, Mozambique, Myanmar, Namibia, Nepal, Netherlands, New Zealand, Nicaragua, Niger, Nigeria, Norway, Oman, Pakistan, Panama, Papua New Guinea, Paraguay, Peru, Philippines, Poland, Portugal, Qatar, Russian Federation, Rwanda, Samoa, Sao Tome and Principe, Saudi Arabia, Senegal, Serbia, Seychelles, Sierra Leone, Singapore, Slovak Republic, Slovenia, South Africa, Spain, Sri Lanka, St. Lucia, St. Vincent and the Grenadines, Sudan, Suriname, Sweden, Switzerland, Tajikistan, Tanzania, Thailand, Togo, Tonga, Trinidad and Tobago, Tunisia, Turkey, Uganda, Ukraine, United Arab Emirates, United Kingdom, United States, Uruguay, Vanuatu, Venezuela RB, Vietnam, Rep. of Yemen, Zambia, Zimbabwe.)

5.1 Regression analysis

Previous research shows that there is a strong positive association between income and indicators of population health such as life expectancy and child mortality. There are various channels through which economic development can stimulate health improvements, for example, via its effect on nutrition (which in turn leads to better resistance to bacterial diseases and faster recovery from illnesses), as well as through greater labor market participation, worker productivity, investment in human capital, investment in public and private health services, savings, fertility, transportation infrastructure, and lifestyle habits [87, 96, 17, 43, 83, 6, 16, 73, 35, 53]. (Footnote 5: For a review of the empirical evidence, see [69].) The term ‘diseases of affluence’ refers to selected diseases and health conditions that are more prevalent in wealthy nations. Examples include mostly non-communicable diseases such as cardiovascular diseases and their nutritional risk factors (overweight and obesity, elevated blood pressure and cholesterol). It has been shown that economic development is a robust predictor of ‘diseases of affluence’ [36, 50, 79, 90, 109]. However, there is also a large and growing literature that investigates the reverse channel, i.e., that better population health leads to economic development [2, 15]. The argument is that improved health conditions increase population size, which – in the medium term – leads to more people entering the labor force, higher capital accumulation, and higher income per capita.

In order to estimate the effect of economic development on the health complexity of countries, we follow a fixed-effects two-stage least squares/instrumental variables (FE 2SLS/IV) strategy, complemented with a difference Generalized Method of Moments (diff-GMM) approach. We estimate the baseline specification described by the following equation:

[TABLE]

Here, the health complexity of country $i$ in period $t$ ($HCI_{i,t}$) depends on the country’s level of economic development in per capita terms (in logs), $GDPpc_{i,t}$. The lagged value of the dependent variable on the right-hand side is included to capture persistence in health complexity. The main variable of interest is $GDPpc$. The parameter $\beta_{1}$ therefore measures the effect of income per capita on health complexity. (Footnote 6: In order to account for possible changes in the relation between economic development and health complexity over the process of economic development, we have experimented with the inclusion of the quadratic specification of GDP per capita in the estimated equation. Our baseline results (which are available upon request) do not confirm a U-shaped relationship between economic development and health complexity.) Additional potential covariates are included in the vector $controls_{i,t}$. The $\gamma_{i}$’s denote a full set of country dummies and the $\delta_{t}$’s denote a full set of time effects that capture common shocks to the health complexity scores of all countries. The error term $u_{i,t}$ captures all other omitted factors, with $E(u_{i,t})=0$ for all $i$ and $t$. To examine the robustness of our results and to generalize our findings, we replicate our analysis for additional/alternative control variables and substitute the HCI with the AHCI, finding qualitatively similar results (see subsection 5.2).
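Written out from the description above, the baseline specification (equation (7)) has the following general form. This is a sketch reconstructed from the surrounding definitions: the text names only $\beta_{1}$, so the labels $\rho$ (persistence coefficient on the lagged HCI) and $\boldsymbol{\theta}$ (coefficient vector on the controls) are placeholders assumed here, not necessarily the authors' notation.

```latex
HCI_{i,t} = \rho\, HCI_{i,t-1} + \beta_{1}\, GDPpc_{i,t}
          + \boldsymbol{\theta}^{\prime}\, controls_{i,t}
          + \gamma_{i} + \delta_{t} + u_{i,t}
```

Under this form, $\gamma_{i}$ absorbs time-invariant country characteristics and $\delta_{t}$ absorbs shocks common to all countries in a given period, so $\beta_{1}$ is identified from within-country variation over time.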

5.1.1 Control Variables

We include in the estimated equation a number of control variables that are likely related to health complexity.

The proportion of nations’ populations over the age of 65 has been increasing in recent years and will continue to rise in future as a result of longer life expectancy. Population age-structure is a significant determinant of a nation’s health status, due to age-related diseases (i.e., illnesses and conditions that occur more frequently in people as they get older). Examples of age-related diseases include cardiovascular and cerebrovascular diseases, hypertension, cancer, Parkinson’s disease, Alzheimer’s disease, osteoarthritis and osteoporosis. Demographic factors such as age and sex are considered key covariates in the study of human health and well-being, hence the percentage of $old$ population (aged 65 and above, in logs) and the percentage of $female$ population (in logs) are included in the set of control variables.

It has previously been shown that sex interacts with social, economic and biological determinants to create different health outcomes for males and females. For example, Vlassoff [104] reviews a large number of studies on the interaction between sex and the determinants and consequences of chronic diseases, showing how these interactions result in different approaches to prevention, treatment, and coping with illness.

In our analysis, we also control for the (log) percentage of urban population, $urban$. According to the World Health Organization (WHO), a large proportion of non-communicable diseases is linked to risks related to the urban environment, such as physical inactivity and obesity, cardiovascular and pulmonary diseases from transport-generated urban air pollution, ischemic heart disease and cancers from household biomass energy use, asthma from indoor air pollution, and heat-related strokes and illnesses. In addition, communicable diseases such as tuberculosis, dengue fever, and many respiratory and diarrheal diseases result from unhealthy urban environments (e.g. lack of adequate ventilation, unsafe water storage and poor waste management, indoor air pollution, moldy housing interiors, poor sanitation). (Footnote 7: See WHO, Health and sustainable development: About health risks in cities, https://www.who.int/sustainable-development/cities/health-risks/about/en/)

Industrialization, i.e., the structural transformation from agricultural to industrial production, also has a range of significant health implications [77, 23, 98, 101, 97]. We capture these implications in our analysis by including the (log) value added of $agriculture$ (% of GDP) and the (log) value added of $manufacturing$ (% of GDP) in the estimated equation.

To check the robustness of our baseline results, we replicate our analysis controlling also for the human capital of the population by utilizing total enrolment in secondary $education$ (in logs). It is well established in the relevant literature that through education, people gain the ability to be effective in their lives. They adopt healthier lifestyles and inspire their offspring to do so as well [78]. Individuals with higher levels of education also tend to have better socioeconomic resources for a healthy lifestyle and a higher probability of living and working in healthy environments [3, 18]. In addition, educated individuals tend to have lower exposure to chronic stress [84]. Low educational attainment, on the other hand, is associated with a shorter life expectancy, poor self-reported health, and a high prevalence of infectious and chronic non-infectious diseases [37, 51, 72, 108]. (Footnote 8: Grossman [49] and Ross and Wu [92] review the relationships between education and a wide variety of health measures.) Furthermore, we re-estimate the baseline model by substituting $urban$ with $population$ $density$ (people per square km of land, in logs).

The variable $CO_{2}$ (log of $CO_{2}$ emissions in kg per 2010 US dollars of GDP) captures the effect of air pollution on health, which has been the subject of numerous studies in recent years (for an extensive review, see [19]).

Finally, $health$ $expenditure$ (log of total health spending in thousands of purchasing power parity (PPP)-adjusted 2017 US dollars) is also included in the set of explanatory variables, controlling for the association between healthcare spending and health outcomes [105, 11, 27, 107, 74, 94, 111, 82].

Data definitions, sources and summary statistics for the variables included in the analysis are given in Table 2.

5.1.2 Instrumental variables

We estimate equation (7) using different econometric methods. First, we use fixed-effects OLS. However, fixed effects estimators do not necessarily identify the effect of economic development on health complexity. The estimation of causal effects requires exogenous sources of variation. While we do not have an ideal source of exogenous variation recognized by previous studies, there are two promising potential instruments of economic development that we adopt in our fixed-effects 2SLS/IV and diff-GMM analyses.

First, we use the KOF Swiss Economic Institute’s $economic$ $globalization$ index, characterized as the flows of goods, capital, and services, as well as information and perceptions that accompany market exchanges [31]. Higher values reflect greater economic globalization.

The second instrument considered is the KOF Swiss Economic Institute’s $political$ $globalization$ index, characterized by the number of embassies in a country, its membership in international organizations, its participation in UN Security Council missions and international treaties. Higher values reflect greater political globalization.

There is extensive research documenting the positive relationship between globalization and economic development and growth [31, 48, 34, 52, 22]. While we do not have a precise theory to support the prediction, it is expected that changes in the $economic$ $globalization$ and $political$ $globalization$ indices have no direct effect on a country’s disease structure and impact health complexity only indirectly, through the channel of economic development. This point is also verified in our dataset, as we find no correlation between HCI scores and these two variables (the results are available upon request).
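The identification logic can be illustrated with a minimal two-stage least squares sketch in plain Python. It is a deliberately stripped-down illustration, not the paper's estimator: it uses one endogenous regressor and two instruments, and omits the country fixed effects, time dummies, controls, lagged dependent variable and robust standard errors of the actual specification. The function names and the toy data are hypothetical; the toy instruments are constructed to be exactly uncorrelated with the error term so that the exclusion restriction holds by design.

```python
def demean(v):
    """Subtract the sample mean (this absorbs the intercept)."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def ols_slope(y, x):
    """Slope of a simple OLS regression of y on x with an intercept."""
    y, x = demean(y), demean(x)
    return dot(x, y) / dot(x, x)

def two_stage_least_squares(y, x, z1, z2):
    """2SLS slope of y on the endogenous x, instrumented by z1 and z2."""
    y, x, z1, z2 = (demean(v) for v in (y, x, z1, z2))
    # First stage: solve the 2x2 normal equations of x on (z1, z2).
    a11, a12, a22 = dot(z1, z1), dot(z1, z2), dot(z2, z2)
    b1, b2 = dot(z1, x), dot(z2, x)
    det = a11 * a22 - a12 * a12
    p1 = (b1 * a22 - a12 * b2) / det
    p2 = (a11 * b2 - a12 * b1) / det
    x_hat = [p1 * u + p2 * v for u, v in zip(z1, z2)]
    # Second stage: regress y on the fitted values from the first stage.
    return dot(x_hat, y) / dot(x_hat, x_hat)

# Toy data (hypothetical): z1, z2 stand in for the two globalization indices,
# u is an unobserved shock that moves both x (log GDP per capita) and y (HCI).
z1 = [1, 1, 1, 1, -1, -1, -1, -1]
z2 = [1, 1, -1, -1, 1, 1, -1, -1]
u = [1, -1, 1, -1, 1, -1, 1, -1]
x = [a + 0.5 * b + c for a, b, c in zip(z1, z2, u)]
y = [0.3 * a + c for a, c in zip(x, u)]   # true effect of x on y is 0.3

print(ols_slope(y, x))                    # biased upward, about 0.74
print(two_stage_least_squares(y, x, z1, z2))  # close to the true 0.3
```

In this toy setting OLS overstates the effect because the shock `u` raises both variables, whereas the instrumented estimate uses only the part of `x` that is explained by the instruments.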

5.2 Regression results

In this section, we discuss the results of estimating equation (7) with different econometric techniques. Table 3 reports the results of fixed-effects ordinary least squares (FE-OLS) with time dummies, adding an additional variable from the set of controls in each step (column). In all specifications, economic development has a positive relationship with health complexity, and the control variables enter with the expected sign. The $agriculture$ coefficient is negative, and countries with a higher proportion of $urban$ population exhibit greater health complexity.

The results depicted in Table 3 can only be interpreted as correlations. For a robust analysis accounting for the potential endogeneity problem in the relationship under consideration (discussed above), we use fixed-effects 2SLS/IV estimation techniques complemented with diff-GMM estimations à la Arellano-Bond [8]. Table 4 presents our baseline results.

In columns (1)-(8), we estimate equation (7) with FE 2SLS/IV regressions. We use time dummies and robust standard errors (in parentheses). In all cases, $GDPpc$ is a positive and statistically significant predictor of health complexity. In fact, we find that an increase in $GDP$ $per$ $capita$ of 10% is associated with an increase of about 0.003 in the HCI (standard deviation: 1.004). This positive impact of economic development on health complexity is robust to the inclusion of control measures discussed in subsection 5.1.1. The statistically significant $urban$ coefficient implies that a higher proportion of urban population is associated with more complex diseases.
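As a back-of-the-envelope reading of this magnitude: because $GDPpc$ enters in logs, a 10% rise in GDP per capita changes the regressor by $\ln(1.10)$. The arithmetic below is an illustrative assumption that ignores the dynamics introduced by the lagged HCI term; the estimated coefficients themselves are those reported in Table 4.

```latex
\Delta HCI \approx \beta_{1}\,\ln(1.10) \approx 0.0953\,\beta_{1}
\quad\Longrightarrow\quad
\beta_{1} \approx \frac{0.003}{0.0953} \approx 0.03
```

Relative to the reported standard deviation of 1.004, a change of 0.003 corresponds to roughly 0.3% of one standard deviation of the HCI.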

In the fixed-effects 2SLS/IV estimations we report: (a) the $F$-$test$ for the joint significance of the instruments in the first stage: the rule of thumb is to exceed 10, hence the test implies weak significance [100]; (b) the Durbin-Wu-Hausman ($DWH$) test for the endogeneity of regressors: the null hypothesis that the IV regression is not required is rejected; (c) the Cragg-Donald F-statistic ($Weak$-$id$), testing the relevance of the instruments in the first-stage regression: no evidence of a low correlation between instruments and the endogenous regressor is found after controlling for the exogenous regressors; (d) the Kleibergen-Paap Wald test ($LM$-$weakid$) of weak identification: the null hypothesis that the model is weakly identified is rejected; (e) the p-value for Hansen’s test of overidentification: the acceptance of the null indicates that the overidentifying restrictions cannot be rejected.

In column (9) of Table 4, we report the diff-GMM estimations including year fixed effects and robust standard errors. The results verify the previous findings both qualitatively and quantitatively, i.e., the estimated coefficient of $GDPpc$ implies an increase of 0.003 in the HCI with a $GDP$ $per$ $capita$ increase of 10%. Among the control variables, only the $urban$ variable has a statistically significant and positive coefficient. The values reported for AR(1) and AR(2) are the p-values for first- and second-order autocorrelated disturbances. As expected, there is high first-order autocorrelation and no evidence for significant second-order autocorrelation. Hence, our test statistics hint at a proper specification.

In Tables 5 and 6, we investigate the robustness of our baseline findings. First, we substitute the HCI with the age-standardized health complexity index (AHCI) maintaining the same set of controls (and time dummies) as in the baseline specification. Second, we investigate whether the positive impact of economic development on health complexity persists under additional and/or alternative control measures (including time dummies). In all cases, the baseline results remain qualitatively intact. In particular, the coefficient of $GDPpc$ is positive and statistically significant in the instrumented regressions (see Table 5; to save space, we only include the first-stage estimated coefficients of the instruments – the results for the rest of the variables are available upon request).

Table 6 starts from the baseline specification with the full set of controls [column (9) in Table 4] and introduces additional variables or alternative measures for some of the previous controls. Specifically, in column (1), we add $education$ (enrolment in secondary education in logs). In column (2), we substitute the $urban$ population variable with $population$ $density$ (people per sq. km of land in logs). In columns (3) and (4), we employ (log) $CO_{2}$ emissions (kg per 2010 US dollars of GDP) and (log) $health$ $expenditure$ (total health spending, thousands of 2017 PPP-adjusted US dollars), respectively. Finally, in column (5), we consider all of the above variables together. Adding these controls in our estimations leaves the findings qualitatively and quantitatively intact.

The above analysis suggests that economically developed countries tend to exhibit more complex disease structures. Furthermore, exploiting the temporal variation in the data, the fixed-effects 2SLS/IV analysis and the difference GMM estimators reveal a positive, statistically significant, and robust impact of economic development on health complexity.

6 Economic development and disease complexity

The economic complexity methodology provides a useful toolbox that allows us to compute indices that quantify the complexity of both countries and diseases. For example, using the same methodology that computes the HCI, but placing the spotlight on diseases rather than countries, we calculate the DCI (see Section 3). This index quantifies the complexity of countries’ diseases according to their prevalent cases worldwide. Using the economic complexity methodology, Hartmann et al. [55] recently introduced a measure that associates products with income inequality and showed how the development of new products is associated with changes in income inequality. Here, to decompose economic development at the disease level, we introduce a measure that links a disease to the average income per capita of the countries in which the disease has prevalent cases, i.e., an estimate of the expected income per capita related to different diseases. In this way, we illustrate how disease complexity is affected by the level of economic development and quantify the relationship between countries’ income per capita and the complexity of their diseases.

Following the methodology in Hartmann et al. [55], we define the Disease-Income Complexity Index (DICI), and decompose the relationship between the DCI and the DICI for the prevalent cases of diseases in our sample of countries. (Footnote 9: We also computed the ADICI and investigated its relationship with the ADCI, finding similar results.)

6.1 Disease-income complexity index

Assuming that we have information for $l$ countries and $k$ diseases, we can fill the $(l\times k)$ matrix $M$ so that its matrix element $M_{cd}=1$ if country $c$ has an RDD in disease $d$, and zero otherwise (see Section 3). Our dataset contains information for 195 developed and developing countries and for 196 diseases from 1990 to 2016. A visualization of the matrix $M$ that is used to calculate the HCI and the DCI for this dataset is shown in Figure 8.

Every disease $d$ can have prevalent cases in a country $c$ . For every disease $d$ , we can calculate the fraction $s_{cd}$ :

$$s_{cd}=\frac{X_{cd}}{\sum_{d^{\prime}}X_{cd^{\prime}}}$$

where $X_{cd}$ is the number of prevalent cases per 100,000 population for disease $d$ in country $c$ , while $\sum_{d^{\prime}}X_{cd^{\prime}}$ is the number of prevalent cases of all diseases in country $c$ . If $GDP_{c}$ is the (log) $GDP$ $per$ $capita$ of country $c$ , we can calculate the ${\textrm{DICI}}_{d}$ for every disease $d$ as:

$${\textrm{DICI}}_{d}=\frac{1}{N_{d}}\sum_{c}M_{cd}\,s_{cd}\,GDP_{c}$$

where $N_{d}=\sum_{c}M_{cd}s_{cd}$ is a normalization factor.

The DICI is defined at the disease level as the average level of (log) $GDP$ $per$ $capita$ of the countries that have an RDD in disease $d$, weighted by the disease’s importance in each country’s pool of diseases. Utilizing the (log) PPP $GDP$ $per$ $capita$ (constant 2011 international dollars) from the World Bank’s World Development Indicators for the countries in our sample, we calculate the above index for every year in the period of 1990-2016.
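The definition above can be sketched in a few lines of plain Python for a single year. The function name `dici`, the nested-list layout and the toy numbers are assumptions made for illustration; the construction of $M$ from the RDD (Section 3) is taken as given, and diseases for which no country has an RDD are returned as `None`, which is a choice of this sketch.

```python
def dici(M, X, log_gdp):
    """Disease-income complexity index for one year.

    M[c][d]    : 1 if country c has an RDD in disease d, else 0
    X[c][d]    : prevalent cases per 100,000 population
    log_gdp[c] : (log) GDP per capita of country c
    """
    n_countries, n_diseases = len(X), len(X[0])
    # s[c][d]: share of disease d in all prevalent cases of country c
    s = [[X[c][d] / sum(X[c]) for d in range(n_diseases)]
         for c in range(n_countries)]
    values = []
    for d in range(n_diseases):
        weights = [M[c][d] * s[c][d] for c in range(n_countries)]
        n_d = sum(weights)  # normalization factor N_d
        if n_d == 0:
            values.append(None)  # no country has an RDD in this disease
        else:
            values.append(sum(w * g for w, g in zip(weights, log_gdp)) / n_d)
    return values

# Toy example (hypothetical numbers): 3 countries, 2 diseases.
X = [[30, 70], [50, 50], [80, 20]]
M = [[0, 1], [1, 1], [1, 0]]
log_gdp = [10.0, 9.0, 7.0]
print(dici(M, X, log_gdp))  # roughly [7.77, 9.58]
```

In the toy data the second disease is concentrated in the two richer countries, so it receives the higher index value.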

Table 7 lists the five diseases with the highest and lowest average DICI values during the period of 1990-2016. It is evident that higher economic development is associated with more complex diseases such as motor neuron disease and malignant skin melanoma. At the other end of the spectrum, less complex diseases such as acute hepatitis E and malaria are associated with low levels of income per capita.

6.2 Linking disease complexity and economic development

In this subsection, we test the existence of a bivariate relationship between the DCI and the DICI. Thus, we calculate Pearson’s correlation coefficient for DICI against DCI. If such an association exists, it should allow us to derive expectations about whether disease complexity can be associated with economic development and verify, with disease-level data, the statistically significant and positive relationship between health complexity and economic development that we found above (Section 5.2). The correlation coefficient for the relationship between the average values of the DICI and the DCI for the period of 1990-2016 is $\rho=0.96\pm{0.01}$ with a p-value $<2.2\times 10^{-16}$ . In Figure 9, we present the scatter-plot of the relationship between the DICI and the DCI for the 196 diseases in our dataset (average values for 1990-2016), together with the fitted linear model. The slope of the linear fit is the corresponding correlation coefficient.
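The link between the fitted slope and the correlation coefficient can be checked numerically. For a least-squares line the slope equals $\rho$ times the ratio of the standard deviations of the two variables, so the two coincide when both series are standardized to unit variance. The sketch below uses made-up numbers and hypothetical function names; it does not reproduce the paper's data.

```python
import math

def mean(v):
    return sum(v) / len(v)

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def standardize(v):
    """Rescale to zero mean and unit (population) standard deviation."""
    m = mean(v)
    sd = math.sqrt(sum((a - m) ** 2 for a in v) / len(v))
    return [(a - m) / sd for a in v]

# Made-up values standing in for per-disease DICI (x) and DCI (y) averages.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]

r = pearson_r(x, y)
print(r)                                          # about 0.82
print(fit_slope(x, y))                            # raw slope, differs from r
print(fit_slope(standardize(x), standardize(y)))  # matches r
```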

The statistically significant positive correlation between the DICI and the DCI indicates that more complex diseases are associated with more developed countries, as measured by the (log) $GDP$ $per$ $capita$ . This allows us to understand which sets of diseases are linked to better overall economic performance, based on their complexity.

In Table 8, we run panel regressions between the DCI and DICI. The results show that the relationship between the DCI and the DICI is the outcome of the correlations both between diseases (regression on group means) and within diseases (fixed-effects regression with time dummies and standard errors adjusted for disease clusters). This suggests that the positive effect of economic development on the complexity of diseases is due to both changes in the structure of the disease space towards more complex diseases and increases in the complexity of existing diseases.
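The distinction between the two estimators can be sketched as follows. The between estimator regresses each disease's time-averaged DCI on its time-averaged DICI, while the within estimator removes each disease's own means and uses only the variation over time. This is a simplified illustration: the time dummies and cluster-adjusted standard errors of the paper's fixed-effects regression are omitted, and the function names and toy panel are hypothetical.

```python
def mean(v):
    return sum(v) / len(v)

def slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def between_estimate(panel):
    """Regression on group means: one (DICI, DCI) point per disease."""
    xs = [mean(obs["dici"]) for obs in panel.values()]
    ys = [mean(obs["dci"]) for obs in panel.values()]
    return slope(xs, ys)

def within_estimate(panel):
    """Fixed-effects slope: pool deviations from each disease's own means."""
    xs, ys = [], []
    for obs in panel.values():
        mx, my = mean(obs["dici"]), mean(obs["dci"])
        xs += [a - mx for a in obs["dici"]]
        ys += [b - my for b in obs["dci"]]
    return slope(xs, ys)

# Toy panel (hypothetical): three diseases observed in three periods.
panel = {
    "A": {"dici": [1.0, 2.0, 3.0], "dci": [3.5, 4.0, 4.5]},
    "B": {"dici": [4.0, 5.0, 6.0], "dci": [9.5, 10.0, 10.5]},
    "C": {"dici": [7.0, 8.0, 9.0], "dci": [15.5, 16.0, 16.5]},
}
print(between_estimate(panel))  # slope across diseases, 2.0 in this toy panel
print(within_estimate(panel))   # slope over time within diseases, 0.5 here
```

The toy panel is built so that the two slopes differ, which mirrors the point that cross-disease and over-time variation can each contribute to the overall association.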

7 Conclusions

Our analysis illustrates that a country’s level of development determines the structure of its disease space. Following the economic complexity methodology, we developed the HCI, which quantifies the network representation of the relatedness and proximity of diseases. In a dynamic panel data setting, we showed that there is a robust positive effect of a country’s economic development, measured by GDP per capita, on its level of health complexity, i.e., on the ‘structural’ composition of its pool of diseases. The evidence presented here suggests that the economic development of nations conditions the disease space. Specifically, more complex diseases tend to have relatively more prevalent cases in populations with a higher income per capita. Explicitly, it seems that when an economy accelerates, the impact on health complexity is positive.

In addition, we build the DICI, which links a disease to the average level of income per capita of the countries in which the disease has prevalent cases and illustrate how disease complexity is related to economic development. Specifically, we show how changes in GDP per capita are associated with more complex diseases. The temporal variation of the above indices is important from a policy perspective. Using the HCI and DCI, it is possible to design policies aimed at improving the recognition, visibility, and traceability of complex diseases across the globe and through time (e.g., by developing a classification system for all health information systems). These indices can also be used as tools for the development of national plans for complex diseases and the establishment of knowledge networks on complex diseases, so as to improve their diagnosis, treatment, and cure. Furthermore, the DICI could be used to design a health expenditure reallocation policy promoting health activities and services associated with the prevention of complex diseases.

This study employs the economic complexity methodology to compute two new metrics that quantify the disease space of countries. These can be valuable tools for estimating the effect of economic development on the health status of nations. The topic of economic complexity is a rather new one, and its use in economics is rather limited so far. By focusing on the topic of disease complexity, our contribution lies in bridging the health economics literature with the literature that highlights economic complexity as a powerful paradigm in understanding key issues in economics, geography, innovation studies, and other social sciences.

Bibliography


[1] Abdon, A.; Felipe, J. (2011). The product space: What does it say about the opportunities for growth and structural transformation of Sub-Saharan Africa? Levy Economics Institute Working Paper (670).
[2] Acemoglu, D.; Johnson, S. (2007). Disease and development: the effect of life expectancy on economic growth. Journal of Political Economy 115(6), 925–985.
[3] Adler, N. E.; Newman, K. (2002). Socioeconomic disparities in health: pathways and policies. Health Affairs 21(2), 60–76.
[4] Albeaik, S.; Kaltenberg, M.; Alsaleh, M.; Hidalgo, C. A. (2017). Measuring the Knowledge Intensity of Economies with an Improved Measure of Economic Complexity. arXiv preprint arXiv:1707.05826.
[5] Allesina, S.; Tang, S. (2012). Stability criteria for complex ecosystems. Nature 483(7388), 205.
[6] Alleyne, G. A.; Cohen, D. (2002). The Report of Working Group I of the Commission on Macroeconomics and Health. WHO Commission on Macroeconomics and Health, April.
[7] Almeida-Neto, M.; Guimaraes, P.; Guimarães, P. R.; Loyola, R. D.; Ulrich, W. (2008). A consistent metric for nestedness analysis in ecological systems: reconciling concept and measurement. Oikos 117(8), 1227–1239.
[8] Arellano, M.; Bond, S. (1991). Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. The Review of Economic Studies 58(2), 277–297.