The emerging sectoral diversity of startup ecosystems
Clement Gastaud, Theophile Carniel, Jean-Michel Dalle

TL;DR
This paper investigates the sectoral diversity of startup ecosystems in Europe and the USA using new visualization tools and ecological metrics, and models the emergence of diversity with a funding-based preferential attachment model.
Contribution
It introduces a novel visualization approach and ecological metrics for analyzing startup ecosystem diversity, and proposes a simple model explaining diversity growth based on funding patterns.
Findings
Marked differences in sectoral diversity across ecosystems
Diversity characterized using ecological metrics
A funding-based preferential attachment model explains diversity emergence
Abstract
Thanks to the recent availability of comprehensive and detailed online databases of startup companies, it has become possible to more directly investigate startup ecosystems, i.e., startup populations in specific regions. In this paper, we analyze the emergence of 20+ such ecosystems in Europe and the USA, with a specific focus on their sectoral diversity. Analyzing the sectoral landscapes of these ecosystems using a new visualization tool indeed highlights marked differences in terms of diversity, which we characterize using metrics derived from ecological sciences. Numerical simulations suggest that the emerging diversity of startup ecosystems can be explained using a simple preferential attachment model based on sectoral funding.
Clément Gastaud (1, 2), Théophile Carniel (1, 2), Jean-Michel Dalle* (1, 2, 3)
1 Agoranov, Paris, France
2 Sorbonne Université, Paris, France
3 i3-CNRS, École Polytechnique, France
1 Introduction
Startup populations have recently come to be commonly referred to as “startup ecosystems”, by analogy with ecological systems. This metaphor emerged in economics and management sciences in the early 1990s from different sources, in order to study the creation, growth and death of organizations [1], the competition between industrial actors [2] or the emergence of new technology niches [3]. More recently, startup ecosystems have become central to local, national and international innovation policies [4, 5], as innovative startups have drawn increasing investment from venture capitalists [6] and increased attention from stakeholders, notably because of their potential to create jobs [7, 8, 9]. Indeed, following the leading example of San Francisco and Silicon Valley, cities such as Austin, Boston, Los Angeles and New York in the US, and Berlin, London and Paris in Europe [12], have striven to become active entrepreneurial ecosystems and compete against one another to attract startups.
In this context, data and models that would allow entrepreneurs, investors and policy makers to analyze, characterize and compare the emergence and dynamics of different startup ecosystems are, however, still mostly missing. Even though professional websites such as [13, 14, 15] have started to gather relevant information, there is an overall lack of understanding of the fundamental mechanisms driving the development of entrepreneurial ecosystems. Existing startup landscapes only represent the startups of a specific sector, split into sub-sectors (such as the global fintech landscape published in 2016 by Atherton Research [16]), or the startups associated with a specific technology and geographical area (such as the 2017 France Is AI landscape [17]); moreover, they are mostly instantaneous snapshots and lack completeness. As far as we know, there is no quantitative model or tool that would significantly help actors make better sense of the dynamics of startup ecosystems, which is somewhat surprising since entrepreneurship has become a major topic in public policy decision making [19, 20].
In this article, we argue that the recent availability of comprehensive public startup databases represents an opportunity to formulate and validate such theoretical models of the dynamics of startup ecosystems. Building upon an automated startup landscape generator that visualizes entrepreneurial ecosystems while incorporating relevant metadata (e.g. textual descriptions, sector of activity and funds raised) [21], we first propose to extend the ecological analogy and characterize ecosystems using diversity metrics. We then attempt to relate the observed differences among ecosystems to macro-economic indicators, before presenting and calibrating a numerical simulation that explains the diversification of startup ecosystems with a preferential attachment model based on the funding received within each sector.
2 Materials and methods
2.1 Dataset
We use a dataset of startups from Crunchbase [22], a mainstream data source for academic research, notably on US startups [18]. For European ecosystems, this dataset was supplemented with data from Dealroom [23], which increased the number of companies considered by 9.4%. For each startup, we retrieved its date of creation, location, sectoral tags (describing its economic sector, technology and/or market), textual description and, most notably, all the information about the funds it has raised, including the date of each round, the amount raised, the nature of the funding round and the identity of the investors, as well as all the articles mentioning the company that are available on Crunchbase (Figure 1). In addition, we retrieved all the information available about people on Crunchbase, which gives us in particular proxies for the experience of startup founders. By nature of the funding round, we mean the different stages of venture capital funding that startups go through. The first round of funding is generally called Seed and is used to validate the fit between the startup's product and its market. Subsequent rounds are labeled by letters: A, B, C, etc. A rounds are designed to ensure the scalability of the company, while later rounds (B and beyond) tend to support the growth of the company in national and international markets [24]. We limited our sample to companies created after January 1st, 1998 that reported at least one funding round. Overall, our dataset consists of companies, investment rounds, people and news articles.
For each startup, we further computed two additional metrics: first, the total amount of funds it has raised and, second, the speed at which these funds were raised (as a proxy for its pace of growth), which we call its momentum and define at time $t$, in dollars per month, as:
[TABLE]
2.1.1 Construction of the sectoral tree
To visualize ecosystems, we organized startups according to their main economic sectors. We used Crunchbase’s basic tag structure as a starting point to create a startup sectoral tree. This tag structure is organized in two levels: first, the industry (Health Care, Software…) and then a more specific level (Health Insurance, Image Recognition, Construction…). The resulting ontology was cleaned up by removing tags that were particularly broad and not distinctive (Software, Infrastructure, etc.), unrelated to economic sectors (B2B, Freemium, etc.) or very rare (e.g. Ports and Harbors, which was only associated with 10 startups worldwide in our dataset). Furthermore, semantically very close tags (e.g. Shipping and Delivery, Video Games and Gaming…) were merged. Then, whenever two tags had an inclusion relation not taken into account in the initial ontology (e.g. Insurance and Health Insurance), this relation was used to create a new sub-level in the tree, as in:
Financial Services → Insurance → Health Insurance
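The tree-building steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: each tag is assumed to belong to a single industry, the removal of broad, rare or non-sectoral tags is omitted, and the names `build_tree`, `path` and `ROOT` are hypothetical.

```python
from typing import Dict, List, Tuple

ROOT = "ROOT"  # hypothetical name for the root of the sectoral tree


def build_tree(two_level: Dict[str, str],
               inclusions: List[Tuple[str, str]],
               merges: Dict[str, str]) -> Dict[str, str]:
    """Return a child -> parent map; industries hang directly under ROOT.

    two_level:  tag -> industry (assumed: one industry per tag)
    inclusions: (broader_tag, narrower_tag) pairs
    merges:     near-synonym tag -> tag that is kept
    """
    parent: Dict[str, str] = {}
    for tag, industry in two_level.items():
        parent[industry] = ROOT
        parent[merges.get(tag, tag)] = industry  # merged synonyms collapse
    for broader, narrower in inclusions:
        # An inclusion relation (e.g. Insurance > Health Insurance) turns the
        # narrower tag into a child of the broader one: a new sub-level.
        parent[merges.get(narrower, narrower)] = merges.get(broader, broader)
    return parent


def path(tag: str, parent: Dict[str, str]) -> List[str]:
    """Nodes from the industry down to `tag`, root excluded."""
    nodes = []
    while tag != ROOT:
        nodes.append(tag)
        tag = parent[tag]
    return nodes[::-1]


tree = build_tree(
    two_level={"Insurance": "Financial Services",
               "Health Insurance": "Financial Services",
               "Shipping": "Transportation",
               "Delivery": "Transportation"},
    inclusions=[("Insurance", "Health Insurance")],
    merges={"Delivery": "Shipping"},
)
print(" -> ".join(path("Health Insurance", tree)))
# Financial Services -> Insurance -> Health Insurance
```

Storing the tree as a child-to-parent map keeps the inclusion step to a single reassignment, and merged near-synonyms simply never appear as separate nodes.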
It should be noted that in a few cases, visualization and classification prompted a manual edit of sectoral tags that were found to be imprecise (e.g. startups with only a very general tag such as Software), too numerous (e.g. a startup tagged with all the industries that could possibly make use of its technology) or simply factually erroneous. Following this procedure, the final sectoral tree comprised 478 sectoral tags over up to 4 levels, with 28 industries, i.e. independent sectors directly connected to the root of the tree.¹ Part of the Data and Analytics branch is shown as an example in Fig. 2.
¹ A full description of the sectoral tree is available from the authors upon request.
2.1.2 Populating the sectoral tree with startups
This sectoral tree is then populated with startups, which form its leaves. Since most startups have several sectoral tags, we implemented a heuristic procedure to prune the tree, i.e. to keep the most relevant tag for each startup. Following a strategy similar to [25], we determine a startup’s main industry (or tag) by classifying its description: for each tag, we compute the probability that the startup is best described by it. If the startup has no tag, we choose the most probable tag overall. If it has several tags and some are included in others, we first remove the shallowest ones in the sectoral tree, as they correspond, by definition, to the least precise sectoral assignment. We then choose the most probable tag among the remaining ones.
In the end, we obtain the simplest (least ambiguous) possible sectoral tree, with startups as its leaves.
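The pruning heuristic can be sketched as below, assuming the sectoral tree is stored as a child-to-parent map. The description classifier of [25] is not reproduced here: `probs` stands for hypothetical per-tag probabilities that such a classifier would output, and `main_tag` and `ancestors` are illustrative names.

```python
from typing import Dict, List, Set


def ancestors(tag: str, parent: Dict[str, str]) -> Set[str]:
    """All nodes above `tag` in the sectoral tree."""
    found = set()
    while tag in parent:
        tag = parent[tag]
        found.add(tag)
    return found


def main_tag(tags: List[str], probs: Dict[str, float],
             parent: Dict[str, str]) -> str:
    """Keep a single tag per startup."""
    if not tags:
        # No tag at all: fall back to the most probable tag overall.
        return max(probs, key=probs.get)
    # Drop tags that are ancestors of another tag of the same startup:
    # being shallower, they are the less precise assignments.
    covered: Set[str] = set()
    for t in tags:
        covered |= ancestors(t, parent)
    remaining = [t for t in tags if t not in covered]
    # Among the remaining tags, choose the most probable one.
    return max(remaining, key=lambda t: probs.get(t, 0.0))


parent = {"Financial Services": "ROOT", "Health Care": "ROOT",
          "Insurance": "Financial Services", "Health Insurance": "Insurance"}
probs = {"Insurance": 0.6, "Health Insurance": 0.3, "Health Care": 0.1}  # hypothetical
print(main_tag(["Insurance", "Health Insurance"], probs, parent))  # Health Insurance
```

Note that Insurance is discarded even though its hypothetical probability is higher: the depth rule is applied before the probability comparison, as in the procedure described above.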
2.1.3 An interactive visualization tool for startups ecosystems
To visualize the ecosystems, we used the TreeMaps FoamTree package [26], which displays hierarchical data as nested polygons tiling the plane, each cell having an area proportional to a specific dimension of the data, as is usual in tessellations and treemap representations [27].
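Because each cell's area is proportional to a quantity such as funding, every internal node of the tree needs the total of the startups below it. A minimal Python sketch of that aggregation follows, assuming startups and tags have distinct names. It only prepares the weights; FoamTree itself is a JavaScript component, and its input format is not reproduced here. The function name `node_weights` is hypothetical.

```python
from collections import defaultdict
from typing import Dict


def node_weights(funding: Dict[str, float], startup_tag: Dict[str, str],
                 parent: Dict[str, str]) -> Dict[str, float]:
    """Total funding below every node of the tree, startup leaves included."""
    weights: Dict[str, float] = defaultdict(float)
    for startup, amount in funding.items():
        weights[startup] += amount          # the startup's own cell
        node = startup_tag[startup]
        weights[node] += amount
        while node in parent:               # propagate up to the root
            node = parent[node]
            weights[node] += amount
    return dict(weights)


parent = {"Financial Services": "ROOT", "Health Care": "ROOT",
          "Insurance": "Financial Services"}
funding = {"startup_a": 10.0, "startup_b": 5.0, "startup_c": 2.0}  # hypothetical
tags = {"startup_a": "Insurance", "startup_b": "Financial Services",
        "startup_c": "Health Care"}
print(node_weights(funding, tags, parent)["Financial Services"])  # 15.0
```

A parent's weight always equals the sum of its children's weights, which is what a nested tiling requires.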
Examples of such visualizations are presented in the appendix. Each cell of the map corresponds to a startup, its area representing the amount of funding received by the startup and its color the momentum defined in Eq. 1. The visualization typically confirms widely acknowledged characteristics of these ecosystems: for instance, London appears specialized in FinTech (22.6% of investments), while Paris appears particularly strong in Health Care. Furthermore, each ecosystem can be explored through an interactive interface (available at http://atlas.agoranov.com), in which several filters can be applied to the map using all the data available on startups: tags, location, investors, etc. Thanks to the timestamps on each event, an ecosystem can also be visualized at any given date in order to study ecosystem and investment dynamics.
2.2 Introducing diversity metrics for startups ecosystems
To better characterize startup ecosystems, we introduce diversity metrics similar to those traditionally used for ecological ecosystems. In ecology, diversity is on average positively correlated with stability [28]: if a change in a diverse environment (for example a disease or the arrival of a predator) targets some species, its impact on the whole ecosystem is reduced thanks to functional redundancy. In economics, the relationship between diversity and unemployment stability has been widely studied [29, 30], and it has notably been proposed that, as in ecology, diversity is positively correlated with stability through the resilience of the economy to rapid changes [31, 32, 30], although empirical analyses of regional data do not always confirm this hypothesis [30]. Similarly, a disruption in some sectors might affect an entire startup ecosystem to a greater or lesser extent depending on its diversity across industries and sectors.
At least three major diversity indices have been defined and used in ecology: the Simpson index, the Shannon-Wiener index and the Hill index [33]. Both the Shannon-Wiener and Simpson indices, and their corresponding diversities, can be derived from the Hill numbers of order $q = 1$ and $q = 2$ respectively [34]. In this study, we implemented the Shannon-Wiener index, which measures the diversity of an ecosystem within the sectoral ontology defined above, and the Herfindahl-Simpson index, which measures the concentration of investments among startups regardless of their sectors and industries.
2.2.1 The Simpson & Herfindahl indices
In ecology, studies usually present the Simpson index [35], defined as:
$$\lambda = \sum_{i=1}^{R} p_i^{2} \tag{2}$$
where, traditionally in ecology, $R$ is the total number of species and $p_i$ the relative abundance of species $i$. In the present case, $R$ would be the total number of tags and $p_i$ the ratio between the funding invested in sector $i$ and the total funding of the ecosystem. The Simpson index measures the probability that two individuals randomly chosen from a population belong to the same species. The extreme cases $\lambda = 1/R$ and $\lambda = 1$ correspond respectively to maximal and minimal diversity.
For a more straightforward interpretation, the inverse Simpson index $1/\lambda$ is often used. It corresponds to the effective number of species, or true diversity, as defined in [36, 37]. Intuitively, this number converts the diversity index of the studied ecosystem into that of an equivalent ecosystem in which all species are equally abundant; the number of species in this equivalent ecosystem is the effective number of species (an unbalanced ecosystem with $R$ species, each with a different value of $p_i$, would be converted into an ecosystem with $D_2 = 1/\lambda$ species, each with $p_j = 1/D_2$ for $j = 1, \dots, D_2$).
This index is also used in economics, where it is called the Herfindahl index [38] and usually serves to study the weight of a company in a given market. It is defined as:
$$\mathrm{HHI} = \sum_{i=1}^{N} s_i^{2} \tag{3}$$
where $s_i$ is the market share of company $i$ and $N$ the number of companies.
However, this index does not take the sectoral tree structure into account and focuses solely on the distribution of funds among actors.
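The Simpson, inverse Simpson and Herfindahl indices reduce to a few lines of Python. The sketch below is illustrative (all function names are hypothetical) and computes the indices from raw amounts, such as funding per sector for the Simpson index or funding per startup for the Herfindahl index.

```python
from typing import List, Sequence


def shares(values: Sequence[float]) -> List[float]:
    """Relative abundances p_i (zero entries dropped)."""
    total = float(sum(values))
    return [v / total for v in values if v > 0]


def simpson(values: Sequence[float]) -> float:
    """Simpson index: chance that two draws (with replacement) share a category."""
    return sum(p * p for p in shares(values))


def inverse_simpson(values: Sequence[float]) -> float:
    """Effective number of equally abundant categories (Hill number of order 2)."""
    return 1.0 / simpson(values)


def herfindahl(funding_per_startup: Sequence[float]) -> float:
    """Herfindahl index: the same sum of squares, over startups' funding shares."""
    return simpson(funding_per_startup)


print(simpson([1, 1, 1, 1]), inverse_simpson([1, 1, 1, 1]))  # 0.25 4.0
print(round(inverse_simpson([50, 30, 20]), 2))               # 2.63
```

For funding shares of 50%, 30% and 20%, the Simpson index is 0.38, i.e. about 2.63 effective categories instead of 3.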
2.2.2 The Shannon-Wiener index
The Shannon-Wiener index, or Shannon entropy, which originates from information theory [39] and statistical physics [40], is defined as:
$$H = -\sum_{i=1}^{R} p_i \log p_i \tag{4}$$
In the ecology literature, the natural logarithm (base $e$) is usually used [36]; this convention is used in the following. Shannon entropy quantifies the uncertainty associated with predicting an element of the dataset. In ecology, it quantifies the uncertainty in predicting the species to which an individual drawn at random from the dataset belongs. In this form, however, the additional information carried by the tree structure of the data is not taken into account, although this hierarchical structure should be reflected in the analysis. A more apt measure of the entropy of a tree $T$ is thus:
$$H(T) = -\sum_{i=1}^{n_T} p_i \ln p_i + \sum_{i=1}^{n_T} p_i\, H(T_i) \tag{5}$$
with $n_T$ the number of branches originating from $T$, $H(T_i)$ the entropy of the subtree $T_i$, and $p_i$ either the ratio of the funding invested in $T_i$ to the total funding invested in $T$, or the ratio of the number of startups in $T_i$ to the total number of startups in $T$ (i.e. the probability of falling in $T_i$ given $T$). We refer to the entropy computed from funding ratios as Shannon funding, and to the entropy computed from startup-count ratios as Shannon startups.
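A recursive implementation can be sketched as follows, assuming the tree entropy takes the decomposition $H(T) = -\sum_i p_i \ln p_i + \sum_i p_i H(T_i)$ over the branches $T_i$ of $T$, and that the tree is stored as nested dictionaries whose leaves hold either funding amounts (Shannon funding) or startup counts (Shannon startups). Names and data layout are hypothetical. By the chain rule of entropy, $L$ equally weighted leaves yield $\ln L$ whatever the grouping, which gives a convenient sanity check.

```python
import math
from typing import Dict, Union

# A tree is either a leaf weight or a dict mapping branch names to subtrees.
Tree = Union[float, Dict[str, "Tree"]]


def total(tree: Tree) -> float:
    """Sum of the leaf weights below `tree`."""
    if isinstance(tree, dict):
        return sum(total(child) for child in tree.values())
    return float(tree)


def tree_entropy(tree: Tree) -> float:
    """H(T) = -sum_i p_i ln p_i + sum_i p_i H(T_i) over the branches T_i of T."""
    if not isinstance(tree, dict):
        return 0.0                      # a leaf has no further branches
    t = total(tree)
    if t == 0:
        return 0.0
    h = 0.0
    for child in tree.values():
        p = total(child) / t            # share of T's weight falling in T_i
        if p > 0:
            h += -p * math.log(p) + p * tree_entropy(child)
    return h


toy = {"Financial Services": {"Insurance": 1, "Payments": 1, "Lending": 1},
       "Health Care": 1}
print(round(tree_entropy(toy), 4), round(math.log(4), 4))  # 1.3863 1.3863
```

The helper `total` is recomputed at every level for clarity; caching subtree totals would make the computation linear in the size of the tree.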
Naturally, this measure depends on the structure of the ontology defined previously. This issue is well known in ecology and stems from the chosen definition of a species [41].
Following [36, 37], the effective number of species $D$ can be derived from the Shannon-Wiener entropy $H$:
$$D = e^{H} \tag{6}$$
For a tree in which all categories have equal populations (478 categories in total), the Shannon-Wiener index is about $\ln 478 \approx 6.17$. The corresponding effective number of species is then $e^{6.17} \approx 478$, which is consistent with our definition and understanding of this metric.
Hill numbers of order $q = 1$ (Shannon-Wiener diversity) are to be favored when calculating diversities without any prior information about the ecosystem (from [37]: “orders higher than 1 are disproportionately sensitive to the most common species, while orders lower than 1 are disproportionately sensitive to the rare species.”). Applying Hill diversity indices of order 1 and order 2 to our dataset, we indeed find that the order-1 index provides insight into ecosystem dynamics, whereas the order-2 index does not discriminate well between ecosystems. We therefore use the order-1 index as our diversity measure in the following.
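The family of Hill numbers can be sketched in Python to compare orders directly. This is an illustrative implementation of the standard definition ${}^qD = (\sum_i p_i^q)^{1/(1-q)}$, with order 1 taken as the limit $\exp(H)$; the function name and the toy amounts are hypothetical.

```python
import math
from typing import Sequence


def hill_number(values: Sequence[float], q: float) -> float:
    """Effective number of categories of order q (Hill number)."""
    total = float(sum(values))
    p = [v / total for v in values if v > 0]
    if math.isclose(q, 1.0):
        # Order 1 is defined as the limit q -> 1: exp of the Shannon entropy (ln).
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))


funding = [50, 30, 20]  # hypothetical funding per category
for q in (0, 1, 2):
    print(q, round(hill_number(funding, q), 2))
```

On these toy amounts, orders 0, 1 and 2 give roughly 3, 2.80 and 2.63 effective categories: the higher the order, the more weight the dominant category receives. With 478 equally populated categories, every order returns 478.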
2.3 Simulating ecosystem growth
To understand some of the mechanisms behind the growth and diversification of an entrepreneurial ecosystem, we simulated the development of a startup ecosystem as described by our ontology (number of startups and amount of funding in structured categories). The ecosystem was populated incrementally following a simple preferential attachment model on the current state of the tree. Two main variants of the model were used:
- In the first one, the new startup is placed in category $i$ with probability
  [TABLE]
  where $N_i$ is the number of startups in category $i$, and the law involves a free parameter of the model.
- In the second one, the new startup is created with a funding amount drawn from a power-law distribution with exponent and support . This new startup is then placed in category $i$ with probability
  [TABLE]
  where $F_i$ is the total funding of the startups in category $i$, and the law again involves a free parameter of the model.
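The two variants can be sketched as a short Python simulation. The exact attachment laws (Eqs. 7 and 8) are not recoverable from this text, so the sketch assumes a nonlinear form $P(i) \propto (w_i + w_0)^{\alpha}$, where $w_i$ is the startup count or the total funding of category $i$, $\alpha$ is the free parameter, and $w_0 > 0$ is a hypothetical offset that lets empty categories receive their first startup. Funding is drawn from a Pareto (power-law) distribution with hypothetical shape and minimum, and categories are flat rather than organized as the 478-tag tree.

```python
import math
import random
from typing import List


def effective_species(counts: List[int]) -> float:
    """exp(Shannon entropy) of startup counts per category (flat, not a tree)."""
    total = sum(counts)
    return math.exp(-sum(c / total * math.log(c / total) for c in counts if c > 0))


def simulate(n_startups: int, n_categories: int, alpha: float, on_funding: bool,
             seed: int = 0, w0: float = 1.0,
             pareto_shape: float = 1.5, f_min: float = 1.0) -> List[int]:
    """Grow an ecosystem one startup at a time; return startups per category.

    Assumed law: P(i) proportional to (w_i + w0) ** alpha, with w_i the startup
    count (variant 1) or the total funding (variant 2) of category i.
    """
    rng = random.Random(seed)
    counts = [0] * n_categories
    funding = [0.0] * n_categories
    for _ in range(n_startups):
        source = funding if on_funding else counts
        weights = [(w + w0) ** alpha for w in source]
        i = rng.choices(range(n_categories), weights=weights)[0]
        counts[i] += 1
        # Funding brought by the newcomer: Pareto draw with minimum f_min.
        funding[i] += f_min * rng.paretovariate(pareto_shape)
    return counts


for on_funding in (False, True):
    counts = simulate(2000, 50, alpha=1.0, on_funding=on_funding, seed=42)
    print("funding" if on_funding else "startups", round(effective_species(counts), 1))
```

How the two variants compare depends on the assumed law and parameters; the sketch illustrates the mechanism and does not reproduce the paper's calibration. Diversity is measured on startup counts, mirroring the use of Shannon startups in the simulations.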
3 Results and discussion
We applied our methodology to compare thirty-four ecosystems in Europe, North America, Asia and Australia, chosen based on their prominence [12]. Figure 3 summarizes the sizes of these ecosystems, in terms of number of startups and total funding, as of January 1st, 2018, and Figs. 9, 10, 11 and 12 show the mapping of some ecosystems using our visualization method. The size of each startup cell is proportional to the amount of funds it raised, and its color encodes its momentum, as defined in Eq. 1. Purple means that the company went public, and the shades from red to beige represent, in sequence, the top 1%, 5% and 15% of startups with the highest momentum. Industries are ordered from top left to bottom right by total funding.
As expected, funding increases with the number of startups. However, the ecosystems visually exhibit significant disparities in terms of funding allocation. For instance, while Paris hosts twice as many startups as Atlanta, the total cumulative investments in both cities are comparable (9.9B in Paris since January 1st, 2000). Mapping the ecosystems might shed some light on this observation. Some, like Atlanta, Berlin or Stockholm (Fig. 11), appear characterized by relatively low diversity, related to the presence of a few champions (unicorns) such as Kabbage in Atlanta, Delivery Hero in Berlin, or Spotify, Klarna and iZettle in Stockholm, which have raised billions of dollars. On the other hand, Paris, New York and Silicon Valley (Figs. 9 and 12) appear much more diverse, in terms of funding as well as industry.
Visualizing the evolution of ecosystems also captures dynamic trends. For instance, the gradual decline of Manufacturing in Île-de-France is clearly visible in these representations, falling from 12.7% of total investments in 2010 to 7.0% in 2018 (Fig. 9). In London, on the other hand, Financial Services investments skyrocketed over the same period, from 10.3% to 22.6% of investments (Fig. 10), making the sector the largest funding recipient in the British capital.
3.1 Measured diversity
Fig. 4 shows the evolution of selected ecosystems in terms of number of startups and effective number of species between January 1st, 2010 and January 1st, 2018. Each ecosystem is represented by one pair of values $(N_t, D_t)$ per year, with $N_t$ and $D_t$ respectively the number of startups and the effective number of species in the ecosystem at time $t$. Diversity is computed using the funding per category (Shannon funding) in the left plot and the number of startups per category (Shannon startups) in the right plot. Diversity is higher for Shannon startups than for Shannon funding, and individual trajectories tend to be more distinct for high numbers of startups (see for instance Silicon Valley or New York), suggesting that ecosystem-specific dynamics could be at play once an ecosystem becomes sufficiently large.
Since ecosystems differ widely in terms of number of startups, it is useful to rescale diversity trajectories so that the numbers of startups at the start and end of the measurement period are comparable. Fig. 5 presents the standardized ecosystem dynamics, i.e. ecosystems are characterized by the value pairs:
[TABLE]
with indices referring to the first data point (year 2010) and the last data point (year 2018).
Using these standardized metrics, all ecosystems have similarly shaped trajectories with Shannon startups (right plot), whereas trajectories computed with Shannon funding (left plot) seem more variable, probably because of the large discrepancies in individual funding amounts, which can easily unbalance an ecosystem, especially in its early stages of development. Diversification in terms of number of startups per sector (Shannon startups) thus seems to be a more fundamental characteristic shared by all the ecosystems studied than diversification in terms of funding per sector. We therefore use Shannon startups to compute entropy and diversity in the numerical simulations.
3.2 Correlations to macro-economic indicators
In order to move beyond the visual intuitions provided by the landscapes, we made use of the diversity metrics defined in the previous section. We fitted an OLS model to look for correlations between the effective number of species and macro-economic indicators retrieved from the OECD Regional Statistics [42], including:
- Wealth (GDP and GDP per capita),
- Economic vitality (employment, GDP growth (base 2007)),
- Research intensity (% of the labor force with tertiary education, number of researchers per 1,000 people, number of patents, R&D expenses in M$ and in % of GDP).
All values are standardized by removing the mean and scaling to unit variance. As the logarithm of the number of startups explains 80% of the variance of (), we fit the indicators against . Fitting this value against the previously defined indicators, while still controlling for the logarithm of the number of startups, we find a correlation with GDP per capita (p-value of ). Diversity thus seems related only to economic development and, surprisingly, not at all to the research intensity of the metropolitan area. However, this observation may simply be a consequence of the greater maturity of startup ecosystems in developed countries, which have existed for a longer time.
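A regression of this kind can be sketched with NumPy's least-squares solver. The sketch standardizes all variables, includes the logarithm of the number of startups as a control alongside one indicator, and runs on toy numbers constructed for illustration, not on the OECD data; the function names are hypothetical. Obtaining p-values additionally requires standard errors, which a package such as statsmodels (`OLS(...).fit().pvalues`) reports.

```python
import numpy as np


def zscore(x) -> np.ndarray:
    """Remove the mean and scale to unit variance."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


def ols_with_size_control(diversity, n_startups, indicator) -> np.ndarray:
    """OLS of standardized diversity on log(#startups) and one indicator.

    Returns the coefficients [intercept, log-size, indicator].
    """
    y = zscore(diversity)
    X = np.column_stack([
        np.ones(len(y)),
        zscore(np.log(n_startups)),   # control for ecosystem size
        zscore(indicator),            # e.g. GDP per capita
    ])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Toy numbers for illustration only (not the paper's data).
n = [100, 200, 400, 800, 1600, 3200]
gdp = [1, 3, 2, 5, 4, 6]
diversity = 2 * zscore(np.log(n)) + 0.5 * zscore(gdp)
print(ols_with_size_control(diversity, n, gdp))
```

Because the toy target is built as an exact combination of the two standardized regressors, the fit recovers their 4:1 ratio of coefficients with a zero intercept.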
3.3 Simulation results
Figs. 6 and 7 show the simulation results for the two variants of the model; the simulated diversity values (colored lines) are computed from the entropy of Eq. 5 using the number of startups in each category (Shannon startups). We compared these values with diversity values from our dataset computed from funding amounts (Fig. 6) and from numbers of startups (Fig. 7). These figures show that preferential attachment on the number of startups (left plots and Eq. 7) seems to explain the diversification of the ecosystem up to a certain point, but that diversity is not stable as the ecosystem continues to grow, i.e. all new startups end up concentrating in a small number of categories and the effective number of species collapses.
Preferential attachment on category funding (right plots and Eq. 8), on the other hand, seems to match the data computed with both Shannon startups and Shannon funding better, as ecosystem diversity steadily increases over time under this model. Judged against the data, preferential attachment on funding amounts is thus a better mechanism than preferential attachment on the number of startups to explain the diversification of an ecosystem throughout its growth. In the case of Shannon funding, the data and simulation results seem to match particularly well for free parameter values around (see Fig. 8).
To check the robustness of these results, we tested a variant of these models in which a new startup is placed in a random category with probability $\epsilon$ and, with probability $1 - \epsilon$, in a category chosen following the preferential attachment law described in Sec. 2.3. No qualitative differences were found between simulations with nonzero $\epsilon$ and with $\epsilon = 0$ (the case $\epsilon = 0$ corresponds to the standard preferential attachment model shown in Figs. 6 and 7).
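The exploration variant only changes how a category is picked. A minimal Python sketch with a hypothetical function name, where `weights` would come from whichever preferential attachment law is being tested:

```python
import random
from typing import Sequence


def choose_category(weights: Sequence[float], epsilon: float,
                    rng: random.Random) -> int:
    """Uniform exploration with probability epsilon, else preferential attachment."""
    n = len(weights)
    if rng.random() < epsilon:
        return rng.randrange(n)        # ignores current category sizes
    return rng.choices(range(n), weights=weights)[0]


rng = random.Random(0)
print([choose_category([0, 0, 1], 0.0, rng) for _ in range(5)])  # [2, 2, 2, 2, 2]
```

With `epsilon = 0` the function reduces to plain preferential attachment, which is the standard case; with `epsilon = 1` every category is equally likely regardless of its weight.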
Fig. 8 shows that good concordance between the data points with Shannon funding (red dots) and simulation results (black line) is obtained for preferential attachment on funding with .
Models of mixed preferential attachment, taking into account both the number of startups and the total funding at the same time, were tested following Eq. 11:
[TABLE]
with controlling the importance of funding amounts vs. number of startups and and free parameters of the model. Simulations for a range of values of , and of did not provide a better match to the data than preferential attachment simply on the total category funding (fig. 6).
Finally, a simple mixed model of firm creation and growth was also compared with the data. At each iteration of the simulation, with probability $p$, a new startup is created in a category chosen following Eq. 8 and is allocated seed funding. With probability $1 - p$, a random existing startup is funded with an amount depending on its last simulated funding round and moves on to the next stage of the “alphabet round” system (i.e. a company that last received seed funding receives series A funding, a company that last received series A funding receives series B, and so on). We set $p$ based on our data on the Silicon Valley ecosystem (seed rounds, or “new entrants”, represent approximately half of all venture funding rounds). With these parameters, the distribution of funding round types at each stage was found to be similar to that of our data. Simulation results from this mixed model of firm creation and growth were not better than those shown in Fig. 6; the main driver behind the diversification of an ecosystem thus simply seems to be the allocation of funding, regardless of other ecosystem-dependent factors. The tendency of entrepreneurs to explore new industries or instead follow existing trends thus seems heavily linked to individual decisions that are particularly influenced by how financially successful the existing companies in the various categories have been.
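The creation-or-growth loop can be sketched in Python to check the resulting mix of round types. This is a simplified illustration: category placement (Eq. 8) and round amounts are omitted, the round sequence is truncated at D by assumption, and the function name is hypothetical. `p_new` plays the role of the probability of creating a new startup.

```python
import random
from typing import Dict, List

ROUNDS = ["Seed", "A", "B", "C", "D"]  # "alphabet round" sequence, truncated (assumption)


def simulate_rounds(n_steps: int, p_new: float, seed: int = 0) -> Dict[str, int]:
    """Count funding rounds by type when new entrants arrive with prob. p_new."""
    rng = random.Random(seed)
    stage: List[int] = []               # current round index of each startup
    counts = {r: 0 for r in ROUNDS}
    for _ in range(n_steps):
        if not stage or rng.random() < p_new:
            stage.append(0)             # new startup receives seed funding
            counts["Seed"] += 1
        else:
            k = rng.randrange(len(stage))
            if stage[k] + 1 < len(ROUNDS):
                stage[k] += 1           # next stage: Seed -> A -> B -> ...
                counts[ROUNDS[stage[k]]] += 1
            # a startup already at the last stage is not funded again here
    return counts


print(simulate_rounds(10000, 0.5, seed=3))
```

Since a startup can only reach series B after series A, the counts of later rounds never exceed those of earlier ones, while `p_new = 0.5` makes seed rounds about half of all rounds (slightly more once some startups have reached the last stage).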
4 Conclusion
In this paper, we presented a novel approach to studying the emergence of startup ecosystems. Using public datasets, we first presented a novel, automated and interactive data visualization tool that facilitates the study of startup populations from an ecosystem point of view and sheds light on the particularities of different ecosystems. Relatedly, we introduced diversity metrics such as the Shannon-Wiener index and the Simpson-Herfindahl index, extending the analogy with ecological sciences. We then tried to understand how the observed diversity could emerge, both by relating its disparity across ecosystems to macro-economic indicators and through numerical simulations. Our results suggest that the increase in diversity during the growth of a startup ecosystem can be explained by the sequential allocation of funding to startups in given sectors, through a simple preferential attachment model, rather than by macro-economic indicators, with the exception of economic development. In other words, startup ecosystem diversity appears to be the outcome of emerging and aggregated behaviours rather than of ecosystem-specific characteristics or decisions. This analysis of ecosystem diversity nevertheless remains preliminary and deserves further investigation, not only on a larger sample of ecosystems but also with a focus on events: for instance, linking “diversification” events, i.e. sectors receiving a rather sudden and large influx of new startups, to specific “breakthroughs” (either technological, as was recently the case with deep learning, or business-oriented, as with Food Delivery) could provide valuable insights into startup ecosystem diversity and diversification.
References
[1] Hannan MT, Freeman J. Organizational Ecology. Cambridge: Harvard University Press; 1993.
[2] Moore JF. Predators and prey: a new ecology of competition. Harvard Business Review. 1993;71(3):75-86.
[3] Schot J. The usefulness of evolutionary models for explaining innovation: The case of the Netherlands in the nineteenth century. History and Technology. 1998;14(3):173-200.
[4] Gilbert BA, McDougall PP, Audretsch DB. The Emergence of Entrepreneurship Policy. Small Business Economics. 2004;22(3-4):313-23.
[5] Minniti M. The Role of Government Policy on Entrepreneurial Activity: Productive, Unproductive, or Destructive? Entrepreneurship Theory and Practice. 2008;32(5):779-790.
[6] National Venture Capital Association / Thomson Reuters. Venture Capital Fundraising tops $10 Billion in Q2, Recording Strongest Quarter Since 2007 [Internet]. 2018 January. Available from: https://nvca.org/pressreleases/venture-capital-fundraising-tops-10-billion-in-q2-recording-strongest-quarter-since-2007/
[7] Birch D. Job Creation in America: How Our Smallest Companies Put the Most People to Work. University of Illinois at Urbana-Champaign’s Academy for Entrepreneurial Leadership Historical Research Reference in Entrepreneurship. 1987.
[8] Thurik R, Wennekers S. Linking entrepreneurship and economic growth. Small Business Economics. 1999;13(1):27-56.
