Researcher population pyramids: Tracking demographic and gender trajectories across countries
Kazuki Nakajima, Takayuki Mizuno

TL;DR
This paper introduces a framework to track researcher demographics and gender balance across countries using publication data, revealing different patterns and future scenarios.
Contribution
The novel contribution is a researcher population pyramid framework for analyzing demographic and gender trends in academic systems.
Findings
Emerging systems like Arab countries show high researcher inflows but widening gender productivity gaps.
Mature systems like the US have modest inflows and narrowing gender gaps.
Rigid systems like Japan lag in both demographic and gender balance.
Abstract
The sustainability of the academic ecosystem relies on researcher demographics and gender balance, yet assessing these dynamics in a timely manner for policy is challenging. Here, we propose a researcher population pyramid framework for tracking demographic and gender trajectories across countries using publication data. We provide a timely snapshot of historical and present demographics and gender balance across 58 countries, revealing three contrasting patterns among research systems: Emerging systems (eg Arab countries) exhibit high researcher inflows with widening gender gaps in cumulative productivity; Mature systems (eg the United States) show modest inflows with narrowing gender gaps; and Rigid systems (eg Japan) lag in both. Furthermore, by simulating future scenarios, the framework makes potential trajectories visible. If 2023 demographic patterns persist, Arab countries’…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6| CAGR | Share of women (%) | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Country | 2010–2023 | 2023–2030 | 2030–2040 | 2040–2050 | All authors | Senior-career authors | ||||||
| 2023 | 2030 | 2040 | 2050 | 2023 | 2030 | 2040 | 2050 | |||||
| AE |
|
|
|
|
|
|
|
|
|
|
|
|
| DZ |
|
|
|
|
|
|
|
|
|
|
|
|
| EG |
|
|
|
|
|
|
|
|
|
|
|
|
| IQ |
|
|
|
|
|
|
|
|
|
|
|
|
| JO |
|
|
|
|
|
|
|
|
|
|
|
|
| LB |
|
|
|
|
|
|
|
|
|
|
|
|
| MA |
|
|
|
|
|
|
|
|
|
|
|
|
| SA |
|
|
|
|
|
|
|
|
|
|
|
|
| TN |
|
|
|
|
|
|
|
|
|
|
|
|
| AU |
|
|
|
|
|
|
|
|
|
|
|
|
| CA |
|
|
|
|
|
|
|
|
|
|
|
|
| DE |
|
|
|
|
|
|
|
|
|
|
|
|
| ES |
|
|
|
|
|
|
|
|
|
|
|
|
| FR |
|
|
|
|
|
|
|
|
|
|
|
|
| GB |
|
|
|
|
|
|
|
|
|
|
|
|
| IT |
|
|
|
|
|
|
|
|
|
|
|
|
| JP |
|
|
|
|
|
|
|
|
|
|
|
|
| SE |
|
|
|
|
|
|
|
|
|
|
|
|
| US |
|
|
|
|
|
|
|
|
|
|
|
|
- —JSPS10.13039/501100001691
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsDemographic Trends and Gender Preferences · scientometrics and bibliometrics research · Diversity and Career in Medicine
Introduction
The pursuit of global sustainability—addressing grand challenges such as climate change, public health crises, and equitable economic development—is inextricably linked to the vitality of the global research enterprise. Sustained innovation, driven by a robust and dynamic academic ecosystem, provides the essential knowledge, technologies, and solutions required to navigate these complex issues (1). However, the role of this ecosystem extends beyond immediate scientific discoveries; it is responsible for the long-term reproduction of research talent and the continuity of knowledge itself (2). A disruption in this human capital pipeline, encompassing both the flow of researchers from early-career training to senior-level expertise and the diversity within that flow, undermines a society’s capacity to address not only current problems but also future, unforeseen challenges (3–5). Ensuring the sustainability of the academic ecosystem is not merely an internal concern for the research community but a fundamental prerequisite for the long-term well-being and resilience of global society as a whole (6).
The sustainability of the academic ecosystem, in turn, is fundamentally determined by the demographic composition of its researcher population (7–9). This composition across career stages—early-career, mid-career, and senior-career researchers—directly influences the quality and quantity of research outputs within countries. Doctoral students and postdoctoral researchers constitute a significant part of the pipeline for future researchers, and individuals’ productivity during postdoctoral periods correlates with subsequent career continuity (10). Importantly, across all career stages, researchers exhibit diverse productivity patterns (11), reflected in considerable variation in the timing of their most impactful research, which can occur at any stage of a research career (12). Mid-career and senior-career researchers also play critical mentoring roles, thus facilitating the effective transfer of scientific expertise to younger generations (13, 14). In addition to ensuring a balanced researcher population across career stages, achieving gender diversity in academia remains a persistent global challenge (15–23), as gender-diverse research environments foster broader research questions, varied perspectives, and enhanced scientific creativity (24, 25). Therefore, developing robust methods for the systematic analysis of researcher demographics and gender balance across countries has become a critical challenge for science policymakers and educational administrators worldwide.
Reconstructing researchers’ publishing careers from bibliographic data has enabled broader international and longitudinal analyses of research systems and research careers, including productivity patterns (11, 12), collaboration patterns (26–29), research impacts (30, 31), structural imbalances (15, 19), and many more (2, 32, 33). Quantifying researcher demographics across career stages, gender, and national contexts from bibliometric data, however, still poses a significant methodological challenge: reliably determining whether researchers’ publishing careers are ongoing in specific years. This challenge stems from the diversity in publication intervals among researchers and technical barriers in identifying researchers’ publications based on bibliographic data (19, 34, 35). This uncertainty, as assessments of total productivity and publishing-career length often rely on defining a career end (19), necessitates a substantial time lag when evaluating researchers’ career characteristics. These delays are particularly problematic for rapidly evolving research systems, where timely insights are essential for informing effective research and educational policies.
Here, we propose a researcher population pyramid framework for mapping and comparing researcher demographics across publishing careers, gender, and national contexts. Based on author and publication data from a large-scale bibliographic database, we deploy country- and gender-specific thresholds of interpublication intervals to identify “active” authors (ie those whose publishing careers are ongoing) in a specific year. Within this framework, we define the cumulative productivity of an active author as the number of their most recent, uninterrupted publications up to a given year. Unlike the total productivity and publishing-career length, cumulative productivity can be measured without waiting for a publishing career to end. We construct population pyramids by counting active authors according to their cumulative productivity in a given year by country and gender. This framework enables systematic international comparisons of researcher demographics and gender balance across time. For future contexts, we simulate potential trajectories based on plausible transition scenarios derived from observed trends. Applying this diagnostic tool to 58 countries, we demonstrate its utility by revealing three contrasting evolutionary patterns—Emerging, Mature, and Rigid—exemplified by Arab countries, the United States, and Japan, respectively.
Results
Trends in researcher populations and gender balance
We analyze 14,745,796 gender-assigned authors, derived from 151,905,632 publication records. These authors are affiliated with 58 countries for which Naive Bayes gender classifiers achieved high accuracy in two benchmarks (see Methods and Supplementary Sections S1 and S2). For comparison, we focus on two country groups. The nine Arab countries are Algeria, Egypt, Iraq, Jordan, Lebanon, Morocco, Saudi Arabia, Tunisia, and the United Arab Emirates. The ten reference countries are Australia, Canada, France, Germany, Italy, Japan, Spain, Sweden, the United Kingdom, and the United States. We selected these countries to ensure a sufficiently large sample of gender-assigned authors (eg at least 9,000 per gender per country) and to represent regional diversity relevant for cross-national comparisons. The comparison is primarily motivated by their contrasting developmental trajectories. In particular, Arab countries experienced significant sociopolitical transformations during and after the Arab Spring (2010–2012) (36, 37). Given these systemic changes and their notable increase in research output (38–42), we hypothesize that Arab countries are undergoing rapid shifts in researcher demographics and gender balance. By contrast, reference countries, with their established research infrastructures and stable funding mechanisms (41), serve as demographic benchmarks for comparison.
To quantify the long-term growth of researcher populations, we counted the number of unique gender-assigned authors who, for each year between 2000 and 2023, published at least one paper with their country of affiliation (Figs. 1a and 1b). These annual author counts served as proxies for the size of national researcher populations. We observed roughly exponential growth in author counts for both country groups, with average annual growth rates (estimated from log-transformed data) higher in Arab countries than in reference countries (see Supplementary Section S3.1). We also calculated the annual proportion of female authors among unique authors for each year (Figs. 1c and 1d). Among Arab countries, Jordan, Saudi Arabia, and Tunisia showed the largest increases in female author proportions, while Morocco exhibited a relatively modest increase (Fig. 1c; see Supplementary Section S3.2). In contrast, reference countries generally displayed more gradual trends in female author proportions (Fig. 1d). Notably, Japan exhibited the lowest growth rate, consistent with persistent gender imbalances in Japanese academia (43).
Trends in researcher population and gender balance from 2000 to 2023. a and b) Annual number of unique authors who published at least one paper, affiliated with a) Arab and b) reference countries. Plotted on a logarithmic scale. c and d) Annual proportion of female authors affiliated with a) Arab and b) reference countries.
Cumulative productivity and population pyramids
To provide a more detailed understanding of researcher demographics, gender balance, publishing careers, and their evolution, we introduce a researcher population pyramid framework. This framework allows for the multidimensional visualization of researcher populations across publishing-career stages and genders. We construct these pyramids by counting active authors (ie those whose publishing careers are ongoing) in a given year by their cumulative productivity (ie the number of their most recent, uninterrupted publications up to that year; see Methods). Unlike existing metrics that rely on a definitive career end, cumulative productivity can be measured without waiting for a publishing career to end. We also simulate a plausible future demographic pattern by extending the 2023 pyramid forward (see Methods); our projections should be interpreted as scenarios based on 2023 trends.
Figure 2 illustrates the divergent demographic trajectories of researcher populations in four contrasting countries: Egypt, Tunisia, Japan, and the United States. To aid interpretation of these pyramids, we heuristically classify cumulative productivity into three descriptive stages: “early-career” (1–10 cumulative publications), “mid-career” (11–50 cumulative publications), and “senior-career” (51 or more cumulative publications). It should be noted that this labeling is for descriptive convenience only and is based solely on the length of an author’s uninterrupted publication sequence, without accounting for research impact or differences across research disciplines.
Researcher population pyramids in 2010, 2023, and 2050 for Egypt, Tunisia, Japan, and the United States. The number of active authors for each gender is displayed on a logarithmic horizontal axis. Total active author counts (N) are provided for each panel. The 2050 pyramids and their corresponding counts (≈) are projections based on 2023 trends.
In 2010, both Egypt and Tunisia exhibited relatively limited numbers of authors across publishing-career stages. Between 2010 and 2023, both countries experienced substantial expansion in their researcher populations, driven primarily by marked increases in early- and mid-career authors. Looking toward 2050, our projection scenario suggests substantial growth for Egypt, with both the number of authors (horizontal expansion) and cumulative productivity levels (vertical growth) continuing to expand. This suggests Egypt exemplifies a research system undergoing rapid demographic expansion. In contrast, Tunisia’s 2050 projection suggests a more moderate overall growth characterized by a slowing inflow of early-career authors and an increasingly stable demographic profile. This pattern indicates Tunisia’s research system is transitioning toward a more mature phase than Egypt’s.
In 2010, Japan’s research population pyramid showed a broad distribution of authors across publishing-career stages, but with a marked underrepresentation of women. This gender imbalance persisted through 2023, coupled with limited inflow at the early-career level. The 2050 projection indicates a continuation of this trend, with the pyramid remaining largely unchanged. These trends collectively depict a demographically rigid system, raising serious concerns about the long-term sustainability of Japan’s research enterprise.
The United States’ 2010 population pyramid showed a well-balanced distribution of researchers across all publishing-career stages, despite a moderate underrepresentation of women. This stable structure was maintained through 2023, with observable improvements in gender imbalance. Projections for 2050 suggest continued structural stability and only modest expansion. This trajectory—reflecting a research system that may be reaching saturation but appears demographically sustainable—presents a stark contrast to the dynamic growth observed in Egypt. Consequently, the United States exemplifies a mature, steady-state research system.
Underlying the divergent shapes of the population pyramids, we found that countries vary greatly in their researcher inflow—defined as the proportion of newly active authors among all active authors affiliated with a country in a given year. For 2023, Arab countries showed substantially higher researcher inflow than reference countries (Fig. 3). Furthermore, the share of women among newly active authors achieves near gender parity (defined here as 45–55% female representation) in most Arab countries and many reference countries. However, notable exceptions exist: Lebanon, Tunisia, and the United Arab Emirates exhibited slightly higher female shares than the defined parity range. In contrast, Japan consistently exhibited the lowest proportion of female newly active authors among all countries considered.
Researcher inflow in 2023. Bars show the proportion of newly active authors in 2023 relative to all active authors in the same year. The bottom and top segments of each bar represent female and male authors, respectively. Percentages above the bars indicate the female share among newly active authors.
To provide a longitudinal perspective on the structural evolution associated with this researcher inflow, we examined the evolution of the aggregate researcher population pyramid, constructed by pooling all active authors affiliated with any of the nine Arab countries, from 2000 to 2023 (Fig. 4). In 2000, the pyramid exhibited a limited number of authors across all cumulative productivity levels. However, over the subsequent two decades, these countries experienced a marked horizontal widening of the early-career base, reflecting the substantial inflow of researchers. By 2023, this sustained inflow had begun to translate into vertical growth, with an increasing number of authors reaching higher levels of cumulative productivity. This structural shift demonstrates how our framework can visualize the transition of the “Emerging” research system.
Researcher population pyramids for the aggregate of the nine Arab countries in 2000, 2005, 2010, 2015, 2020, and 2023.
To contextualize the observed inflow dynamics and future projections for diverse research systems, we examine the long-term historical evolution of research populations in Japan and the United States from 1970 to 2000 (Fig. 5). In 1970, Japan’s pyramid featured a narrow base with very few senior female authors. By 1980, Japan’s pyramid structure closely resembled that of the United States in 1970, suggesting Japan lagged about a decade behind the United States in its demographic development. However, the United States pyramid expanded rapidly thereafter; by the 1990s and 2000s, its shape mirrored that of Japan in 2023 (see also Fig. 2). This comparison implies that Japan now lags more than 20 years behind the United States in its population structure, indicating the development gap has widened. These retrospective pyramids provide a valuable reference point for interpreting the potential future trajectories of today’s rapidly growing research systems, such as those in the Arab countries.
Researcher population pyramids for Japan and the United States in 1970, 1980, 1990, and 2000.
The preceding results lead us to position the 58 countries along two key dimensions: researcher inflow and gender gap in cumulative productivity (Fig. 6). First, all of these countries exhibit a negative gender gap in cumulative productivity (ie the mean for women is lower than for men), which is largely consistent with previous results on gender gaps in total productivity and publishing-career length across countries (19). Second, the trajectories from 2010 to 2023 (gray arrows) suggest three broad, qualitative patterns of demographic evolution. For descriptive purposes, we heuristically label these patterns as follows:
“Emerging”: Characterized by high researcher inflow and a widening gender gap in cumulative productivity (eg most Arab countries; arrows moving to the left).“Mature”: Characterized by lower researcher inflow but a narrowing gender gap (eg the United States; arrows moving down and to the right).“Rigid”: Characterized by low researcher inflow and a large, persistent gender gap (eg Japan; minimal arrow movement).
It is important to note that these labels are intended as a descriptive heuristic to facilitate interpretation. This qualitative classification is based on the observed positions and trajectories of countries within the specific 2D space defined by the two metrics. We also note that while most of the Arab countries analyzed exhibited a decrease in their researcher inflow from 2010 to 2023 (indicated by the downward direction of the gray arrows in Fig. 6), they consistently maintained researcher inflow above the global median (the horizontal dashed line). This trend suggests that while the initial rapid growth of their researcher populations has begun to stabilize as the total number of active authors increases, these systems continue to exhibit a substantially higher proportion of newly active authors relative to mature or rigid systems.
Scatter plot of the Arab, reference, and other countries positioned by researcher inflow (vertical axis) and the gender gap in cumulative productivity (horizontal axis) in 2023. Researcher inflow is the proportion of newly active authors among all active authors. The gender gap in cumulative productivity is defined as the mean cumulative productivity of female authors minus that of male authors, normalized by the male mean. Country codes are as follows: AE (United Arab Emirates), AU (Australia), CA (Canada), DE (Germany), DZ (Algeria), EG (Egypt), ES (Spain), FR (France), GB (United Kingdom), IQ (Iraq), IT (Italy), JO (Jordan), JP (Japan), LB (Lebanon), MA (Morocco), SA (Saudi Arabia), SE (Sweden), TN (Tunisia), and US (United States). Gray arrows indicate the trajectories of the Arab and reference countries from their coordinates in 2010 to their coordinates in 2023 in the same 2D space. Vertical and horizontal dashed lines indicate the respective medians across the 58 countries. See Supplementary Section S6 for detailed results for all 58 countries.
To assess the long-term sustainability of the research systems in these countries, we project their researcher population size and gender balance forward to 2050 based on their researcher population pyramids. For population growth, we computed the compound annual growth rate (CAGR) of the number of active authors over four successive periods: 2010–2023, 2023–2030, 2030–2040, and 2040–2050 (see Supplementary Section S7 for CAGR definition). For gender balance, we examined the proportion of female authors among all active authors and among senior-career authors in 2023, 2030, 2040, and 2050.
Table 1 reports empirical and projected CAGRs for active authors and the projected proportion of female authors across the selected countries. We observed a universal trend of projected decline in growth rates over time; this dynamic reflects our framework’s assumption of stabilized numbers of newly active authors. Arab countries exhibit greatly higher growth rates than reference countries for all the periods. In contrast, Japan and Sweden even approached negative growth. Projections indicate divergent trajectories in the proportion of female authors among all authors and senior-career authors by 2050 across these countries. The Arab countries, except for Jordan and Morocco, are projected to reach near gender parity (defined here as 45–55% female representation) among all authors by 2050; Tunisia is even expected to slightly exceed this range. However, the proportion of female authors among senior-career authors is projected to follow diverse trajectories across these Arab countries. By contrast, reference countries show relatively stable progress; most are expected to reach near gender parity at both overall and senior levels by 2050, except for Germany and Japan.
Discussion
We examined researcher demographics across 58 countries through the lens of population pyramids, observing three broad patterns among research systems. We heuristically label these as follows: (i) “Emerging,” exemplified by current Arab countries, which exhibit large inflows of newly active authors alongside widening gender gaps in cumulative productivity; (ii) “Mature,” such as the United States, characterized by moderate inflows and gradually narrowing gender gaps; and (iii) “Rigid,” such as Japan, which lag in both researcher inflows and progress on gender equality.
Our analysis shows that Arab countries exhibit notably higher researcher inflows than the reference countries. This pattern is partially consistent with previous reports on their expanding research outputs (38–42), increased scholarly attention (44), and enhanced funding investment (45). This rapid growth is likely driven by ambitious national strategies, such as Saudi Arabia’s “Vision 2030” (46), aiming to transition these countries toward knowledge-based economies. The sustainability of this inflow hinges on future trends in higher education. UNESCO data show that the proportion of individuals graduating from higher education in Arab countries (25.7% in 2023 (47)) remains lower than that in Organisation for Economic Co-operation and Development (OECD) states (41.8% in 2023 (47)), suggesting substantial room for further expansion. Consequently, with continued policy investment in research infrastructure and the training of early-career researchers, this inflow of new talent can continue or even accelerate in Arab countries. Nevertheless, this quantitative expansion also entails critical challenges. The widening gender gap in cumulative productivity identified in several Arab countries by our analysis is a prominent example. Ensuring the sustainability of this growth, therefore, requires policymakers to look beyond sheer numbers and address systemic factors, including the quality of education and the creation of robust supports for research careers.
Gender gaps in cumulative productivity were evident across most countries examined, yet the magnitude of these gaps varied substantially. Among Arab countries, Egypt, Iraq, Tunisia, and the United Arab Emirates stand out for the rapid increase in the proportion of female authors since the early 2000s (see Fig. 1). However, our analysis reveals that Tunisia and the United Arab Emirates exhibit larger gender gaps in cumulative productivity than Egypt and Iraq (see Fig. 6), which may be associated with greater attrition of women from academic careers at senior-career stages (48, 49). This suggests that targeted research policies will be increasingly important for addressing these gender gaps in cumulative productivity in these countries. Such policies include correcting gender imbalances in resource allocation (50), implementing career support measures that accommodate childcare and family responsibilities (49, 51), and actively promoting female researchers as role models (52). Among reference countries, Germany and Japan lag behind in both the long-term increase in the share of women and the narrowing of gender gaps in cumulative productivity. In Japan, these findings are consistent with the persistence of gender inequality across many societal domains (53) and with a recent study highlighting the structural persistence of gender imbalance in Japanese academia (43). In Germany, despite ranking 6th in the 2023 Gender Gap Index (54), our findings align with the relatively low share of full-time equivalent female researchers (41) and gender imbalance in professional promotions in the same country (55). These results demonstrate that while national trajectories in gender inclusion vary widely, structural barriers to research career advancement for women persist even in countries with high societal aggregate gender equality.
Interpreting our projections within the assumption of stationarity in demographic transitions after 2023, Arab countries are likely to face a demographic turning point as researchers age and begin to retire. Supporting this view, the growth rates projected for Arab countries in the near future (2030–2040) mirror those observed in the reference countries’ recent past (2010–2023; see Table 1), which suggests that Arab countries may reach this demographic inflection point by 2050. Projections of gender balance further illuminate these diverging trajectories (Table 1). Egypt, for instance, is on a trajectory toward gender parity, indicating convergence with a mature system. In contrast, Algeria, Jordan, and Morocco are expected to maintain persistent gender gaps among senior-career authors, suggesting a trajectory more aligned with rigid systems. Saudi Arabia and the United Arab Emirates show signs of gradual maturation but with slower progress toward gender parity among senior-career authors.
Our analysis relies on the author disambiguation implemented in the OpenAlex database. We acknowledge the potential limitations of this method in accurately reconstructing publication records, particularly for authors from countries with high surname homogeneity and diverse transliteration practices (56–58). Enhancing the reliability of our framework requires future work on two fronts. First, it warrants systematic efforts to benchmark, evaluate, and improve the disambiguation accuracy for author names from various national contexts across different bibliographic databases. Second, while our approach makes a significant addition to previous work (19) by introducing country- and gender-specific thresholds for interpublication intervals of authors, this could be further refined by incorporating discipline- or career-stage-specific thresholds. Such methodological refinements would open further avenues for research into the dynamics of academic talent pipelines, ultimately contributing to a more equitable and sustainable academic ecosystem.
Regarding the validity of using cumulative productivity as a proxy for career stages, we note that this metric is a functional proxy based on researchers’ uninterrupted publication sequences rather than total productivity or publishing-career length. While total productivity and publishing-career length serve as measures of academic careers (19, 34, 35), they often require identifying a definitive career end and thus entail a substantial time lag when evaluating researchers’ career characteristics (19). We employed cumulative productivity in our framework because it provides a timely snapshot of the active publishing workforce, serving as a data-driven diagnostic tool for monitoring research ecosystems.
While we heuristically labeled the bins of cumulative productivity as “early-career” (1–10 cumulative publications), “mid-career” (11–50), and “senior-career” (51 or more) to facilitate international comparisons, this classification may introduce potential biases in assessing researcher demographics and gender balance. First, despite the heterogeneity of publishing-career trajectories (11), cumulative productivity may conflate current publishing-activity levels with “academic age” (eg years since the first publication (59)). This, for instance, might categorize publishing-active researchers with a short academic age (eg those with 30 publications within 5 years of their first publication) as “mid-career,” potentially overestimating their academic careers. Second, publication practices vary significantly across disciplines. For example, average total productivity in a research career in biology (Male: 16.56, Female: 10.31) was substantially higher than in mathematics (Male: 7.13, Female: 5.55) in a previous analysis (19). Our pyramids might reflect a complex interplay between national disciplinary composition and the underlying demographic distribution of publishing momentum. Nevertheless, our additional analysis across four major research domains reveals that the qualitative characteristics of the Emerging, Mature, and Rigid research systems remain largely consistent within Egypt, Tunisia, Japan, and the United States (see Supplementary Section S8 for details). Third, the cumulative productivity metric is sensitive to career interruptions; a career break (eg due to parenthood (51)) that exceeds the gender- and country-specific threshold resets the count to one in our framework. Although we defined these thresholds to accommodate diverse publication paces across countries and genders, the metric still prioritizes recent and continuous publishing momentum, potentially misclassifying experienced researchers with nonlinear career trajectories as “early-career.”
We acknowledge that these methodological choices and their potential biases might shape the interpretation of our results. First, the observed gender gap in the “senior-career” stage may reflect a disparity in publishing continuity. In “Emerging” systems, widening gaps in cumulative productivity may signify a “publishing-continuity gap,” where female researchers—who disproportionately face career interruptions (51)—could be assigned to lower cumulative-productivity bins. Consequently, our findings should be interpreted as highlighting the structural challenges women face in maintaining uninterrupted publication momentum. Second, the classification of research systems into “Emerging,” “Mature,” and “Rigid” systems reflects the aggregate dynamics of active publication momentum. For instance, a “Rigid” system like Japan’s likely reflects not only low researcher inflow but also structural barriers or disciplinary compositions that prevent mid- and senior-career researchers from sustaining high publication momentum.
Our approach has additional limitations. First, the geographical scope of our analysis is constrained by the performance of name-based gender inference. We had to exclude several major research-producing countries, including China, India, and South Korea, where the classifier’s accuracy was insufficient. This insufficient accuracy result is partially consistent with recent work demonstrating that the error rates of name-based gender inference are not uniformly distributed (60). Second, even among the countries included in our analysis, the gender assignment rate (ie the proportion of unique authors for whom a binary gender were inferred with high confidence) varies considerably (see Table S7). For some countries, the rate is below 50%; notably, for Iraq, a key country in our analysis of the Arab countries, the rate is 45.8%. While we retained countries with lower rates in our analysis to maintain broad geographical coverage, their specific results should be interpreted carefully, as the subset of gender-assigned authors may not be representative of the entire researcher population in those countries. Third, while we focus on the longitudinal visualization and classification of research systems in this study, our framework may be useful for quantitatively investigating the causal impact of specific sociopolitical events or policy interventions on researcher demographics. Future work could use this diagnostic tool to assess whether historical events (eg the Arab Spring in Arab countries (38, 44)) reshaped the generational profiles of academic systems or whether these systems remained resilient to such events. Finally, our projections to 2050 are based on a strong stationarity assumption that the inflow in 2023 and transition patterns observed from 2022 to 2023 will remain constant in the future. More sophisticated forecasting models that incorporate uncertainty, scenario variations (eg changes in researcher inflow), and structural breaks warrant future work.
Despite these limitations, our work highlights that understanding and developing the human capital of research has never been more critical in an era of global challenges. Our framework provides not just a retrospective snapshot but a dynamic diagnostic tool for policymakers to proactively shape more sustainable and equitable academic futures.
Methods
Publication data
Our analysis is based on OpenAlex, a large-scale, open bibliographic database containing hundreds of millions of publication records with extensive metadata across multiple disciplines (61). We used its 2024 September 27 snapshot, from which we extracted 151,905,632 publications classified as “articles” and published between 1950 and 2023. While our analysis focuses on this specific period (hereafter, and ), our framework is general and can be applied to any year range for which publication data are available. For each paper, we extracted its publication date and, for each author of the paper, their ID, full name, and affiliations. The author IDs are assigned by OpenAlex’s proprietary author-name disambiguation algorithm.
We assigned a primary country (or countries) of affiliation to each author u as follows. For each author u, we first compiled a list of all country codes (ISO 3166-1 alpha-2) from their affiliations. We then computed the frequency distribution of these countries and assigned the country (or countries) with the highest frequency as the primary affiliation(s) for u. If multiple countries tied for the highest frequency, we assigned all of them to u. We excluded any author for whom no country was assigned. Throughout our manuscript, for a given country c, we refer to an author whose assigned countries include c as an “author affiliated with country c.”
Gender assignment to authors
Bibliometric studies involving gender comparisons often require the assignment of gender to authors based on their first names (eg (15, 19, 20)). Previous studies have shown that incorporating an author’s country of affiliation improves the accuracy of such name-based gender inference (19, 43). Based on this, we trained the Complement Naive Bayes (CNB) classifier (62, 63) that predicts the most likely binary gender and its associated probability, given an author’s first name and country of affiliation. We evaluated the classifier alongside an existing classifier (64) using two large-scale, cross-country benchmarks: one derived from the Orbis executive database (65), and another from the world gender name dictionary (WGND) (66, 67) (see Supplementary Section S1 for details). For each of the 61 countries where the CNB classifier achieved sufficiently high accuracy, defined as an area under the receiver operating characteristic curve (AUC) of at least 0.8, on both benchmarks, we determined a country-specific confidence threshold for gender assignment (see Table S3 for details). For this existing classifier (64), we also tuned a country-specific confidence threshold for gender assignment across the same countries (see Table S4 for details). We excluded Bangladesh, China, India, and South Korea from our analysis because their AUC values fell below 0.8 in the WGND benchmark.
We thus began with the pool of 20,509,487 unique authors affiliated with one of the 61 countries. We first excluded 413,549 authors (2.0% of the pool) for whom a single primary country could not be uniquely determined (ie multiple countries tied for the highest frequency). For each remaining author u affiliated with country c, we extracted the first space-separated token from their full name (assumed to be the first name), and input it along with country c into the CNB classifier to obtain an inferred gender g and its associated probability . We assigned gender g to u if ; otherwise, we left the gender unassigned. Finally, we excluded three countries (Lesotho, Papua New Guinea, and Somalia) from our analysis due to the small number of gender-assigned authors. This process resulted in 14,745,796 gender-assigned authors affiliated with 58 countries. Table S7 reports the number of gender-assigned authors by country of affiliation.
We acknowledge that our classification is based on a binary conception of gender (female or male) inferred from first names. This approach, while common in large-scale bibliometric analyses (15, 19), cannot account for individuals who identify as nonbinary, transgender, or otherwise gender diverse, and we recognize that it is an imperfect proxy for gender identity. Despite this limitation, we employ this approach as it remains one of the few feasible approaches for approximating gender demographics across countries.
Interpublication interval threshold
Analyses of authors’ publication records offer valuable insights into research career dynamics (12, 19, 68–70). Key existing metrics, “total productivity,” defined as the total number of publications in an entire career (19), and “publishing-career length,” defined by the duration between an author’s first and last publications (19, 34, 35), rely on a definitive “last” publication. For instance, Huang et al. (19) used a simple global rule: an author was considered to have ended their career by 2010 if no publications appeared after 2010 in records spanning 1900 to 2016. While easy to implement, this cutoff excluded all authors who continued publishing beyond 2010.
We refine this approach by classifying each author as “active” or “inactive” in a given year, based on the survival function of interpublication intervals. These intervals—the time between consecutive publications—are measured in years by dividing the number of days by 365 and are estimated separately for each country and gender. For each author who has published at least two papers, we extract their publication dates in chronological order to compute this sequence of intervals. For each country c and gender g, we pool all such intervals and estimate the corresponding survival function. We then determine a threshold, , defined as the largest interpublication interval (in years) at which the survival probability remains above 2%.
This 2% threshold was chosen after our preliminary analysis (see Table S10), as we determined this value strikes a good balance between maximizing the inclusion of authors with diverse publication paces and ensuring the threshold duration remains plausible. A 1% threshold, for instance, yielded excessively long interpublication intervals (eg 15 years or longer) for several countries such as Jamaica. Conversely, a 5% threshold resulted in intervals of 5 years or less for most countries, which could prematurely classify authors as inactive in disciplines with slower publication cycles.
Using this threshold, we define an author as “active” in a given year t if they published at least one paper between years and t. We consider authors who do not meet this criterion in year t to be “inactive.”
Construction of population pyramids
For a given country c, binary gender g, and year , we identify authors who satisfy a set of criteria extended from the methodology of Huang et al. (19). Specifically, we require that authors meet four conditions: (i) they have published at least one paper by year t; (ii) the duration between their first publication and the end of year t does not exceed 40 years; (iii) their total number of publications by year t is at most 500, a threshold corresponding to approximately the top 0.1% of authors in our data; and (iv) they are classified as “active” in year t, based on the survival function of interpublication intervals described in the “Interpublication interval threshold” section.
To construct the population pyramid for a given country and year, we calculate the cumulative productivity as of that year for each active author affiliated with that country. This number, k, measures the continuity of an author’s recent research activity, defined as the length of the author’s most recent, uninterrupted sequence of publications. We determine k by chronologically scanning an author’s entire publication record. We maintain a counter for the current continuous publication sequence. Starting with the first publication, this counter increases by one for each subsequent publication, as long as the time interval (in years) does not exceed the specified threshold ( ). If an interval exceeds the threshold, the consecutive publication count (k) restarts at 1 with the next publication. It is important to clarify that this procedure is not intended to suggest that a researcher’s prior work becomes irrelevant after a prolonged publication interval. Rather, it serves as a practical strategy to mitigate potential errors in author disambiguation within the OpenAlex database. Since author disambiguation remains a technically challenging task (58, 71, 72), publication records belonging to different individuals may occasionally be conflated. By treating sufficiently long intervals in publication as breaks in the publication record, we aim to minimize the impact of such errors on our population pyramid estimates. The final value of the counter after scanning all publications represents the author’s cumulative productivity, k. We provide an analysis of cumulative productivity including its correlation with total productivity and publishing-career length in Supplementary Section S5.
To project the population pyramid for country c in years beyond , we propagate the population pyramid at year using the transition observed between year and year . We denote by the end of the projection horizon beyond the observed period ; we set . Let denote the number of authors of gender g affiliated with country c whose cumulative productivity is k at year t. For any , the projected population with cumulative productivity evolves according to
where
is the number of “newly active authors” at year t. These are authors who were inactive at year but become active at year t with a cumulative productivity of k. is the probability that an author with a cumulative productivity of j in year moves to a cumulative productivity of k in year t.
We obtain directly from the observed data. Although and are, in principle, time-dependent, we assume they remain fixed at their most recently observed values (namely, and ) for every . Under this stationarity assumption, Eq. 1 becomes
for all . We obtain every term on the right-hand side from the historical publication record. Specifically, we count for every , and we recursively compute for all and . We estimate the transition probability as the fraction of authors with a cumulative productivity of j in who moved to a cumulative productivity of k in . We use these empirical quantities in Eq. 2 and iterate forward to obtain projected population pyramids for years beyond . We emphasize that this analysis is based on a strong stationarity assumption regarding the inflow in and transition patterns observed from to . This deterministic projection serves as a baseline scenario to illustrate potential long-term demographic trajectories if current conditions were to persist. This assumption is unlikely to hold over multidecade horizons, as it does not account for cohort aging, policy shifts, or external shocks. The resulting projections should therefore be interpreted as stylized illustrations of current momentum.
Supplementary Material
pgag059_Supplementary_Data
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Sachs JD, et al 2019. Six transformations to achieve the sustainable development goals. Nat Sustain. 2(9):805–814.
- 2Fortunato S, et al 2018. Science of science. Science. 359(6379):eaao 0185.29496846 10.1126/science.aao 0185 PMC 5949209 · doi ↗ · pubmed ↗
- 3Hong L, Page SE. 2004. Groups of diverse problem solvers can outperform groups of high-ability problem solvers. Proc Natl Acad Sci U S A. 101(46):16385–16389.15534225 10.1073/pnas.0403723101 PMC 528939 · doi ↗ · pubmed ↗
- 4Beninson L, Daniels R. The next generation of biomedical and behavioral sciences researchers: breaking through. National Academies Press, Washington, D.C., U.S., 2018.
- 5Hofstra B, et al 2020. The diversity–innovation paradox in science. Proc Natl Acad Sci U S A. 117(17):9284–9291.32291335 10.1073/pnas.1915378117 PMC 7196824 · doi ↗ · pubmed ↗
- 6Organisation for Economic Co-operation and Development . 2021. OECD Science, Technology and Innovation Outlook 2021. https://www.oecd.org/en/publications/2021/01/oecd-science-technology-and-innovation-outlook-2021_3f 424d 14.html. Accessed 2025 May.
- 7Gibbons M, Limoges C, Scott P, Schwartzman S, Nowotny H. The new production of knowledge: the dynamics of science and research in contemporary societies. SAGE Publications Ltd, London, UK, 1994.
- 8Powell WW, Snellman K. 2004. The knowledge economy. Annu Rev Sociol. 30(1):199–220.
