Ex-ante measure of patent quality reveals intrinsic fitness for citation-network growth
K. W. Higham, M. Governale, A. B. Jaffe, U. Zülicke

TL;DR
This paper introduces an ex-ante patent fitness measure, built from invention attributes known at grant time, that determines citation-network growth in conjunction with preferential attachment and ageing, advancing the quantitative understanding of complex-network dynamics.
Contribution
It presents a novel ex-ante fitness parameter for patents that predicts citation growth, and shows that it acts alongside preferential attachment and obsolescence-induced ageing, mechanisms that operate without reference to the characteristics of individual patents.
Findings
The fitness parameter effectively predicts citation network growth.
Citation network growth is driven by both preferential attachment and aging.
The study bridges fit-gets-richer and rich-gets-richer paradigms.
Abstract
We have constructed a fitness parameter, characterizing the intrinsic attractiveness for patents to be cited, from attributes of the associated inventions known at the time a patent is granted. This exogenously obtained fitness is shown to determine the temporal growth of the citation network in conjunction with mechanisms of preferential attachment and obsolescence-induced ageing that operate without reference to characteristics of individual patents. Our study opens a window on understanding quantitatively the interplay of the rich-gets-richer and fit-gets-richer paradigms that have been suggested to govern the growth dynamics of real-world complex networks.
Table 1. Raw patent-quality indicator variables and their abbreviations.
| Quality indicator (abbreviation) | Quality indicator (abbreviation) |
|---|---|
| backward citations to patents (BPA) | independent claims (CIN) |
| backward self-citations (BSC) | dependent claims (CDE) |
| backward citations to foreign patents (BFP) | inventor team size (INV) |
| backward citations to non-patent literature (BNP) | class membership (NCL) |
| backward-citations’ pedigree (BCP) | average age of backward citations (BAG) |
| originality (ORI) | grant lag (LAG) |
| number of figures (FIG) | |
K. W. Higham
Te Pūnaha Matatini, School of Chemical and Physical Sciences, Victoria University of Wellington, PO Box 600, Wellington 6140, New Zealand
Present address: College of Management of Technology, EPFL, Odyssea, Station 5, 1015 Lausanne, Switzerland
M. Governale
Te Pūnaha Matatini, School of Chemical and Physical Sciences, Victoria University of Wellington, PO Box 600, Wellington 6140, New Zealand
A. B. Jaffe
Te Pūnaha Matatini, Motu Economic and Public Policy Research, PO Box 24390, Wellington 6142, New Zealand
MIT Sloan School of Management, 100 Main Street, Cambridge, MA 02142
Brandeis University, 415 South Street, Waltham, MA 02453
QUT Business School, Queensland University of Technology, Brisbane, QLD 4001, Australia
U. Zülicke
Te Pūnaha Matatini, School of Chemical and Physical Sciences, Victoria University of Wellington, PO Box 600, Wellington 6140, New Zealand
A wide variety of social and economic processes evolve such that success or popularity appear to be self-reinforcing. Significant attention has been given to trying to distinguish the extent to which the apparently self-reinforcing behavior of popularity is purely a property of the dynamic system, versus being generated by intrinsic heterogeneity that allows inherently better agents or products to persistently succeed D’Souza et al. (2007); Kong et al. (2008); Papadopoulos et al. (2012); Wang et al. (2013); van de Rijt et al. (2014); Sinatra et al. (2016); Zeng et al. (2017); Fortunato et al. (2018); Mariani et al. (2018). With our increasingly data-rich world enabling more effective ways for measuring quality as well as popularity, this question can now be explored more deeply and in a greater variety of fields Vespignani (2012). Here we provide an answer within the context of technological innovation, where an accepted measure of popularity is the intensity with which patents accumulate citations Jaffe and de Rassenfosse (2017). Our construction of a technology-dependent single quality score for individual patents from a broad range of patent-quality measures that are exogenous to citations and available at the time of grant is shown to quantify the innate attractiveness of patents to be cited in the future. The ability to account for inherent quality as a driver of citation dynamics enables better observation of other important influences, including the time scale for knowledge obsolescence Higham et al. (2017a).
Empirically, the average rate at which the number of citations accrued by patents Hall et al. (2002) increases over time is observed Csárdi et al. (2007); Valverde et al. (2007); Higham et al. (2017a) to follow an ageing-tempered Dorogovtsev and Mendes (2000) preferential-attachment-type Price (1976); Barabási and Albert (1999); Krapivsky and Redner (2001) growth model. The asymptotic form of the attachment rate for patents with a large number of citations embodies a rich-gets-richer feedback loop whereby more highly cited patents are more likely to gain future citations. Similar behavior is exhibited by the citation dynamics of scientific articles, e.g., those published in the journals of the American Physical Society Redner (2005); Golosovsky and Solomon (2012); Higham et al. (2017b). However, the special purpose and associated legal ramifications of citations in patents Hall et al. (2002); Jaffe and de Rassenfosse (2017) enforce a greater degree of caution in citing behavior than is commonly practiced for scientific articles, making patent citations particularly suitable for investigating the relationship between popularity and quality (see footnote).
Footnote: Qualitative Meyer (2000) and quantitative Clough et al. (2015) studies have shown that patent citations are less likely to be irrelevant or superfluous. Also, unlike scientific articles, patent publications contain a highly regulated set of metadata suitable for constructing universal intrinsic-quality indicators.
On a phenomenological level, purely preferential-attachment-based models seem to be able to successfully describe the dynamics of how patents receive forward citations. However, they are at odds with the general expectation Marco (2007) that patents are intrinsically heterogeneous in quality, and that citing behavior will, at least in part, be influenced by this intrinsic heterogeneity across the patent population. In theoretical models of network growth, node heterogeneity has been introduced by a fitness, or attractiveness, variable drawn from some distribution, so that an individual node having a given number of links to other nodes at a given time after its creation gains new links at a rate Bianconi and Barabási (2001)
[Equation (1)]
In principle, a fitness variable (assumed, without loss of generality, to be normalized so that its mean is fixed) can represent any quantifiable heterogeneous property, or a number of such properties, exhibited by individual nodes Ferretti et al. (2012), and it can even be designed to depend on the properties of the linking node Papadopoulos et al. (2012); Ferretti et al. (2012). The interplay between fitness and preferential attachment has been studied in detail theoretically Papadopoulos et al. (2012); Golosovsky (2018); Pham et al. (2016), and statistical analyses have been applied to estimate fitness endogenously in real-world networks Newman and Leicht (2007); Kong et al. (2008); Wang et al. (2013); Pham et al. (2016); Ronda-Pupo and Pham (2018). Our current work goes an important step further by determining fitness for individual patents in terms of citation-independent quality measures. Obtaining fitness deterministically and exogenously enables us to conclusively separate its effects from those due to preferential attachment and obsolescence-induced ageing (see footnote). These advances pave the way for broader studies of knowledge diffusion and could inform the design of meaningful impact measures for technological innovation.
Footnote: A rate of the form given in Eq. (1) covers the extreme cases of growth by pure preferential attachment Barabási and Albert (1999) [by letting the fitness distribution be a Dirac δ function] or pure fitness Caldarelli et al. (2002); Garlaschelli and Loffredo (2004); Tacchella et al. (2012) [by removing the dependence of the rate on the number of links already acquired].
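The display form of Eq. (1) is not preserved in this copy. A form consistent with the surrounding description (fitness entering as a prefactor that multiplies an ageing function and a preferential-attachment kernel) can be sketched as below. The symbols $\eta_i$, $A(t)$, $f(k)$ and $\rho(\eta)$ are assumptions chosen here for illustration and may differ from the paper's own notation.

```latex
% Sketch only: rate at which node i, holding k links at age t, gains new links
\lambda_i(k, t) = \eta_i \, A(t) \, f(k)

% Limiting cases mentioned in the footnote:
%   pure preferential attachment: \rho(\eta) = \delta(\eta - \eta_0)  (all nodes equally fit)
%   pure fitness:                 f(k) = \mathrm{const}               (no dependence on k)
```

In this reading, heterogeneity across nodes enters only through $\eta_i$, while $A(t)$ and $f(k)$ are shared by all nodes.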
How to determine the quality and/or value of innovations from observable attributes of the associated patents is a difficult question that has been studied extensively Lanjouw and Schankerman (2004); Gambardella et al. (2008); van Zeebroeck and van Pottelsberghe de la Potterie (2011); Kogan et al. (2017); de Rassenfosse and Jaffe (2018). As there are various potential dimensions of usefulness for an invention (including, but not limited to, technological, economic, and strategic/legal), the number of suggested plausible quality-indicator variables has proliferated. We base our construction of a single intrinsic quality score for patents on the variables given in Table 1. Their values for a particular patent are determined by the time of grant and become part of its retrievable official record. These quality measures are also available for all patents, not just a subset such as those assigned to publicly traded companies Kogan et al. (2017). However, before we can use the values to estimate the intrinsic fitness for a patent to attract citations via Eq. (1), three major issues need to be addressed: (i) minimizing cross-correlations (the variables are not necessarily mutually independent quality indicators), (ii) determining relative weighting (multiple uncorrelated quality measures will differ in the magnitude of their influence on citation growth), and (iii) achieving distributional fidelity (the model for fitness constructed in terms of the variables must reproduce salient properties of the empirical distribution of fitness). We now describe in turn how each of these challenges is addressed.
(i) Minimizing cross-correlations. As it is not a priori clear which type of quality is measured by each individual variable and/or how much overlap exists between the quality measures provided by different variables, an exploratory factor analysis Mulaik (2010) is performed. This process yields uncorrelated factors that are linear functions of the normalized variables (defined using the mean and variance of each randomly distributed quantity)
[Equation (2)]
The definition of the normalized variables according to Eq. (2) is motivated by the observation that, with the exception of the normally distributed originality, the raw indicator variables are approximately log-normally distributed over the patent cohorts considered in this work. It furthermore ensures that the normalized variables have zero mean and unit variance, thus eliminating the arbitrary units used to measure each raw variable. The factors also have zero mean and unit variance, and their values for individual patents are determined via the matrix of loadings that is obtained from the factor analysis. Figure 1(a) illustrates the factors and their loadings obtained for the cohort of patents granted by the United States Patent and Trademark Office (USPTO) between 1 January 1999 and 31 December 2001 and classified under Section A (Human Necessities) of the Cooperative Patent Classification (CPC) system. We have repeated this analysis for another six cohorts of USPTO patents that are granted within the same period but classified under different CPC Sections: B (Performing Operations; Transporting), C (Chemistry; Metallurgy), F (Mechanical Engineering; Lighting etc.), G (Physics), H (Electricity), and Y (General), respectively.
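The normalization step described above can be illustrated with a minimal sketch. It assumes that approximately log-normal indicators are log-transformed before being standardized, while a normally distributed indicator (such as originality) is standardized directly. The function name `normalize_indicators`, the `normal_columns` argument, and the requirement of strictly positive inputs are assumptions of this sketch; how the paper treats zero-valued counts is not stated in this copy.

```python
import numpy as np

def normalize_indicators(raw, normal_columns=()):
    """Return zero-mean, unit-variance versions of the indicator columns.

    raw            : array of shape (n_patents, n_indicators)
    normal_columns : indices of columns treated as normally distributed;
                     these are standardized without a log transform.
    All other columns are assumed strictly positive and are log-transformed first.
    """
    x = np.asarray(raw, dtype=float).copy()
    log_cols = [j for j in range(x.shape[1]) if j not in normal_columns]
    if np.any(x[:, log_cols] <= 0):
        raise ValueError("log-transformed columns must be strictly positive")
    x[:, log_cols] = np.log(x[:, log_cols])
    # Standardize every column with its cohort mean and standard deviation.
    return (x - x.mean(axis=0)) / x.std(axis=0)

# Synthetic cohort: one log-normal indicator and one normal indicator.
rng = np.random.default_rng(0)
raw = np.column_stack([
    rng.lognormal(mean=2.0, sigma=0.7, size=500),
    rng.normal(loc=0.5, scale=0.1, size=500),
])
z = normalize_indicators(raw, normal_columns=(1,))
```

The standardized columns are unit-free, so indicators measured on very different scales can enter the factor analysis on an equal footing.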
(ii) Determining relative weighting. The factors represent the minimally correlated variances of the ad-hoc-defined raw quality-indicator variables across a given cohort of patents. They therefore approximate the truly independent dimensions within which quality of individual patents can be measured by observables available at the time of grant. To estimate the relative importance of each factor in determining citation intensity, we perform a forward-stepwise ordinary-least-squares regression on a small training dataset with the Bayes Information Criterion Schwarz (1978) as our test statistic and the log of the (mean-scaled) citations at grant as the dependent variable (see footnote). Patents in the training dataset are randomly selected and comprise about 10% of a cohort, i.e., 8,000–10,000 patents. The regression coefficients yield weights measuring the variances in the training dataset associated with the factors. Figure 1(b) shows the weights obtained for factors associated with the cohort of USPTO patents from CPC Section A granted during 1999–2001.
Footnote: We follow the usual convention where the time of citation is taken to be the application date of the citing patent Hall et al. (2002), whereas the age of cited patents is counted from the time of grant Mehta et al. (2010). Therefore, patents have often already accrued citations by the time they are granted.
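A forward-stepwise regression with the Bayes Information Criterion can be sketched as follows. This is an illustrative implementation on synthetic data, not the authors' code: the helper names `bic_ols` and `forward_stepwise` are hypothetical, and the BIC is computed in its usual Gaussian-error form, n ln(RSS/n) + p ln(n), up to an additive constant.

```python
import numpy as np

def bic_ols(X, y):
    """Fit OLS and return (BIC, coefficients) assuming Gaussian errors."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + X.shape[1] * np.log(n), beta

def forward_stepwise(factors, y):
    """Add one factor at a time while doing so lowers the BIC."""
    n, m = factors.shape
    selected = []
    design = np.ones((n, 1))            # start from an intercept-only model
    best_bic, best_beta = bic_ols(design, y)
    while len(selected) < m:
        trials = []
        for j in range(m):
            if j in selected:
                continue
            bic, beta = bic_ols(np.column_stack([design, factors[:, j]]), y)
            trials.append((bic, j, beta))
        bic, j, beta = min(trials, key=lambda item: item[0])
        if bic >= best_bic:             # no candidate improves the criterion
            break
        selected.append(j)
        design = np.column_stack([design, factors[:, j]])
        best_bic, best_beta = bic, beta
    # Coefficients after the intercept follow the order of selection.
    weights = dict(zip(selected, best_beta[1:]))
    return selected, weights

# Synthetic training set: factors 0 and 2 matter, factors 1 and 3 do not.
rng = np.random.default_rng(1)
F = rng.standard_normal((2000, 4))
y = 0.6 * F[:, 0] - 0.3 * F[:, 2] + 0.5 * rng.standard_normal(2000)
selected, weights = forward_stepwise(F, y)
```

Because the BIC penalty grows with ln(n), factors that explain little of the variance in the dependent variable tend to be left out, which keeps the weight set small.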
(iii) Achieving distributional fidelity. By construction, the linear combination of factors with weights as coefficients is a normally distributed quantity having zero mean and variance given by the sum of the squared weights. Motivated by empirical observations of citation rates Wang et al. (2013), we assume the distribution of fitnesses to be log-normal. For conceptual simplicity, we fix the mean value of the fitness, which ties the location parameter of the log-normal distribution to its standard deviation. Thus the only free parameter characterizing the log-normal distribution of fitnesses is the standard deviation. It turns out to be possible to extract this standard deviation from the observed dynamics of how patents acquire their first citation without needing to assume anything explicit about how the citation rate Eq. (1) depends on the number of citations and on time. More specifically, we consider the time evolution of the fraction of uncited patents. Expanding its expression Higham et al. (2017a); Golosovsky (2018)
[Equation (3)]
in the short-time limit, and assuming the fitness distribution to be log-normal, yields
[Equation (4)]
Fitting Eq. (4) to the data enables us to extract the standard deviation of the fitness distribution for each of the patent cohorts considered here.
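The idea behind this step can be illustrated numerically. The sketch below assumes a log-normal fitness with unit mean (the value fixed in the paper is not preserved in this copy) and writes the uncited fraction as the fitness-average of exp(-η Λ), where Λ is a hypothetical name for the accumulated baseline rate of first citations. It compares the exact average with a generic second-order short-time expansion, which is not necessarily the paper's Eq. (4).

```python
import numpy as np

def lognormal_expectation(g, sigma, n_nodes=60):
    """E[g(eta)] for log-normal eta with unit mean, via Gauss-Hermite quadrature."""
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    # Unit mean requires the log of eta to have mean -sigma^2 / 2.
    eta = np.exp(-0.5 * sigma**2 + sigma * np.sqrt(2.0) * x)
    return float(np.sum(w * g(eta)) / np.sqrt(np.pi))

def uncited_fraction(Lam, sigma):
    """Fraction of patents still uncited when the accumulated baseline rate is Lam."""
    return lognormal_expectation(lambda eta: np.exp(-eta * Lam), sigma)

def uncited_fraction_short_time(Lam, sigma):
    """Second-order expansion in Lam, using E[eta] = 1 and E[eta^2] = exp(sigma^2)."""
    return 1.0 - Lam + 0.5 * np.exp(sigma**2) * Lam**2

sigma = 0.8
mean_fitness = lognormal_expectation(lambda eta: eta, sigma)
exact = uncited_fraction(0.01, sigma)
approx = uncited_fraction_short_time(0.01, sigma)
```

The curvature of the early-time decay depends on the width of the fitness distribution, which is why the first-citation dynamics can constrain the standard deviation without specifying the full form of the rate.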
Having addressed the three issues (i)–(iii) discussed above, we estimate the intrinsic fitness of an individual patent to attract citations in terms of the observable values of the raw quality-indicator variables from Table 1 via
[Equation (5)]
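One way to combine the three ingredients into a fitness estimate is sketched below. It is an assumption-laden illustration rather than the paper's Eq. (5): the weighted factor sum is rescaled to unit variance, multiplied by the standard deviation, and exponentiated with an offset that gives the fitness unit mean. The function name `estimate_fitness` and the unit-mean convention are choices made for this sketch.

```python
import numpy as np

def estimate_fitness(factors, weights, sigma):
    """Map factor scores to a log-normal fitness with unit mean (sketch).

    factors : array (n_patents, n_factors) of uncorrelated, unit-variance scores
    weights : regression weights, one per factor
    sigma   : standard deviation of the log of the fitness
    """
    w = np.asarray(weights, dtype=float)
    # Dividing by the norm of the weights gives the score unit variance.
    score = np.asarray(factors, dtype=float) @ w / np.linalg.norm(w)
    return np.exp(-0.5 * sigma**2 + sigma * score)

# Synthetic factor scores for a large cohort.
rng = np.random.default_rng(2)
F = rng.standard_normal((200_000, 3))
eta = estimate_fitness(F, weights=[0.5, 0.3, 0.1], sigma=0.6)
```

A patent whose factor scores all sit at the cohort mean receives a fitness slightly below one, because the mean of a log-normal distribution lies above its median.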
Assuming the rate at which citations are accrued by patents to follow the expression (1), we analyze the empirically observed citation rates for USPTO patents from individual CPC Sections A, B, C, F, G, H, and Y granted during 1999–2001. We allow for a ten-year time window for citation accrual, starting from each individual patent’s time of grant. To account for citation inflation due to structural causes such as fluctuations in R&D spending or other policy decisions Kortum and Lerner (1999), the value of an incoming citation at a given time is scaled by the number of patents applied for at that time Higham et al. (2017a, b). The change in inflation-adjusted citations received by a patent over a given time interval is then divided by the fitness estimated for that patent in terms of its attributes according to Eq. (5). We find that the resulting data (corresponding to the fitness-controlled citation rate) are best fitted in terms of the product of an ageing function and a preferential-attachment kernel given by
[Equations (6a) and (6b)]
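The two fits can be illustrated with a minimal sketch on noise-free synthetic data. It assumes an exponential ageing function and a pure power-law attachment kernel; the text confirms exponential ageing and an exponent in the kernel, but the exact expressions of Eqs. (6a) and (6b) are not preserved in this copy, so the functional forms and the function names below are assumptions.

```python
import numpy as np

def fit_obsolescence_time(t, rate):
    """Fit rate = a0 * exp(-t / tau) by linear regression on log(rate)."""
    slope, intercept = np.polyfit(t, np.log(rate), 1)
    return -1.0 / slope, np.exp(intercept)

def fit_attachment_exponent(k, rate):
    """Fit rate = c * k**alpha by linear regression in log-log space."""
    slope, intercept = np.polyfit(np.log(k), np.log(rate), 1)
    return slope, np.exp(intercept)

# Ageing at fixed citation count: ten yearly bins after grant.
t = np.arange(1.0, 11.0)
tau, a0 = fit_obsolescence_time(t, 2.0 * np.exp(-t / 8.0))

# Preferential attachment at fixed age: citation counts from 1 to 49.
k = np.arange(1, 50)
alpha, c = fit_attachment_exponent(k, 0.1 * k**0.9)
```

Fitting the time dependence at fixed citation count and the count dependence at fixed age keeps the two mechanisms separate, mirroring how the obsolescence time and the attachment exponent are extracted in the analysis that follows.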
Figures 2(a) and 2(b) show results obtained for Section-A patents. For comparison, we include in the same figures also fits of the quantity representing the uncontrolled-for-fitness average citation rate considered before Csárdi et al. (2007); Valverde et al. (2007); Higham et al. (2017a). In agreement with previous results Higham et al. (2017a) obtained using the technology-classification scheme from Ref. Hall et al. (2002), we observe the ageing of the uncontrolled-for-fitness rate to be exponential in the long run while significantly exceeding the exponential behavior extrapolated to short times. Fits to the data for the uncontrolled-for-fitness rate also confirm the form of the preferential-attachment kernel reported in Higham et al. (2017a).
Extracting the parameter corresponding to the time scale for obsolescence from fits of the functional form given in Eq. (6a) to the fitness-controlled citation rate for patents having a fixed number of citations yields results as shown for Section-A patents in Fig. 2(c). The values fluctuate around a mean value that is larger than that extracted from similar fits to the uncontrolled-for-fitness citation rate. In contrast, as seen in Fig. 2(d), the exponent in the preferential-attachment kernel Eq. (6b) obtained from fitting this expression to the fitness-controlled citation rate for patents at fixed age (counted from the time of grant) shows an increasing trend at short times but eventually saturates. Figure 3 illustrates the obsolescence times and preferential-attachment exponents extracted from citation data for the patent cohorts considered in this work.
Comparing the dynamics of ageing and preferential attachment exhibited by the fitness-controlled and uncontrolled-for-fitness citation rates, respectively, reveals a number of interesting features. See Fig. 2(a) for an illustration. Firstly, the exponential time dependence from Eq. (6a) generally describes ageing for the fitness-controlled citation rate very well at all times. In contrast, the uncontrolled-for-fitness average citation rate at short times systematically shows a large excess over the extrapolated exponential behavior observed in the long-time limit Higham et al. (2017a); Candia et al. (2019). Controlling for citation inflation as well as fitness has thus enabled us to reveal more clearly the purely obsolescence-induced ageing process governing patent-citation dynamics. (Deviations from the described typical ageing behavior are observed sporadically.) Generally, the obsolescence time obtained for the fitness-controlled citation rate turns out to be larger than the time scale extracted from the exponential ageing displayed by the uncontrolled-for-fitness citation rate in the long-time limit (see Fig. 3).
Another striking feature of the fitness-controlled citation rate is the significantly reduced exponent characterizing preferential attachment, especially at early stages of the citation process but persisting also in the long-time limit. Theoretical studies Pham et al. (2016); Golosovsky (2018) have suggested that purely fitness-driven growth can cause phenomenologically observed preferential-attachment dynamics. As our estimate of fitness pertains to the attributes of patents known at the time of grant, it can be expected that the mechanism for attracting citations at that time is largely reflective of this fitness. We indeed find the exponent extracted from citations immediately after time of grant to be much smaller than when fitness is not controlled for, suggesting that a portion of the observed preferential attachment is due to heterogeneous fitness. Even in the long-time limit, the fitness estimated via quality indicators available at the time of grant still accounts for a sizeable reduction by about 20–30% in the exponent governing preferential attachment in the uncontrolled-for-fitness average citation rate. More explicitly, the average fitness of patents having a given number of citations at a given time after grant constitutes a direct measure for the fitness-explained fraction of the average citation rate. Our observations imply
[Equation (7)]
in the long-time limit. Thus average fitness is elevated for patents having a citation count exceeding the average in the long-time limit. That average fitness increases with citation count implies that the most highly cited patents will be associated with, and therefore detectable by, high values for the fitness variable. However, although it turns out to be useful and predictive, the fitness as determined by our procedure is still only a conservative and noisy estimate of the true intrinsic fitness for patents to be cited. Identifying yet other, or better, raw patent-quality indicators to include in the construction of fitness could be a way to improve it further.
As more highly cited patents typically have greater fitness [cf. Eq. (7)], a weak trend of positive correlation between obsolescence time and fitness can be deduced from Fig. 2(c). This suggests that the commonly adopted approach Bianconi and Barabási (2001); Golosovsky (2018); Pham et al. (2016) where fitness enters the citation rate (1) as a prefactor likely constitutes only a first approximation to a more complex interplay between fitness and ageing. While it may be expected that better-quality patents age more slowly, the level of noise in our data prevents firm conclusions from being drawn in the present case.
In summary, we present an empirical study of the relation between basic attributes of patents and their citation dynamics. Using only information available at the time patents are granted, we are able to estimate their intrinsic fitness for being cited. Even over time periods as long as 10 years afterwards, this fitness is found to determine significantly how many more citations are accrued, especially for the most successful, i.e., highly cited, inventions. Future research could extend our construction of the fitness parameter governing citation dynamics to also include attributes of the citing patents Ferretti et al. (2012) and account for social proximity between inventor communities Sorenson et al. (2006).
References
- D’Souza et al. (2007): R. M. D’Souza, C. Borgs, J. T. Chayes, N. Berger, and R. D. Kleinberg, Proc. Natl. Acad. Sci. U.S.A. 104, 6112 (2007).
- Kong et al. (2008): J. S. Kong, N. Sarshar, and V. P. Roychowdhury, Proc. Natl. Acad. Sci. U.S.A. 105, 13724 (2008).
- Papadopoulos et al. (2012): F. Papadopoulos, M. Kitsak, M. Á. Serrano, M. Boguñá, and D. Krioukov, Nature 489, 537 (2012).
- Wang et al. (2013): D. Wang, C. Song, and A.-L. Barabási, Science 342, 127 (2013).
- van de Rijt et al. (2014): A. van de Rijt, S. M. Kang, M. Restivo, and A. Patil, Proc. Natl. Acad. Sci. U.S.A. 111, 6934 (2014).
- Sinatra et al. (2016): R. Sinatra, D. Wang, P. Deville, C. Song, and A.-L. Barabási, Science 354, aaf5239 (2016).
- Zeng et al. (2017): A. Zeng, Z. Shen, J. Zhou, J. Wu, Y. Fan, Y. Wang, and H. E. Stanley, Phys. Repts. 714–715, 1 (2017).
- Fortunato et al. (2018): S. Fortunato, C. T. Bergstrom, K. Börner, J. A. Evans, D. Helbing, S. Milojević, A. M. Petersen, F. Radicchi, R. Sinatra, B. Uzzi, A. Vespignani, L. Waltman, D. Wang, and A.-L. Barabási, Science 359, eaao0185 (2018).
