Temporal Bibliographic Networks
Vladimir Batagelj, Daria Maltseva

TL;DR
This paper introduces methods to transform bibliographic networks into temporal networks using temporal quantities, enabling analysis of their evolution and deriving new insights through network operations.
Contribution
It presents novel approaches for representing and analyzing bibliographic networks over time using temporal quantities and network operations.
Findings
New methods for temporal transformation of bibliographic networks
Insights into network evolution through temporal properties
Derived networks reveal different perspectives on bibliographic data
Abstract
We present two ways (instantaneous and cumulative) to transform bibliographic networks, using the works' publication year, into corresponding temporal networks based on temporal quantities. We also show how to use the addition of temporal quantities to define interesting temporal properties of nodes, links and their groups thus providing an insight into evolution of bibliographic networks. Using the multiplication of temporal networks we obtain different derived temporal networks providing us with new views on studied networks. The proposed approach is illustrated with examples from the collection of bibliographic networks on peer review.
Vladimir Batagelj (1,2,3), Daria Maltseva (1)
ORCID: 0000-0002-0240-9446, 0000-0003-1789-1711
1 National Research University Higher School of Economics,
Myasnitskaya, 20, 101000 Moscow, Russia.
2 Institute of Mathematics, Physics and Mechanics,
Jadranska 19, 1000 Ljubljana, Slovenia
3 University of Primorska, Andrej Marušič Institute, 6000 Koper, Slovenia
Keywords: social network analysis, temporal networks, linked networks, bibliographic networks, temporal quantities, semiring, network multiplication, fractional approach.
MSC: 01A90, 91D30, 90B10, 16Y60, 65F30
JEL: C55, D85
1 Introduction
From data collected from bibliographic databases (WoS, Scopus, Google Scholar, BibTeX, etc.) we can construct different bibliographic networks. For example, using the program WoS2Pajek we obtain from data collected from WoS the following two-mode networks: the authorship network $\mathbf{WA}$ on works × authors, the journalship network $\mathbf{WJ}$ on works × journals, the keywordship network $\mathbf{WK}$ on works × keywords, and the (one-mode) citation network $\mathbf{Ci}$ on works. We also obtain the following node properties: the partition $\mathit{year}$ of works by publication year, the partition $\mathit{DC}$ distinguishing between works with complete description ($\mathit{DC}(w) = 1$) and the cited-only works ($\mathit{DC}(w) = 0$), and the vector $\mathit{NP}$ of numbers of pages. Analyzing these networks, we can get distributions of frequencies of different units (authors, journals, keywords) describing the overall properties of the networks. We can also identify the most important units (Cerinšek and Batagelj, 2015). An important tool in the analysis of linked (collections of) networks is network multiplication, which produces derived networks linking sets of units that are not directly linked; for example, the network $\mathbf{WA}^T \cdot \mathbf{WK}$ links authors to keywords (Batagelj and Cerinšek, 2013).
A more detailed insight into the evolution of bibliographic networks is enabled by also considering the temporal information. In Batagelj and Praprotnik (2016), a longitudinal approach to the analysis of temporal networks based on temporal quantities was presented. It is an alternative to the traditional cross-sectional approach (Holme, 2015). In this paper we show how to apply this approach to temporal bibliographic networks. It can also be used in other, similar contexts.
First, we describe two ways in which the year of publication can be combined with traditional bibliographic networks to obtain their temporal versions: the instantaneous and the cumulative. Afterwards, we present different ways to analyze these networks and the networks derived from them using network multiplication.
The proposed approach is illustrated with examples on networks from the collection of bibliographic networks on peer review (Batagelj et al., 2017), restricted to works with complete descriptions. The sizes of different sets of units are as follows: , , , and .
2 Temporal networks
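A temporal network $\mathcal{N}_T = (\mathcal{V}, \mathcal{L}, \mathcal{T}, \mathcal{P}, \mathcal{W})$ is obtained by attaching the time, $\mathcal{T}$, to an ordinary network $\mathcal{N} = (\mathcal{V}, \mathcal{L}, \mathcal{P}, \mathcal{W})$, where $\mathcal{T}$ is a set of time points, $t \in \mathcal{T}$.
In a temporal network, nodes and links are not necessarily present or active in all time points. Let $T(v)$, $T(v) \subseteq \mathcal{T}$, be the activity set of time points for node $v$ and $T(\ell)$, $T(\ell) \subseteq \mathcal{T}$, the activity set of time points for link $\ell$.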
Besides the presence/absence of nodes and links also their properties can change through time.
2.1 Temporal quantities
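We introduce a notion of a temporal quantity
$$a(t) = \begin{cases} a'(t) & t \in T_a \\ ✗ & t \in \mathcal{T} \setminus T_a \end{cases}$$
where $T_a$ is the activity time set of $a$, $a'(t)$ is the value of $a$ in an instant $t$, and ✗ denotes the value undefined.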
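We assume that the values of temporal quantities belong to a set $A$ which is a semiring $(A, +, \cdot, 0, 1)$ for the binary operations $+ : A \times A \to A$ and $\cdot : A \times A \to A$. The semiring $(\mathbb{R}_0^+, +, \cdot, 0, 1)$, where $+$ is addition and $\cdot$ is multiplication of numbers, is called the combinatorial semiring. For solving shortest path problems on networks, the semiring $(\mathbb{R}_0^+ \cup \{\infty\}, \min, +, \infty, 0)$ is used (Baras and Theodorakopoulos, 2010).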
We can extend both operations to the set $A_{✗} = A \cup \{✗\}$ by requiring that for all $a \in A_{✗}$ it holds
$$a + ✗ = ✗ + a = a, \qquad a \cdot ✗ = ✗ \cdot a = ✗.$$
The structure $(A_{✗}, +, \cdot, ✗, 1)$ is also a semiring.
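To make the role of the undefined value concrete, here is a minimal Python sketch of the extended combinatorial semiring. It assumes that Python's `None` encodes ✗ and uses the hypothetical helper names `s_add` and `s_mul`; it illustrates the rules above and is not code from the TQ library.

```python
from typing import Optional

Value = Optional[float]  # None encodes the undefined value ✗

def s_add(a: Value, b: Value) -> Value:
    """Addition on A ∪ {✗}: ✗ is the neutral element."""
    if a is None:
        return b
    if b is None:
        return a
    return a + b

def s_mul(a: Value, b: Value) -> Value:
    """Multiplication on A ∪ {✗}: ✗ is absorbing."""
    if a is None or b is None:
        return None
    return a * b

# ✗ behaves as the zero of the extended semiring, 1 stays its unit
assert s_add(3, None) == 3 and s_add(None, 3) == 3
assert s_mul(3, None) is None and s_mul(None, 3) is None
assert s_mul(3, 1) == 3
```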
Let $A_{✗}(\mathcal{T})$ denote the set of all temporal quantities over $A_{✗}$ in time $\mathcal{T}$. To extend the operations to networks and their matrices, we first define the sum $a + b$ (parallel links) as
$$(a + b)(t) = a(t) + b(t).$$
The product $a \cdot b$ (sequential links) is defined as
$$(a \cdot b)(t) = a(t) \cdot b(t).$$
Let us define temporal quantities $\mathbf{0}$ and $\mathbf{1}$ with the requirements $\mathbf{0}(t) = ✗$ and $\mathbf{1}(t) = 1$ for all $t \in \mathcal{T}$. Again, the structure $(A_{✗}(\mathcal{T}), +, \cdot, \mathbf{0}, \mathbf{1})$ is a semiring.
To provide software support for computation with temporal quantities, we limit it to temporal quantities that can be described as a sequence of disjoint time intervals with a constant value
$$a = [(s_i, f_i, v_i)]_{i=1}^{k}$$
where $s_i$ is the starting time and $f_i$ the finishing time of the $i$-th time interval $[s_i, f_i)$, $s_i < f_i$ and $f_i \le s_{i+1}$, and $v_i$ is the value of $a$ on this interval. Outside the intervals, the value of the temporal quantity is undefined, ✗. Therefore
$$a(t) = \begin{cases} v_i & t \in [s_i, f_i) \\ ✗ & t \notin \bigcup_{i=1}^{k} [s_i, f_i) \end{cases}$$
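The interval representation can be evaluated at a single instant as sketched below. This assumes integer time points and a plain list of `(s, f, v)` triples; the helper name `value_at` and the toy quantity `a` are hypothetical, and `None` stands for ✗.

```python
from typing import List, Optional, Tuple

# a temporal quantity as disjoint intervals [s, f) with constant value v
TQ = List[Tuple[int, int, float]]

def value_at(a: TQ, t: int) -> Optional[float]:
    """Return a(t): v_i if s_i <= t < f_i for some i, otherwise None (✗)."""
    for s, f, v in a:
        if s <= t < f:
            return v
    return None

a = [(1, 5, 2), (6, 8, 1)]
print([value_at(a, t) for t in range(0, 9)])
# [None, 2, 2, 2, 2, None, 1, 1, None]
```

Note that the finishing time $f_i$ itself is excluded, so $a(5)$ is undefined even though an interval ends at 5.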
To illustrate both operations, let us consider the temporal quantities $a$ and $b$ (Batagelj and Praprotnik, 2016):
[TABLE]
The following are the sum and the product of the temporal quantities $a$ and $b$ over the combinatorial semiring.
[TABLE]
They are visually displayed in Figure 1.
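The pointwise sum and product of interval-based temporal quantities can be sketched as follows. Because every interval endpoint is a breakpoint, both operands are constant between consecutive breakpoints, so it is enough to evaluate them once per elementary interval and then merge neighboring intervals with equal values. The toy quantities below are our own, not the example from Batagelj and Praprotnik (2016), and all function names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

TQ = List[Tuple[int, int, float]]
Value = Optional[float]  # None encodes ✗

def _value(a: TQ, t: int) -> Value:
    for s, f, v in a:
        if s <= t < f:
            return v
    return None

def _add(x: Value, y: Value) -> Value:
    if x is None:
        return y
    if y is None:
        return x
    return x + y

def _mul(x: Value, y: Value) -> Value:
    return None if x is None or y is None else x * y

def combine(a: TQ, b: TQ, op: Callable[[Value, Value], Value]) -> TQ:
    """Apply op pointwise; both operands are constant between breakpoints."""
    points = sorted({s for s, _, _ in a + b} | {f for _, f, _ in a + b})
    result: TQ = []
    for lo, hi in zip(points, points[1:]):
        v = op(_value(a, lo), _value(b, lo))
        if v is None:
            continue  # undefined on [lo, hi): no triple is stored
        if result and result[-1][1] == lo and result[-1][2] == v:
            result[-1] = (result[-1][0], hi, v)  # extend the previous interval
        else:
            result.append((lo, hi, v))
    return result

def tq_sum(a: TQ, b: TQ) -> TQ:
    return combine(a, b, _add)

def tq_product(a: TQ, b: TQ) -> TQ:
    return combine(a, b, _mul)

a = [(1, 5, 2), (6, 8, 1)]
b = [(3, 7, 1)]
print(tq_sum(a, b))      # [(1, 3, 2), (3, 5, 3), (5, 6, 1), (6, 7, 2), (7, 8, 1)]
print(tq_product(a, b))  # [(3, 5, 2), (6, 7, 1)]
```

The sum is defined wherever at least one operand is defined, while the product is defined only where both are, which mirrors the rules for ✗.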
To support computations with temporal quantities and the analysis of temporal networks based on them, the Python libraries TQ and Nets were developed (Batagelj, 2017). They were used in the analyses presented in this paper. In the examples we used a collection of bibliographic networks on peer review from Batagelj et al. (2017).
2.2 Temporal affiliation networks
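Let the binary affiliation matrix $\mathbf{A} = [a_{ep}]$ describe a two-mode network on the set of events $E$ and the set of participants $P$:
$$a_{ep} = \begin{cases} 1 & p \text{ took part in the event } e \\ 0 & \text{otherwise} \end{cases}$$
The function $d : E \to \mathcal{T}$ assigns to each event $e$ the date $d(e)$ when it happened. Assume $\mathcal{T} = [\mathit{first}, \mathit{last}] \subset \mathbb{N}$. Using these data we can construct two temporal affiliation matrices:
- instantaneous $\mathbf{A}_i = [ai_{ep}]$, where
$$ai_{ep} = \begin{cases} [(d(e), d(e) + 1, 1)] & a_{ep} = 1 \\ [\,] & \text{otherwise} \end{cases}$$
- cumulative $\mathbf{A}_c = [ac_{ep}]$, where
$$ac_{ep} = \begin{cases} [(d(e), \mathit{last} + 1, 1)] & a_{ep} = 1 \\ [\,] & \text{otherwise} \end{cases}$$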
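A minimal sketch of the two constructions, assuming the affiliation matrix is stored sparsely as a dictionary mapping each event to the set of its participants, and that `last` is the last time point of $\mathcal{T}$. Only entries with $a_{ep} = 1$ are stored; the others are the empty temporal quantity `[]`. The function name and the toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

TQ = List[Tuple[int, int, int]]

def temporal_affiliation(events: Dict[str, Set[str]], d: Dict[str, int],
                         last: int, instant: bool) -> Dict[Tuple[str, str], TQ]:
    """Nonzero entries of A_i (instant=True) or A_c (instant=False).

    `events` maps each event e to the participants p with a_ep = 1; entries
    with a_ep = 0 are the empty temporal quantity [] and are not stored.
    """
    result: Dict[Tuple[str, str], TQ] = {}
    for e, participants in events.items():
        end = d[e] + 1 if instant else last + 1  # active in d(e) only, or until last
        for p in participants:
            result[(e, p)] = [(d[e], end, 1)]
    return result

# hypothetical works with their authors and publication years
events = {"w1": {"ann", "bob"}, "w2": {"bob"}}
year = {"w1": 2005, "w2": 2007}
Ai = temporal_affiliation(events, year, last=2016, instant=True)
Ac = temporal_affiliation(events, year, last=2016, instant=False)
print(Ai[("w1", "ann")])  # [(2005, 2006, 1)]
print(Ac[("w1", "ann")])  # [(2005, 2017, 1)]
```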
In general, a temporal quantity $a$ is called cumulative iff for $t_1 < t_2$ it has the property
$$a(t_1) \le a(t_2).$$
The sum and the product (over the combinatorial semiring) of cumulative temporal quantities are cumulative temporal quantities.
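For a temporal quantity $a$ its cumulative is defined as
$$\mathrm{cumulate}(a)(t) = \sum_{t' \in T_a,\ t' \le t} a(t'), \qquad t \ge s_1,$$
where $T_a$ is the activity set of $a$ and $s_1 = \min T_a$; for $t < s_1$ the value is undefined, ✗.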
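The definition can be turned into a function on the interval representation. This sketch assumes integer time points (years) and an exclusive end `t_max` of the observed period; an interval longer than one instant contributes its value once per instant, and the running total is kept through gaps. The name `cumulate` follows the text, but this is our illustration, not the TQ library's implementation.

```python
from typing import List, Tuple

TQ = List[Tuple[int, int, float]]

def cumulate(a: TQ, t_max: int) -> TQ:
    """Cumulative of a temporal quantity given as disjoint (s, f, v) triples.

    The value at instant t is the sum of a(t') over all defined instants t' <= t;
    it stays constant through gaps and lasts until the exclusive end t_max.
    """
    steps = []  # (instant, running total after adding a(instant))
    total = 0
    for s, f, v in sorted(a):
        for t in range(s, f):  # each instant of [s, f) contributes v once
            total += v
            steps.append((t, total))
    result: TQ = []
    for k, (t, c) in enumerate(steps):
        end = steps[k + 1][0] if k + 1 < len(steps) else t_max
        if result and result[-1][2] == c:
            result[-1] = (result[-1][0], end, c)  # merge equal neighbors (v = 0)
        else:
            result.append((t, end, c))
    return result

# first four intervals of Bornmann's productivity (Appendix A.2)
pr_prefix = [(2005, 2006, 4), (2006, 2007, 3), (2007, 2008, 4), (2008, 2009, 9)]
print(cumulate(pr_prefix, t_max=2017))
# [(2005, 2006, 4), (2006, 2007, 7), (2007, 2008, 11), (2008, 2017, 20)]
```

Applied to the first intervals of the productivity listed in Appendix A.2, it reproduces the first values 4, 7, 11, 20 of the cumulative productivity listed there; the last interval reaches `t_max` only because the input is truncated.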
A temporal network is cumulative for a weight iff all its values are cumulative.
The Python code for creating temporal networks from Pajek files for the peer review data is given in Appendix A.1.
2.3 Temporal properties
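Let $\mathbf{N} = [n_{ij}]$ be (the matrix of) a temporal network on $I \times J$. On it we can define some interesting temporal quantities, such as the in-sum
$$\mathrm{inSum}(j) = \sum_{i \in I} n_{ij}, \qquad j \in J,$$
and the out-sum
$$\mathrm{outSum}(i) = \sum_{j \in J} n_{ij}, \qquad i \in I.$$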
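In the special case where $\mathbf{N} = \mathbf{WA}_i$, we get the productivity of an author $a$
$$\mathrm{prod}(a) = \sum_{w \in W} \mathbf{WA}_i[w, a],$$
and for $\mathbf{N} = \mathbf{WA}_c$ we get the cumulative productivity of an author $a$
$$\mathrm{cprod}(a) = \sum_{w \in W} \mathbf{WA}_c[w, a].$$
It holds that $\mathrm{cprod}(a)$ is the cumulative of $\mathrm{prod}(a)$.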
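The following sketch computes productivities as in-sums of a toy temporal authorship network and checks that the cumulative productivity equals the running sum of the instantaneous one. It assumes a sparse dictionary representation (keys `(work, author)`, values lists of `(s, f, v)` triples) and, for readability, returns year-by-year dictionaries; the data and helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

TQ = List[Tuple[int, int, float]]
TNet = Dict[Tuple[str, str], TQ]  # (work, author) -> temporal quantity

def in_sum(net: TNet, node: str) -> Dict[int, float]:
    """In-sum of column `node` as {instant: value}; absent instants are undefined (✗)."""
    totals: Dict[int, float] = defaultdict(float)
    for (_, col), tq in net.items():
        if col == node:
            for s, f, v in tq:
                for t in range(s, f):
                    totals[t] += v
    return dict(sorted(totals.items()))

# hypothetical instantaneous authorship network; the last time point is 2008
WAi = {("w1", "bob"): [(2005, 2006, 1)], ("w2", "bob"): [(2007, 2008, 1)],
       ("w3", "bob"): [(2007, 2008, 1)], ("w3", "ann"): [(2007, 2008, 1)]}
# cumulative version: each link stays active until last + 1 = 2009
WAc = {key: [(s, 2009, v) for s, _, v in tq] for key, tq in WAi.items()}

prod = in_sum(WAi, "bob")   # {2005: 1.0, 2007: 2.0}
cprod = in_sum(WAc, "bob")  # {2005: 1.0, 2006: 1.0, 2007: 3.0, 2008: 3.0}

# cprod(bob) is the running sum of prod(bob) over the years
running, expected = 0.0, {}
for t in range(2005, 2009):
    running += prod.get(t, 0.0)
    expected[t] = running
assert cprod == expected
```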
The productivity of an author can be extended to the productivity of a group of authors $C \subseteq A$
$$\mathrm{prod}(C) = \sum_{a \in C} \mathrm{prod}(a).$$
There is a problem with the productivity of a group: if two authors from the group co-authored the same paper, the paper is counted twice. To account for the “real” contribution of each author, the fractional approach is used. It is based on normalized networks (matrices); in the case of co-authorship, on
$$\mathbf{n}(\mathbf{WA})[w, a] = \frac{\mathbf{WA}[w, a]}{\max(1, \mathrm{outdeg}(w))}.$$
This leads to the fractional productivity of an author $a$
$$\mathrm{fprod}(a) = \sum_{w \in W} \mathbf{n}(\mathbf{WA}_i)[w, a].$$
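The effect of the fractional approach on group counts can be seen on a toy example. The sketch divides each entry by the number of authors of the work (at least 1), sums per author and year, and compares the plain and fractional productivity of a two-author group. The data, function names and dictionary representation are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

TQ = List[Tuple[int, int, float]]
TNet = Dict[Tuple[str, str], TQ]  # (work, author) -> temporal quantity

def normalize(net: TNet) -> TNet:
    """Divide every entry of row w by the number of authors of w (at least 1)."""
    n_authors = Counter(w for (w, _) in net)
    return {(w, a): [(s, f, v / max(1, n_authors[w])) for s, f, v in tq]
            for (w, a), tq in net.items()}

def productivity(net: TNet, authors: List[str]) -> Dict[int, float]:
    """Sum of the in-sums of the given authors, as {instant: value}."""
    totals: Dict[int, float] = defaultdict(float)
    for (_, a), tq in net.items():
        if a in authors:
            for s, f, v in tq:
                for t in range(s, f):
                    totals[t] += v
    return dict(sorted(totals.items()))

# hypothetical instantaneous authorship network
WAi = {("w1", "ann"): [(2005, 2006, 1)], ("w1", "bob"): [(2005, 2006, 1)],
       ("w2", "bob"): [(2005, 2006, 1)],
       ("w3", "ann"): [(2006, 2007, 1)], ("w3", "bob"): [(2006, 2007, 1)],
       ("w3", "cy"): [(2006, 2007, 1)]}
group = ["ann", "bob"]
print(productivity(WAi, group))             # {2005: 3.0, 2006: 2.0}: w1 and w3 counted twice
print(productivity(normalize(WAi), group))  # w1 now counts once, w3 counts 2/3
```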
2.3.1 Example: Temporal properties in networks on peer review
In the analysis of the ordinary authorship network we get that Lutz Bornmann is the author who wrote the largest number of works on peer review, 61 (Batagelj et al., 2017). To see the dynamics of his publishing, we compute his productivity
[TABLE]
see the top of Figure 2. The corresponding cumulative productivity is
[TABLE]
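see the middle of Figure 2. Note that each value of the cumulative productivity is the running sum of the yearly productivities: $7 = 4 + 3$, $11 = 7 + 4$, $20 = 11 + 9$, …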
The fractional productivity of Lutz Bornmann is
[TABLE]
see the bottom of Figure 2. For the Python code see Appendix A.2.
In the citation network for the peer review bibliography, the most cited paper (164 citations) is Peters, D. P., Ceci, S. J. (1982). Peer-review practices of psychological journals: The fate of published articles, submitted again. Behavioral and Brain Sciences, 5(2), 187-255. Its in-sum in the instantaneous citation network is a temporal quantity describing the number of citations to this paper through the years.
[TABLE]
See the top of Figure 3.
Another well-known paper is Hirsch, J. E. (2005). An index to quantify an individual’s scientific research output. Proc Natl Acad Sci U S A, 102(46), 16569-72, with 119 citations and
[TABLE]
See the bottom of Figure 3. For the Python code see Appendix A.3.
Similarly, we could look at the number of works by year, the popularity of a keyword $k$ (its in-sum in the instantaneous keywordship network), etc.
2.3.2 Example: main journals publishing on peer review
To identify the main journals publishing on peer review (see Appendix A.4), we first determined the temporal in-sums in the network $\mathbf{WJ}_i$ for all journals. An entry of the resulting table Jt contains the temporal quantity counting the number of papers on peer review published in the journal in each year. Because most frequencies are small (one-digit numbers), we decided to change the time scale (granularity) to the time intervals 1: 1900-1970, 2: 1971-1980, 3: 1981-1990, 4: 1991-2000, 5: 2001-2005, 6: 2006-2010, 7: 2011-2015. The recoded table is labeled Jr. For the table Jr we determined, for each time interval, the three most frequently used journals; they are listed on the right side of Figure 4. The corresponding data were exported as journals.csv and visualized using R. The picture on the left side presents the trajectories of relative importance (the journal’s frequency divided by the maximum frequency on the interval) for the selected journals.
The papers on peer review (refereeing) published until 1970 appeared most often in J ASSOC OFF AGR CHEM. Until 2005, the dominant journals were JAMA, SCIENCE, NATURE, BRIT MED J, and LANCET (general medical and science journals). In the period 2006-2010, the leading role was taken over by the specialized journal SCIENTOMETRICS. In the last period, 2011-2015, the primacy shifted to the mega-journals BMJ OPEN and PLOS ONE (Wakeling et al., 2016). Note that the frequencies for SCIENTOMETRICS are 3: 6, 4: 25, 5: 18, 6: 44, 7: 78, and that in the period 2011-2015, 489 papers on peer review were published in BMJ OPEN.
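The recoding to coarser periods and the relative importance used for Figure 4 can be sketched as follows. We assume that recoding sums the yearly counts falling into each period $[p_{k-1}, p_k)$, with periods numbered from 1 as in the list of time intervals above; this is a hypothetical re-implementation, not the TQ function `changeTime`, whose exact behavior is not described here. The breakpoints are those from Appendix A.4; the journals and counts are invented toy data.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

TQ = List[Tuple[int, int, float]]

def recode(a: TQ, p: List[int]) -> Dict[int, float]:
    """Total of a yearly count over periods [p[k-1], p[k]), numbered k = 1, 2, ..."""
    totals: Dict[int, float] = {}
    for s, f, v in a:
        for t in range(s, f):       # every year of [s, f) has count v
            k = bisect_right(p, t)  # number of breakpoints <= t
            if 0 < k < len(p):      # ignore years outside [p[0], p[-1])
                totals[k] = totals.get(k, 0) + v
    return totals

def relative_importance(tables: Dict[str, Dict[int, float]]) -> Dict[str, Dict[int, float]]:
    """Divide each journal's count by the largest count of any journal in that period."""
    peak: Dict[int, float] = {}
    for tab in tables.values():
        for k, v in tab.items():
            peak[k] = max(peak.get(k, 0), v)
    return {j: {k: v / peak[k] for k, v in tab.items() if peak[k] > 0}
            for j, tab in tables.items()}

p = [0, 1971, 1981, 1991, 2001, 2006, 2011, 2016, 3000]  # breakpoints from Appendix A.4
# hypothetical yearly counts of papers on peer review in two journals
Jt = {"J1": [(1975, 1977, 2), (2003, 2004, 5)],
      "J2": [(1978, 1979, 1), (2004, 2006, 3)]}
Jr = {j: recode(a, p) for j, a in Jt.items()}
print(Jr)                       # {'J1': {2: 4, 5: 5}, 'J2': {2: 1, 5: 6}}
print(relative_importance(Jr))  # J1: 1.0 and 5/6, J2: 0.25 and 1.0
```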
3 Network multiplication and derived networks
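Let $\mathbf{A}$ on $I \times K$ and $\mathbf{B}$ on $K \times J$ be (matrices of linked two-mode) networks. Their product network $\mathbf{C}$ on $I \times J$ is determined by the product of the corresponding matrices
$$\mathbf{C} = \mathbf{A} \cdot \mathbf{B}$$
where
$$c_{ij} = \sum_{k \in K} a_{ik} \cdot b_{kj}.$$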
For details see Batagelj and Cerinšek (2013).
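Network multiplication is very important in the network analysis of collections of linked networks because it enables us to construct different derived networks. For example, in the analysis of bibliographic networks, the network
$$\mathbf{AK} = \mathbf{WA}^T \cdot \mathbf{WK}$$
links authors to keywords: the weight of the arc from the node $a$ to the node $k$ is equal to the number of works in which the author $a$ used the keyword $k$.
The coauthorship network is obtained as
$$\mathbf{Co} = \mathbf{WA}^T \cdot \mathbf{WA}.$$
The weight $\mathbf{Co}[a, b]$ is equal to the total number of works that authors $a$ and $b$ wrote together.
The network of normalized citations between authors is
$$\mathbf{ACiA} = \mathbf{n}(\mathbf{WA})^T \cdot \mathbf{Ci} \cdot \mathbf{n}(\mathbf{WA}),$$
where $\mathbf{n}(\mathbf{WA})$ is the normalized authorship network. The weight $\mathbf{ACiA}[a, b]$ is equal to the fractional contribution of citations from works coauthored by $a$ to works coauthored by $b$. Other derived networks can be obtained in a similar way.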
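For sparse networks the product only needs to visit nonzero entries. The sketch below uses dictionaries as sparse matrices and builds the author-keyword network $\mathbf{WA}^T \cdot \mathbf{WK}$ and the coauthorship network $\mathbf{WA}^T \cdot \mathbf{WA}$ on hypothetical toy data. It is a plain sparse product for illustration, not the algorithm of Batagelj and Cerinšek (2013).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Matrix = Dict[Tuple[str, str], float]  # sparse matrix: only nonzero entries are stored

def transpose(m: Matrix) -> Matrix:
    return {(j, i): v for (i, j), v in m.items()}

def multiply(a: Matrix, b: Matrix) -> Matrix:
    """C = A · B with c_ij = sum over k of a_ik * b_kj, touching only nonzero entries."""
    rows_of_b: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for (k, j), v in b.items():
        rows_of_b[k].append((j, v))
    c: Matrix = defaultdict(float)
    for (i, k), a_ik in a.items():
        for j, b_kj in rows_of_b.get(k, []):
            c[(i, j)] += a_ik * b_kj
    return dict(c)

# hypothetical works, authors and keywords
WA = {("w1", "ann"): 1, ("w1", "bob"): 1, ("w2", "bob"): 1}
WK = {("w1", "peer review"): 1, ("w2", "peer review"): 1, ("w2", "bias"): 1}

AK = multiply(transpose(WA), WK)  # authors × keywords
Co = multiply(transpose(WA), WA)  # coauthorship; the diagonal counts each author's works
print(AK[("bob", "peer review")])             # 2.0
print(Co[("ann", "bob")], Co[("bob", "bob")])  # 1.0 2.0
```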
The network (matrix) multiplication can be straightforwardly extended to temporal networks.
3.1 Multiplication of temporal networks
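Let $\mathbf{A}$ on $I \times K$ and $\mathbf{B}$ on $K \times J$ be (matrices of) temporal co-occurrence networks. Then $\mathbf{C} = \mathbf{A} \cdot \mathbf{B}$ is a temporal network on $I \times J$. What is its meaning? Consider the value of its item $c_{ij}$ in an instant $t$:
$$c_{ij}(t) = \sum_{k \in K} a_{ik}(t) \cdot b_{kj}(t).$$
For $c_{ij}(t)$ to be defined (different from ✗), there should be at least one $k \in K$ such that $a_{ik}(t)$ and $b_{kj}(t)$ are both defined, i.e. $t \in T_{a_{ik}} \cap T_{b_{kj}}$. For every other $k$ the term $a_{ik}(t) \cdot b_{kj}(t)$ equals ✗, which is neutral for addition and does not contribute to the sum. Therefore, denoting $K_{ij}(t) = \{k \in K : t \in T_{a_{ik}} \cap T_{b_{kj}}\}$,
$$c_{ij}(t) = \sum_{k \in K_{ij}(t)} a_{ik}(t) \cdot b_{kj}(t).$$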
For binary instantaneous two-mode networks $\mathbf{A}$ and $\mathbf{B}$, the value of the product $c_{ij}(t)$ is equal to the number of different members $k$ of $K$ with which both $i$ and $j$ have contact in the instant $t$.
The product of cumulative networks is itself cumulative. For binary cumulative two-mode networks $\mathbf{A}$ and $\mathbf{B}$, the value of the product $c_{ij}(t)$ is equal to the number of different members $k$ of $K$ with which both $i$ and $j$ had contact in instants up to and including the instant $t$.
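The interpretation for binary instantaneous networks can be checked on a small example. The sketch stores a temporal matrix as a dictionary from `(row, column)` to `{instant: value}`, where a missing instant means ✗, and multiplies only over instants where both factors are defined. The representation, names and data are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# temporal matrix: (row, column) -> {instant: value}; an instant missing
# from the inner dictionary means the value there is undefined (✗)
TMatrix = Dict[Tuple[str, str], Dict[int, float]]

def t_multiply(a: TMatrix, b: TMatrix) -> TMatrix:
    """c_ij(t) = sum of a_ik(t) * b_kj(t) over the k for which both are defined at t."""
    rows_of_b: Dict[str, List[Tuple[str, Dict[int, float]]]] = defaultdict(list)
    for (k, j), tq in b.items():
        rows_of_b[k].append((j, tq))
    c: TMatrix = defaultdict(dict)
    for (i, k), a_tq in a.items():
        for j, b_tq in rows_of_b.get(k, []):
            for t in a_tq.keys() & b_tq.keys():  # elsewhere the term is ✗
                c[(i, j)][t] = c[(i, j)].get(t, 0) + a_tq[t] * b_tq[t]
    return {key: dict(sorted(tq.items())) for key, tq in c.items()}

# hypothetical binary instantaneous networks A on I × K and B on K × J
A = {("ann", "k1"): {2005: 1}, ("ann", "k2"): {2005: 1, 2006: 1}}
B = {("k1", "bob"): {2005: 1}, ("k2", "bob"): {2005: 1}}
print(t_multiply(A, B))  # {('ann', 'bob'): {2005: 2}}
```

In 2005 both `ann` and `bob` have contact with `k1` and `k2`, giving 2; in 2006 only `ann` has contact with `k2`, so the value stays undefined.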
3.1.1 Temporal co-occurrence networks
Using the multiplication of temporal affiliation networks over the combinatorial semiring, we get the corresponding instantaneous and cumulative co-occurrence networks
$$\mathbf{Co}_i = \mathbf{A}_i^T \cdot \mathbf{A}_i \qquad \text{and} \qquad \mathbf{Co}_c = \mathbf{A}_c^T \cdot \mathbf{A}_c.$$
The triple $(s, f, v)$ in a temporal quantity $\mathbf{Co}_i[p, q]$ tells that in the time interval $[s, f)$ there were $v$ events in which both $p$ and $q$ took part.
The triple $(s, f, v)$ in a temporal quantity $\mathbf{Co}_c[p, q]$ tells that in the time interval $[s, f)$ there were in total $v$ accumulated events in which both $p$ and $q$ took part.
The diagonal (loop) weights $\mathbf{Co}_i[p, p]$ and $\mathbf{Co}_c[p, p]$ contain the temporal quantities counting the number of events in the time intervals in which the participant $p$ took part.
A typical example of such a network is the works authorship network $\mathbf{WA}$, where $E$ is the set of papers $W$, $P$ is the set of authors $A$, and $d$ is the publication year.
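For a binary affiliation matrix, $\mathbf{A}_i^T \cdot \mathbf{A}_i$ can also be computed directly: every event contributes 1 to each ordered pair of its participants (loops included) at its date and, in the cumulative case, at every later time point up to `last`. The sketch below, with hypothetical names and toy data, shows the instantaneous and cumulative counts and a diagonal entry.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Set, Tuple

def co_occurrence(events: Dict[str, Set[str]], d: Dict[str, int],
                  last: int, instant: bool) -> Dict[Tuple[str, str], Dict[int, int]]:
    """Co_i (instant=True) or Co_c (instant=False) as (p, q) -> {instant: count}.

    Gives the same values as A_i^T · A_i or A_c^T · A_c for a binary affiliation
    matrix; pairs (p, p) form the diagonal and count the events of participant p.
    """
    co: Dict[Tuple[str, str], Dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for e, participants in events.items():
        times = [d[e]] if instant else range(d[e], last + 1)
        for p, q in product(participants, repeat=2):  # ordered pairs, loops included
            for t in times:
                co[(p, q)][t] += 1
    return {pq: dict(sorted(tq.items())) for pq, tq in co.items()}

# hypothetical works with their authors and publication years
events = {"w1": {"ann", "bob"}, "w2": {"ann", "bob"}, "w3": {"bob"}}
year = {"w1": 2005, "w2": 2007, "w3": 2007}
Coi = co_occurrence(events, year, last=2008, instant=True)
Coc = co_occurrence(events, year, last=2008, instant=False)
print(Coi[("ann", "bob")])  # {2005: 1, 2007: 1}
print(Coc[("ann", "bob")])  # {2005: 1, 2006: 1, 2007: 2, 2008: 2}
print(Coi[("bob", "bob")])  # {2005: 1, 2007: 2}
```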
3.1.2 Example: Temporal coauthorship network
The instantaneous coauthorship network is obtained as
$$\mathbf{Co}_i = \mathbf{WA}_i^T \cdot \mathbf{WA}_i.$$
Bibliographic networks are usually sparse. Often the product of sparse networks is also sparse. By considering only nonzero elements in the computation, it can be computed fast (Batagelj and Cerinšek, 2013). In our example, the network $\mathbf{WA}_i$ has 22104 works, 62106 authors and 80021 arcs. The derived network has edges and was computed on a laptop in 12.7 seconds.
For the peer review data we get the largest values
,
,
.
The corresponding temporal quantities and are
[TABLE]
Both temporal quantities are presented in Figure 5. The Python code is given in Appendix A.5.
3.1.3 Example: Temporal citations between journals
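The derived network describing citations between journals is obtained as
$$\mathbf{JCJ} = \mathbf{WJ}_i^T \cdot \mathbf{Ci}_i \cdot \mathbf{WJ}_c.$$
Note that the third network in the product is cumulative. A cited work is published no later than the citing work, so its link to its journal in $\mathbf{WJ}_c$ is still active in the year of the citation, while in $\mathbf{WJ}_i$ it would be active only in the (usually earlier) publication year of the cited work, and the product would miss the citation.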
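Why the third factor must be cumulative can be seen on a single citation. In this sketch a work `w1` in journal `J1`, published in 2006, cites a work `w2` in journal `J2`, published in 2003; we assume that the citation link in the instantaneous citation network is active in the citing year. All names and years are hypothetical. With the instantaneous journal network as the third factor, the three terms are never defined at the same instant and the product is empty; with the cumulative one, the citation appears in 2006.

```python
from collections import defaultdict

def t_multiply(a, b):
    """c_ij(t) = sum over k of a_ik(t) * b_kj(t), for instants where both factors are defined.

    Temporal matrices are dicts (row, column) -> {instant: value}; a missing
    instant means the value is undefined (✗).
    """
    rows_of_b = defaultdict(list)
    for (k, j), tq in b.items():
        rows_of_b[k].append((j, tq))
    c = defaultdict(dict)
    for (i, k), a_tq in a.items():
        for j, b_tq in rows_of_b.get(k, []):
            for t in a_tq.keys() & b_tq.keys():
                c[(i, j)][t] = c[(i, j)].get(t, 0) + a_tq[t] * b_tq[t]
    return dict(c)

last = 2008
# hypothetical data: w1 (journal J1, 2006) cites w2 (journal J2, 2003)
JWi = {("J1", "w1"): {2006: 1}, ("J2", "w2"): {2003: 1}}  # transpose of WJ_i
Cii = {("w1", "w2"): {2006: 1}}                            # citation, active in the citing year
WJi = {("w1", "J1"): {2006: 1}, ("w2", "J2"): {2003: 1}}
WJc = {("w1", "J1"): {t: 1 for t in range(2006, last + 1)},
       ("w2", "J2"): {t: 1 for t in range(2003, last + 1)}}

with_instant = t_multiply(t_multiply(JWi, Cii), WJi)
with_cumulative = t_multiply(t_multiply(JWi, Cii), WJc)
print(with_instant)     # {}
print(with_cumulative)  # {('J1', 'J2'): {2006: 1}}
```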
The weight of the element $[i, j]$ of this network is equal to the number of citations per year from works published in journal $i$ to works published in journal $j$. In the special case when $i = j$, we get a temporal quantity describing the self-citations of journal $i$. In the peer review data, the largest numbers of self-citations are 320 in JAMA and 148 in Scientometrics. The corresponding temporal quantities (the loops at JAMA and at Scientometrics) are:
[TABLE]
and are presented in Figure 6.
The largest numbers of citations are from the journals BMJ Open (142) and Scientometrics (108) to the unknown journal *****, followed by and with totals 96 and 91.
[TABLE]
See the top and middle parts of Figure 7.
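In the peer review data, the journal JAMA is the most prominent. To get the temporal quantity describing citations of other journals to JAMA, we compute the in-sum of JAMA in the journal citation network with the loops (self-citations) removed: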
[TABLE]
It is presented at the bottom of Figure 7. The Python code is given in Appendix A.6.
Similarly, we get the temporal network describing citations between authors
$$\mathbf{ACiA} = \mathbf{WA}_i^T \cdot \mathbf{Ci}_i \cdot \mathbf{WA}_c.$$
The weight of the element $\mathbf{ACiA}[a, b]$ is equal to the number of citations per year from works coauthored by author $a$ to works coauthored by author $b$.
4 Conclusions
We presented two ways (instantaneous and cumulative) to transform bibliographic networks, using the works’ publication year, into corresponding temporal networks based on temporal quantities. They are a basis for a longitudinal approach to the analysis of temporal networks, which is an alternative to the traditional cross-sectional approach. Introducing a time dimension can give additional insights into bibliographic networks. We also presented some methods for analyzing the obtained temporal networks and illustrated them with examples from the analysis of the peer review bibliography.
We presented only a few examples to show that the approach works. It can be extended in several directions:
- other node and link properties;
- other derived networks combined with the fractional approach;
- normalization (proportions) of temporal properties considering the changes of the “size” of the network through time;
- clustering of temporal quantities to determine their types;
- temporal network methods produce large results, so special methods for identifying and presenting (visualizing) interesting parts need to be developed.
Acknowledgments
The paper is based on presentations at the 1244th Sredin seminar, IMFM, Ljubljana, April 8, 2015; the PEERE meeting, Vilnius, March 7-9, 2017; and the XXXVII Sunbelt workshop, Beijing, China, May 30 – June 4, 2017.
This work is supported in part by the Slovenian Research Agency (research program P1-0294 and research projects J1-9187, J7-8279 and BI-US/17-18-045), project PEERE (COST Action TD1306) and by the Russian Academic Excellence Project '5-100'.
Appendix A Code in Nets
A.1 Converting Pajek net and clu files into temporal network in netsJSON
To set up an environment for computing our examples, we put the Python files (Nets.py, TQ.py, search.py, coloring.py, IndexMinPQ.py) from the library Nets into the directory gdir, and the files TQchart.html, d3.v3.min.js and barData.js into the subdirectory cdir. The directory ndir contains the network data and the directory wdir contains the results.
```python
# directories: library (gdir), results (wdir), network data (ndir), chart files (cdir)
gdir = 'c:/path/Nets'
wdir = 'c:/path/Test/peere'
ndir = 'c:/path/WoS/peere2'
cdir = 'c:/path/Nets/chart'

import sys, os, datetime, json
sys.path = [gdir] + sys.path
os.chdir(wdir)
from TQ import *
from Nets import Network as N

net = ndir + "/WAd.net"    # works × authors (Pajek)
clu = ndir + "/Yeard.clu"  # partition of works by publication year
t1 = datetime.datetime.now()
print("started: ", t1.ctime(), "\n")

# cumulative temporal authorship network
WAc = N.twoMode2netsJSON(clu, net, 'WAcum.json', instant=False)
t2 = datetime.datetime.now()
print("\nconverted to cumulative TN: ", t2.ctime(), "\ntime used: ", t2 - t1)

# instantaneous temporal authorship network
WAi = N.twoMode2netsJSON(clu, net, 'WAins.json', instant=True)
t3 = datetime.datetime.now()
print("\nconverted to instantaneous TN: ", t3.ctime(), "\ntime used: ", t3 - t2)

# instantaneous temporal citation network
cit = ndir + "/CiteD.net"
Citei = N.oneMode2netsJSON(clu, cit, 'CiteIns.json', instant=True)
t4 = datetime.datetime.now()
print("\nconverted to instantaneous TN: ", t4.ctime(), "\ntime used: ", t4 - t3)

# label -> node index dictionaries
ia = WAi.Index()
ic = Citei.Index()
```
A.2 Productivities of authors
```python
tit = 'BORNMANN_L'; b = ia[tit]
# productivity: in-sum in the instantaneous authorship network
pr = WAi.TQnetInSum(b)
pr   # [(2005, 2006, 4), (2006, 2007, 3), (2007, 2008, 4), (2008, 2009, 9), ...
TQ.TqSummary(pr)   # (1900, 2017, 0, 14)
TQmax = 15; Tmin = 1995; Tmax = 2016; w = 600; h = 150
N.TQshow(pr, cdir, TQmax, Tmin, Tmax, w, h, tit, fill='red')

# cumulative productivity: in-sum in the cumulative authorship network
cpr = WAc.TQnetInSum(b)
cpr  # [(2005, 2006, 4), (2006, 2007, 7), (2007, 2008, 11), (2008, 2009, 20), ...
TQmax = 65; Tmin = 1995; Tmax = 2016; w = 600; h = 250
N.TQshow(cpr, cdir, TQmax, Tmin, Tmax, w, h, tit, fill='red')

# fractional productivity: in-sum in the normalized instantaneous network
WAni = WAi.TQnormal()
fpr = WAni.TQnetInSum(b)
fpr  # [(2006, 2007, 1.3333333333333333), (2007, 2008, 1.6666666666666665), ...
TQmax = 7; Tmin = 1995; Tmax = 2016; w = 600; h = 150
N.TQshow(fpr, cdir, TQmax, Tmin, Tmax, w, h, tit, fill='red')
```
A.3 Citations between works
```python
# citations to Peters and Ceci (1982)
tit = 'PETERS_D(1982)5:187'; c = ic[tit]
ci = Citei.TQnetInSum(c)
ci   # [(1982, 1983, 1), (1983, 1984, 4), (1984, 1986, 3), (1986, 1987, 2), ...
TQmax = 15; Tmin = 1980; Tmax = 2016; w = 600; h = 150
N.TQshow(ci, cdir, TQmax, Tmin, Tmax, w, h, tit, fill='blue')

# citations to Hirsch (2005)
tit = 'HIRSCH_J(2005)102:16569'; c = ic[tit]
ci = Citei.TQnetInSum(c)
ci   # [(2005, 2006, 0), (2006, 2007, 3), (2007, 2008, 4), (2008, 2009, 7), ...
TQmax = 25; Tmin = 2000; Tmax = 2017; w = 600; h = 250
N.TQshow(ci, cdir, TQmax, Tmin, Tmax, w, h, tit, fill='blue')
```
A.4 The most important journals
```python
jrn = ndir + "/WJd.net"   # works × journals (Pajek)
# same converter as in A.1 (the original listing spelled it twoMode2netJSON)
WJc = N.twoMode2netsJSON(clu, jrn, 'WJcum.json', instant=False)
WJi = N.twoMode2netsJSON(clu, jrn, 'WJins.json', instant=True)
J = list(WJi.nodesMode(2))
# table Jt: (node, label, temporal in-sum) for every journal
Jt = [(j, WJi._nodes[j][3]['lab'], TQ.cutGT(WJi.TQnetInSum(j), 0)) for j in J]
# breakpoints of the coarser time intervals
p = [0, 1971, 1981, 1991, 2001, 2006, 2011, 2016, 3000]
Jr = [(j, l, TQ.changeTime(a, p)) for (j, l, a) in Jt]
I = {Jt[j][1]: j for j in range(len(Jt))}   # journal label -> position in Jt
JL = ["BEHAV BRAIN SCI", "BMJ OPEN", "BRIT MED J", "CUTIS", "J ASSOC OFF AGR CHEM",
      "JAMA-J AM MED ASSOC", "J SEX MED", "LANCET", "MED J AUSTRALIA", "NATURE",
      "NEW ENGL J MED", "PLOS ONE", "SCIENCE", "SCIENTOMETRICS"]
IJ = [I[j] for j in JL]; Ir = [Jr[i] for i in IJ]   # recoded rows of the selected journals
```
In the library TQ we included a new function changeTime that recodes a temporal quantity into new time intervals determined by a sequence of breakpoints p.
A.5 Temporal coauthorship network
```python
# instantaneous coauthorship network
Co = WAi.TQtwo2oneCols()
Co.saveNetsJSON('CoIns.json', indent=2)
Co.delLoops()
C = Co.TQtopLinks(thresh=15)
# first link in C
tit = C[0][2] + ' - ' + C[0][3]; bd = C[0][5]
TQmax = 15; Tmin = 2000; Tmax = 2017; w = 600; h = 150
N.TQshow(bd, cdir, TQmax, Tmin, Tmax, w, h, tit, fill='red')
# third link in C
tit = C[2][2] + ' - ' + C[2][3]; ra = C[2][5]
TQmax = 10; Tmin = 1996; Tmax = 2017; w = 600; h = 150
N.TQshow(ra, cdir, TQmax, Tmin, Tmax, w, h, tit, fill='red')
TQ.total(bd), TQ.total(ra)   # (42, 17)
```
A.6 Citations between journals
```python
# JCJ = WJ_i^T · Ci_i · WJ_c
JCJ = N.TQmultiply(N.TQmultiply(WJi.transpose(), Citei.one2twoMode()), WJc, True)
# journal self-citations (loops)
L = JCJ.TQtopLoops(thresh=100)
tit = L[0][1]; jm = L[0][3]
TQmax = 70; Tmin = 1970; Tmax = 2017; w = 600; h = 200
N.TQshow(jm, cdir, TQmax, Tmin, Tmax, w, h, tit, fill='blue')
tit = L[1][1]; sm = L[1][3]
TQmax = 35; Tmin = 1990; Tmax = 2017; w = 600; h = 200
N.TQshow(sm, cdir, TQmax, Tmin, Tmax, w, h, tit, fill='blue')
# citations between different journals
JCJ.delLoops()
T = JCJ.TQtopLinks(thresh=70)
tit = T[2][2] + ' - ' + T[2][3]; bj = T[2][5]
TQmax = 25; Tmin = 1990; Tmax = 2017; w = 600; h = 200
N.TQshow(bj, cdir, TQmax, Tmin, Tmax, w, h, tit, fill='red')
tit = T[3][2] + ' - ' + T[3][3]; pj = T[3][5]
TQmax = 25; Tmin = 2005; Tmax = 2017; w = 600; h = 200
N.TQshow(pj, cdir, TQmax, Tmin, Tmax, w, h, tit, fill='red')
# in-sum of citations received from other journals (loops were removed above)
jci = TQ.cutGE(JCJ.TQnetInSum(T[2][1]), 1e-10)
TQ.TqSummary(jci)   # (1979, 2017, 1, 276)
TQ.total(jci)       # 3861
tit = 'others - ' + T[2][3]
TQmax = 280; Tmin = 1975; Tmax = 2017; w = 600; h = 200
N.TQshow(jci, cdir, TQmax, Tmin, Tmax, w, h, tit, fill='red')
```
References
- Baras, J. S., Theodorakopoulos, G. (2010). Path problems in networks. Morgan & Claypool, Berkeley.
- Batagelj, V. (2007). WoS2Pajek. Networks from Web of Science. Version 1.5 (2017). http://vladowiki.fmf.uni-lj.si/doku.php?id=pajek:wos2pajek
- Batagelj, V., Cerinšek, M. (2013). On bibliographic networks. Scientometrics, 96(3), 845-864.
- Batagelj, V., Doreian, P., Ferligoj, A., Kejžar, N. (2014). Understanding Large Temporal Networks and Spatial Networks: Exploration, Pattern Searching, Visualization and Network Evolution. Wiley.
- Batagelj, V., Ferligoj, A., Squazzoni, F. (2017). The emergence of a field: a network analysis of research on peer review. Scientometrics, 113, 503. https://doi.org/10.1007/s11192-017-2522-8
- Batagelj, V., Praprotnik, S. (2016). An algebraic approach to temporal network analysis based on temporal quantities. Social Network Analysis and Mining, 6(1), 1-22.
- Batagelj, V. (2017). Nets – a Python package for network analysis. https://github.com/bavla/Nets
- Cerinšek, M., Batagelj, V. (2015). Network analysis of Zentralblatt MATH data. Scientometrics, 102(1), 977-1001.
