TL;DR
This paper introduces SemNet, a semantic network built from 750,000 scientific papers, which predicts future research trends in quantum physics and inspires novel scientific ideas using neural networks.
Contribution
The paper presents a novel method to construct and utilize a semantic network from scientific literature for trend prediction and idea generation in quantum physics.
Findings
SemNet accurately predicts future research trends in quantum physics.
Deep neural networks trained on SemNet states can suggest innovative research ideas.
SemNet captures influential and prize-winning research topics from history.
Predicting Research Trends with Semantic and Neural Networks
with an application in Quantum Physics
Mario Krenn
Vienna Center for Quantum Science & Technology (VCQ), Faculty of Physics, University of Vienna, Austria.
Institute for Quantum Optics and Quantum Information (IQOQI), Austrian Academy of Sciences, Vienna, Austria.
Department of Chemistry & Computer Science, University of Toronto, Canada.
Vector Institute for Artificial Intelligence, Toronto, Canada.
Anton Zeilinger
Vienna Center for Quantum Science & Technology (VCQ), Faculty of Physics, University of Vienna, Austria.
Institute for Quantum Optics and Quantum Information (IQOQI), Austrian Academy of Sciences, Vienna, Austria.
(March 17, 2024)
Abstract
The vast and growing number of publications in all disciplines of science cannot be comprehended by a single human researcher. As a consequence, researchers have to specialize in narrow sub-disciplines, which makes it challenging to uncover scientific connections beyond their own field of research. Thus access to structured knowledge from a large corpus of publications could help push the frontiers of science. Here we demonstrate a method to build a semantic network from published scientific literature, which we call SEMNET. We use SEMNET to predict future trends in research and to inspire new, personalized and surprising seeds of ideas in science. We apply it in the discipline of quantum physics, which has seen an unprecedented growth of activity in recent years. In SEMNET, scientific knowledge is represented as an evolving network using the content of 750,000 scientific papers published since 1919. The nodes of the network correspond to physical concepts, and links between two nodes are drawn when two physical concepts are concurrently studied in research articles. We identify influential and prize-winning research topics from the past inside SEMNET, thus confirming that it stores useful semantic knowledge. We train a deep neural network using past states of SEMNET to predict future developments in quantum physics research, and confirm high-quality predictions using historic data. With the neural network and theoretical network tools we are able to suggest new, personalized, out-of-the-box ideas, by identifying pairs of concepts which have unique and extremal semantic network properties. Finally, we consider possible future developments and implications of our findings.
I Introduction
A computer algorithm with access to a large corpus of published scientific research could potentially make genuinely new contributions to science. With such a body of knowledge, the algorithm could derive new scientific insights that are unknown to human researchers and note contradictions within existing scientific knowledge evans2011advancing ; you2015darpa . This level of automation of science is more in the realm of science-fiction than reality at present. However, algorithms with access to and the capability of extracting semantic knowledge from the scientific literature can be employed in manifold ways to assist scientists and thereby augment scientific progress. As an example, the evaluation of whether an idea is novel or surprising depends crucially on already-existing knowledge. Thus a computer algorithm with the capability to propose new, useful ideas or potential avenues of research will necessarily require access to published scientific literature - which forms at least partially the body of human knowledge in a scientific field.
Knowledge can be portrayed using semantic networks that represent semantic relations between concepts in a network lehmann1992semantic . Over the last few years, significant results have been obtained by automatically analyzing the large corpus of scientific literature evans2011metaknowledge ; zeng2017science ; fortunato2018science , including the development of semantic networks in several scientific disciplines.
In biochemistry, a semantic network has been built using a well-defined list of molecule names (which correspond to the nodes of the network) and forming edges when two components co-appear in the abstract of a scientific paper. The network was derived from millions of papers published over 30 years, and the authors identify a more efficient, collective strategy to explore the knowledge network of biochemistry foster2015tradition ; rzhetsky2015choosing . In iacopini2018network , a semantic network was created using 100,000 papers from astronomy, ecology, economy and mathematics. The nodes represent ideas or concepts (generated through automated extraction of key concepts from large bodies of text milojevic2015quantifying ). The authors used the network to draw connections between the human innovation process and random walks. In the field of neuroscience, semantic networks have been used to map the landscape of the field beam2014mapping ; dworkin2018landscape . Papers from the interdisciplinary journal PNAS have been used to investigate sociological properties such as inter-disciplinary research dworkin2019emergent .
Here, we show how to build and use a semantic network for quantum physics, which we call SEMNET. It is built from 750,000 scientific papers in physics published since 1919. In the network we identify a number of historic award-winning concepts, indicating that SEMNET carries useful semantic knowledge. The evolution of such a large network allows us to use an artificial neural network for predicting research concepts that scientists will investigate in the next five years. Finally, we demonstrate the power of SEMNET to suggest personalized, novel and unique directions for future research (code and details: https://github.com/MarioKrenn6240/SEMNET).
Our work differs in several aspects from previous semantic networks created from scientific literature. First, we use machine learning to draw conclusions from earlier states of SEMNET about its future state, which enables us to make predictions about the future research trends of the discipline. Second, we use network theoretical tools and machine learning to identify pairs of concepts with exceptional network properties. Those concept combinations can be restricted to the research interest of a specific scientist. This ability allows us to not only predict but also suggest uninvestigated concept pairs which human scientists might not have identified because they lie outside their own sub-field, but which have properties that indicate an exceptional relation. They could be a seed of a new, out-of-the-box idea. Third, we apply SEMNET to quantum physics, which has seen enormous growth during the last decade due to its potentially transformative technologies. The growth can be seen in the establishment of several high-quality journals for quantum research (such as Quantum, npj Quantum Information, IOP’s Quantum Science & Technology), in multi-billion dollar funding from governments, and in the strong involvement of private companies and startups worldwide. The growth rate leads to an enormous increase in scientific results and publications, which are difficult to follow for individual researchers – thus quantum physics is an ideal test-bed for SEMNET.
II Semantic Network of Quantum Physics
A semantic network, or knowledge network, represents relations between concepts in the form of a network. Now we describe in more detail how the network is built, especially how the concept list is generated and how links are formed. A schematic illustration can be seen in Figure 1, more details in Figure 2.
II.1 Creation of the concept list
We generate the concept list via two independent methods. First, we use human-made lists of physical concepts. These concepts are compiled from the indices of 13 quantum physics books (which were available to us in a digital form), as well as titles of Wikipedia articles that are linked in a quantum physics category. This human-made collection contains approximately 5000 physical concepts.
We extend the human-generated list with an automatically generated list of physical concepts. For this, we apply a natural language processing tool called RAKE (Rapid Automatic Keyword Extraction) rose2010automatic to the titles and abstracts of approximately 100,000 articles published in quantum physics categories on the arXiv preprint server, which we chose to optimize the list for current research topics in quantum physics. RAKE is based on statistical text analysis, and can automatically find relevant keywords in texts. We combine the human- and machine-generated lists of concepts and further optimize them: we delete incorrectly identified concepts (which were introduced by imperfections of the statistical analysis of RAKE) and names of people (which are not concepts), merge synonyms, and normalize the singular and plural forms of the same concept. Ultimately, this yields a list of 6,300 terms. Five randomly chosen examples are three level system, photon antibunching, chemical shift, neutron radiation and unconditionally secure quantum bit commitment. Each of these quantum physics concepts is a node in SEMNET.
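The merging and normalization step can be illustrated with a minimal sketch. It is not the procedure used for SEMNET: the functions `normalize_concept` and `merge_concept_lists`, the naive trailing-"s" rule and the manual `blocklist` are all assumptions made for illustration, and synonym merging is not covered.

```python
def normalize_concept(term):
    """Lowercase, turn hyphens into spaces, and drop a trailing plural 's'."""
    words = term.lower().replace("-", " ").split()
    # Naive singularization of the last word only (an assumption of this sketch).
    last = words[-1] if words else ""
    if len(last) > 3 and last.endswith("s") and not last.endswith("ss"):
        words[-1] = last[:-1]
    return " ".join(words)


def merge_concept_lists(human_list, machine_list, blocklist=()):
    """Union of both lists after normalization, minus manually blocked entries."""
    blocked = {normalize_concept(t) for t in blocklist}
    merged = {normalize_concept(t) for t in list(human_list) + list(machine_list)}
    return sorted(merged - blocked)


# Toy input; the blocklist stands in for the manual removal of person names.
concept_list = merge_concept_lists(
    ["Three-level systems", "Photon antibunching"],
    ["three level system", "chemical shifts", "Schumacher"],
    blocklist=["Schumacher"],
)
print(concept_list)
```

A rule this simple also shortens words such as "physics" to "physic", so a real pipeline would need an exception list or a proper lemmatizer.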
II.2 Creation of the network
To form connections between different quantum physics concepts, we use 100,000 articles of quantum physics categories on arXiv, and the dataset of all 650,000 articles ever published by the APS. We chose these two data sources because the APS database contains peer-reviewed physics papers from the last 100 years (allowing for investigation of long-term trends), while the arXiv database contains specific quantum physics papers, allowing for more precise coverage of the quantum physics research trends.
Whenever two concepts occur together in a title or an abstract of an article, we interpret that as a semantic connection between these concepts, and add a unique link between the two corresponding nodes in the network. Relations between two concepts can take many forms. Concepts may be put together, for example, when a mathematical tool (such as Schmidt rank) is used to investigate a specific quantum system (such as vector beam or exciton polariton), when insights from a specific technique (such as lasing without inversion or rabi oscillation) lead to conclusions about another property (such as transport property or atom transition frequency), or when fundamental ideas (such as quantum decoherence or quantum energy teleportation) are studied in the context of foundational experiments (such as delayed choice experiment or Mermin inequality). While this method clearly cannot represent all quantum physics knowledge, it represents elements of its semantic structure, which we demonstrate in what follows.
The resulting network SEMNET has 6368 vertices with more than 1.7 million edges (drawn from more than 15 million concept pairs pulled from 750,000 physics articles), using physics articles from 1919 to December 2017.
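The link-forming rule described above can be sketched in a few lines. This is an illustrative toy, not the SEMNET code: `build_semnet` is a hypothetical helper, and matching concepts by lowercase substring search is an assumption of the sketch. The dictionary keeps one unique edge per concept pair together with the number of articles that produced it, mirroring the distinction between unique edges and raw concept pairs.

```python
from itertools import combinations


def build_semnet(concepts, articles):
    """Map each co-occurring concept pair to the number of articles containing both.

    `articles` is an iterable of strings (title plus abstract). Concepts are
    matched by a naive lowercase substring test (assumption of this sketch).
    """
    edges = {}
    for text in articles:
        text = text.lower()
        present = sorted(c for c in concepts if c in text)
        for a, b in combinations(present, 2):
            key = frozenset((a, b))
            edges[key] = edges.get(key, 0) + 1
    return edges


concepts = ["qubit", "schmidt rank", "vector beam"]
articles = [
    "Schmidt rank of a vector beam",
    "A qubit encoded in a vector beam",
    "Vector beam generation and Schmidt rank witnesses",
]
edges = build_semnet(concepts, articles)
n_unique_edges = len(edges)          # unique links in the network
n_pairs = sum(edges.values())        # concept pairs pulled from all articles
```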
III Results
III.1 Past quantum physics trends
First, we use the evolution of the semantic network to identify impactful emerging fields of research in the past. We define emerging fields as either concepts or concept pairs which have grown significantly after they have been introduced or connected for the first time, over periods of five years.
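One way to make this definition concrete is to count, for each concept, the papers that mention it within five years of its first appearance. The sketch below does exactly that; the helper `five_year_growth`, the choice of a raw paper count as the growth measure, and the toy years are assumptions for illustration and are not taken from the SEMNET data.

```python
def five_year_growth(mention_years, window=5):
    """Return {concept: (first_year, papers within `window` years of first_year)}.

    `mention_years` maps a concept to the publication year of every paper
    that mentions it.
    """
    growth = {}
    for concept, years in mention_years.items():
        first = min(years)
        in_window = sum(first <= y < first + window for y in years)
        growth[concept] = (first, in_window)
    return growth


# Toy data with placeholder concept names.
mention_years = {
    "concept_a": [1995, 1995, 1996, 1996, 1996, 1997, 1999, 2001],
    "concept_b": [1990, 1991, 1991, 2005, 2005, 2006],
}
growth = five_year_growth(mention_years)
fastest = max(growth, key=lambda c: growth[c][1])
```

With this simple measure, a concept that is mentioned early but takes off much later (like `concept_b` here) scores low, so a more refined definition would measure growth from the onset of rapid increase rather than from the first mention.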
Figure 3a shows the quantum physics topics that have grown the fastest (in terms of numbers of papers in which they have been mentioned) after their emergence, from the years 1987 to 2017. Figure 3b shows, for each year, which two-concept combinations have grown the fastest in the first five years after they were first connected. In Figure 3, many of the emerging fields clearly correspond to important discoveries, advances in understanding and shifts of thought within quantum science research. One of the fastest growing concepts is Qubit, which emerged in 1995 (first in April in a Phys. Rev. A paper by Schumacher schumacher1995quantum , then in arXiv preprints by Chuang & Yamamoto chuang1995simple and by Knill knill1995approximation ; knill1995bounds ). Qubits are the basic units of quantum information – generalizing classical bits to coherent quantum superpositions – and connect quantum mechanics and information science. The emergence of the qubit can be interpreted as the start of the discipline of quantum information science. Enormous growth is seen for topics connected to graphene, starting in 2005, the discoverers of which were awarded the 2010 Nobel Prize in Physics. Interestingly, graphene itself was mentioned (in our data collection) already back in the early 1990s in Phys. Rev. B papers bayot1990two ; di1991magnetic ; moreh1991effective , when it was not a strongly emergent concept itself. Strong growth in research into topological materials can be observed from approximately 2008; the Nobel Prize in Physics was subsequently awarded in this area in 2016. Aaronson’s and Arkhipov’s approach to achieving quantum supremacy harrow2017quantum using linear photonic networks, termed BosonSampling aaronson2011computational , attracted considerable attention (with more than 600 citations since its introduction in 2011, and considerable experimental efforts in this direction).
Since 2012, the application of machine learning to quantum physics has become a prominent and diverse topic of research, that falls under the umbrella of quantum machine learning (recently summarized in two prominent reviews biamonte2017quantum ; dunjko2018machine , and also observable by the foundation of a novel high-quality journal for this topic, Springer Quantum Machine Intelligence). These findings confirm that SEMNET contains useful semantic information.
III.2 Predictive ability of the SEMNET
Having used SEMNET to study past quantum trends, we investigate its ability to provide projections of knowledge developments in the future. This essential question in network science is called the link-prediction problem, and asks which new links will be formed between unconnected vertices of the network in the future given the current state of the network (for a detailed investigation of the link-prediction problem in network theory, see liben2007link ). We apply this problem in the context of semantic networks which are generated from published scientific literature. In the present case, looking at the field of quantum physics, we ask which two concepts that have not yet been studied together might be investigated together in a scientific article over the next five years. To answer this question, we use an artificial neural network with four fully connected layers (two hidden layers). The structure of the neural network and its training is shown in Figure 4. Its task is to rank all unconnected pairs of concepts (roughly 5% of all edges have been drawn by the end of 2017), starting with the pair that is most likely to be connected within five years, up to the pair that most likely stays unconnected. Ultimately we want to apply the neural network to the current SEMNET and predict the future trends. To validate its quality, we first input to the neural network past states of SEMNET (for example, containing data only up to 2002), and train it to predict new links by 2007. After the training, we apply this network to 2007 data and validate its quality on data of the year 2012 (which it has never seen before).
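The training and validation setup described above can be sketched as a labeling procedure on network snapshots. The helper `unconnected_pairs_with_labels` and the toy link years are hypothetical; the sketch only shows how pairs that are unconnected at a snapshot year receive the label 1 if they become connected within the five-year horizon.

```python
from itertools import combinations


def unconnected_pairs_with_labels(n_nodes, first_link_year, year, horizon=5):
    """Label every pair that is unconnected at `year`.

    `first_link_year` maps (i, j) with i < j to the year the link first
    appeared. The label is 1 if the pair is linked by `year + horizon`.
    """
    pairs, labels = [], []
    for i, j in combinations(range(n_nodes), 2):
        y = first_link_year.get((i, j))
        if y is not None and y <= year:
            continue  # already connected in this snapshot
        pairs.append((i, j))
        labels.append(int(y is not None and y <= year + horizon))
    return pairs, labels


# Toy network with four concepts.
first_link_year = {(0, 1): 1999, (0, 2): 2005, (1, 3): 2010, (2, 3): 2016}

# Train on the 2002 snapshot (targets: links by 2007) ...
train_pairs, train_labels = unconnected_pairs_with_labels(4, first_link_year, 2002)
# ... and validate on the 2007 snapshot (targets: links by 2012).
test_pairs, test_labels = unconnected_pairs_with_labels(4, first_link_year, 2007)
```

Because the validation targets lie entirely after the training horizon, the model is evaluated on links it could not have seen during training.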
The semantic network is very large (consisting of $6368 \times 6368$ entries for each year, which are the number of possible connections between the 6368 quantum physics concepts, compared to $28 \times 28$ pixels for the famous MNIST dataset of handwritten digits, and $256 \times 256$ pixels for ImageNet lecun2015deep ), and involves combinatorial, graph-based information which is more structured than images (see for example wu2019comprehensive ). For that reason, it is an unsuitable direct input to the neural network. Instead, we compute semantic network properties for each pair of concepts. For each pair of concepts $(c_i, c_j)$ that are unconnected in SEMNET, we calculate 17 network properties $p_{i,j} = (p^{1}_{i,j}, p^{2}_{i,j}, \ldots, p^{17}_{i,j})$, where $p^{k}_{i,j} \in \mathbb{R}$. Here, $p^{1}_{i,j}$ and $p^{2}_{i,j}$ are the degrees of concept $c_i$ and $c_j$, and $p^{3}_{i,j}$ and $p^{4}_{i,j}$ are the numbers of papers in which they are mentioned. While these four properties are purely local, $p^{5}_{i,j}$ is the cosine-similarity between the two concepts, which corresponds to the number of common neighbors. A cosine similarity of one indicates that the terms might be synonyms. The next nine properties indicate the number of paths with lengths of two, three and four between the physics concepts in the current and previous two years. These properties allow us to draw conclusions from the evolution over time of various topics as tracked by SEMNET. The choice to use large path lengths as one of the properties is strengthened by a very recent observation that the paths of length 3 (L3) are crucial for link prediction tasks in a network for protein interactions kovacs2019network . Finally, the last three properties correspond to three different measures of distance between the two concepts. More details can be seen in the SI.
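Several of these pair properties can be read off directly from an adjacency matrix, as the following minimal sketch shows. It makes three assumptions that are not stated in the text: cosine similarity is taken as the number of common neighbors divided by the square root of the product of the two degrees, path counts are approximated by walk counts from matrix powers (which may revisit nodes), and the path counts are left unnormalized. The function name `pair_features` is hypothetical.

```python
import numpy as np


def pair_features(A, i, j):
    """Local and path-based properties of the node pair (i, j).

    A is a symmetric 0/1 adjacency matrix without self-loops.
    """
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    A2 = A @ A   # entry (i, j): walks of length 2 = common neighbors
    A3 = A2 @ A  # entry (i, j): walks of length 3
    denom = np.sqrt(deg[i] * deg[j])
    return {
        "deg_i": deg[i] / deg.max(),   # normalized degree of first concept
        "deg_j": deg[j] / deg.max(),   # normalized degree of second concept
        "cosine": A2[i, j] / denom if denom > 0 else 0.0,
        "walks2": A2[i, j],
        "walks3": A3[i, j],
    }


# Toy network: edges 0-2, 1-2, 1-3, 2-3; nodes 0 and 1 are unconnected.
A = np.array([
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
])
features = pair_features(A, 0, 1)
```

For a network with 6368 nodes the dense matrix products remain feasible, but sparse matrices would be the natural choice for larger concept lists.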
We explain these properties using a concrete pair of concepts, interaction-free measurement and Leggett-Garg inequality. (We chose the example randomly, from unconnected concepts that had been mentioned individually more than 30 times.) The first concept represents "interaction-free measurement", which is mentioned in 60 abstracts and has 135 connections to other concepts by the end of 2012. The second concept represents the "Leggett-Garg inequality", which occurs in 33 abstracts and has 141 connections to other concepts by the end of 2012. These two concepts were not connected in SEMNET as of 2012; therefore the 15th property, their network distance, is 2 (neighbors have a distance of one, in other words, there is a direct path of length one connecting them). In 2012, the two concepts have a cosine similarity of 0.228, meaning that 22.8% of their neighbors are shared. Two years later, in 2014, an article on arXiv mentioned both of these concepts in the abstract; the work was later published robens2015ideal and featured knee2015quantum in the high-impact journal Physical Review X, achieving approximately 100 citations within four years. This example indicates that drawing first connections between concepts can lead to significant scientific insights.
The 17 properties for each unconnected concept pair in SEMNET are used by the neural network to estimate which pairs of quantum physics concepts are likely to be connected within 5 years and which are not.
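How such a network turns 17 properties into a ranking can be illustrated with a forward pass through a small fully connected network. Everything specific here is an assumption: the hidden width of 32, the ReLU and sigmoid activations, and the random untrained weights are placeholders, and the functions `init_mlp` and `predict_scores` are hypothetical. Only the overall shape (17 inputs, two hidden layers, one output) follows the description in the text.

```python
import numpy as np


def init_mlp(sizes=(17, 32, 32, 1), seed=0):
    """Random weights for input -> hidden -> hidden -> output (untrained)."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


def predict_scores(params, X):
    """Return one score in (0, 1) per row of X (one row per concept pair)."""
    h = np.asarray(X, dtype=float)
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)       # ReLU hidden layers
    W, b = params[-1]
    logits = (h @ W + b)[:, 0]
    return 1.0 / (1.0 + np.exp(-logits))     # sigmoid output


params = init_mlp()
X = np.random.default_rng(1).random((5, 17))  # 5 toy pairs, 17 properties each
scores = predict_scores(params, X)
ranking = np.argsort(-scores)                 # most likely pair first
```

In practice the weights would be fitted on labeled pairs from a past snapshot before the scores are used for ranking.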
To quantify the quality of the predictions, we employ a commonly used technique called the receiver operating characteristic (ROC) curve fawcett2004roc . For this, the neural network is used to classify unconnected nodes into two sets: one set that is connected after five years, and a set that is non-connected. Figure 4 shows a significant ability to predict connections between pairs of topics – even though we restrict ourselves to pairs that share less than 20% of their neighbors (to prevent predictions of concepts which have similar meaning). This indicates that even research that draws new connections between concepts can be predicted with high quality.
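A single-number summary of a ROC curve is the area under it (AUC), which equals the probability that a randomly chosen pair that later became connected is ranked above a randomly chosen pair that stayed unconnected. The sketch below computes it through the rank-sum identity; the function name `roc_auc` and the toy scores are illustrative, and the text does not state which summary statistic was used.

```python
import numpy as np


def roc_auc(scores, labels):
    """Area under the ROC curve via the rank-sum identity (ties averaged)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    for s in np.unique(scores):          # give tied scores their average rank
        mask = scores == s
        ranks[mask] = ranks[mask].mean()
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy predictions: label 1 means the pair was connected five years later.
scores = [0.9, 0.8, 0.35, 0.3, 0.1]
labels = [1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking; in the toy example five of the six positive-negative pairs are ordered correctly.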
IV Proposing future research topics
Next, we attempt to use SEMNET and the artificial neural network to suggest new, potentially fruitful research directions in quantum physics. While it is interesting and useful to understand future trends, it potentially cannot by itself lead to surprising or out-of-the-box ideas (otherwise they would not be predictable). Therefore, we extend our previous approach with network theoretic tools, to identify concept pairs with exceptional network-theoretic properties. Furthermore, since science is conducted by (groups of) individual scientists, suggestions for proposed new research directions need to be personalized (otherwise, we would obtain suggestions for topics in which nobody is an expert, which may be potentially interesting but limited in applicability).
How do we obtain suggestions for an individual scientist? What we find interesting and surprising strongly depends on what we already know. To gauge that, we need to investigate a given scientist’s previously published body of research papers and extract a list of concepts (from the concept list generated before) that define that person’s personal research agenda(s). We define key concepts as concepts investigated over-proportionally often by the scientist, compared to the relative frequency of that concept in all 750,000 papers. Each concept $c_i$ in the papers authored by the scientist has a probability that we calculate as the number of occurrences of the concept divided by the sum of occurrences of all concepts, which is $p_s(c_i) = n_i / \sum_k n_k$, where $n_i$ is the number of occurrences of $c_i$ in the scientist’s papers. Each concept also has a probability of occurring in all 750,000 papers that we use, written as $p_a(c_i) = N_i / \sum_k N_k$, where $N_i$ is the number of occurrences of the concept in all 750,000 articles. The ratio $r_i = p_s(c_i) / p_a(c_i)$ indicates the research agenda of the scientist. A value of $r_i > 1$ shows that the scientist investigates the concept overproportionally often.
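The ratio of the two relative frequencies can be computed directly from occurrence counts. The following sketch uses toy counts and a hypothetical helper `concept_ratios`; the cut-off of 1 for calling something a key concept is an assumption that follows from the word "overproportionally" rather than a value quoted from the text.

```python
def concept_ratios(scientist_counts, corpus_counts):
    """Ratio of a concept's relative frequency in one scientist's papers
    to its relative frequency in the whole corpus."""
    s_total = sum(scientist_counts.values())
    c_total = sum(corpus_counts.values())
    ratios = {}
    for concept, n in scientist_counts.items():
        p_scientist = n / s_total
        p_corpus = corpus_counts[concept] / c_total
        ratios[concept] = p_scientist / p_corpus
    return ratios


# Toy occurrence counts (not from the SEMNET data).
scientist_counts = {"orbital angular momentum": 6, "entanglement": 4}
corpus_counts = {"orbital angular momentum": 10, "entanglement": 80, "graphene": 110}

ratios = concept_ratios(scientist_counts, corpus_counts)
# Assumed threshold: a ratio above 1 marks a key concept.
key_concepts = [c for c, r in ratios.items() if r > 1]
```

Here "orbital angular momentum" makes up 60% of the scientist's concept mentions but only 5% of the corpus, giving a ratio of about 12, while "entanglement" appears exactly as often as expected.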
Our approach is to identify personalized suggestions of pairs of concepts that have never been connected. The scientist’s key concepts are paired with all of the other 6,368 concepts. This translates to a list of potentially hundreds of thousands of possible topic pairs. For further usability, we introduce a way to sort the candidate suggestions. Suggestions can be sorted by identifying concept pairs with unique and unusual properties. For each pair of concepts, we have already calculated 18 different network properties: 17 properties which have been used by the neural network for generating predictions, and the prediction value itself. Together, these properties define a multi-dimensional space in which the location of each concept pair depends on its network properties.
To identify unusual and unique concept pairs, we search for outliers in this high-dimensional space. An outlier indicates a pair of concepts that is uniquely located in the space, and thus has unique properties in the semantic SEMNET network. We can visualize, for an anonymous example scientist, a 3-dimensional projection of the high-dimensional space in Fig. 6. There, every dot corresponds to a concept pair which is located according to its network properties. Outliers can be identified by the darkness of their color.
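The text does not specify how outliers are scored, so the following is only one possible approach, chosen for illustration: each concept pair is scored by its mean distance to its k nearest neighbors in the standardized property space. The function `knn_outlier_scores`, the value of k and the synthetic data are all assumptions.

```python
import numpy as np


def knn_outlier_scores(X, k=3):
    """Mean Euclidean distance to the k nearest neighbors after z-scoring.

    X has one row per concept pair and one column per network property.
    Larger scores indicate more isolated (more unusual) pairs.
    """
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0                      # avoid division by zero
    Z = (X - X.mean(axis=0)) / std
    d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    d_sorted = np.sort(d, axis=1)
    return d_sorted[:, 1:k + 1].mean(axis=1)  # column 0 is the distance to self


# Synthetic example: 50 ordinary pairs plus one pair far from all others,
# each described by 18 properties.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(50, 18)), np.full((1, 18), 8.0)])
outlier_scores = knn_outlier_scores(X)
most_unusual = int(np.argmax(outlier_scores))
```

The pairwise distance matrix grows quadratically with the number of pairs, so for hundreds of thousands of candidates a tree-based or approximate nearest-neighbor search would be needed.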
A few suggestions from SEMNET for the example scientist follow. Some of the highest predicted pairs (from the top 10) are orbital angular momentum & magnetic skyrmion, spin orbit coupling & quantum sensing, and dicke model & cloning. Filtered for highly predicted, uncommon pairs (cosine similarity < 0.03; from the top 10): topos theory & cyclic operation, critical exponent & reed muller code, quantum key distribution & adhm construction. Unrestricted concept lists (normalized concept degree < 0.1; from the top 10): atom cavity system & mode volume, entanglement of formation & multiqubit state, neutrino oscillation & dark photon. For more examples, see SI.
V Outlook
Machine Learning – Graph-based machine learning models, which have been studied in recent years, could improve prediction qualities in the link-prediction task, for example see li2015gated ; niepert2016learning ; wu2019comprehensive . Furthermore, as SEMNET represents a time evolution of quantum physics’ semantic network, applying efficient tools for handling time-dependent data, such as a long short-term memory hochreiter1997long might further significantly improve the prediction quality. Application of techniques from machine translation could be beneficial to introduce multiple classes of connections within semantic networks vaswani2017attention . Additionally, combining our approach with unsupervised embedding of scientific literature, as shown in tshitoyan2019unsupervised could lead to interesting, dynamic networks.
Network Theory and Science of Science – Currently, SEMNET represents connections between concepts that appear in the scientific literature. This is of course a vast simplification of scientific knowledge, as concepts in natural languages can have a manifold of relations helbig2006knowledge . An extension could employ more complex structures for knowledge representation, such as hyper-graphs shi2015weaving . The concept list, which represents the nodes of SEMNET, can be improved by various sophisticated ways of generating lists of concepts and categories milojevic2015quantifying ; sreenivasan2013quantitative . The extension to combinations of more than two concepts will lead to more complex knowledge representations. Furthermore, it would be insightful to fold into the semantic network the numbers of article citations, which are, at least in the field of science, frequently used as a proxy for scientific impact (see uzzi2013atypical ; martin2013coauthorship ; kuhn2014inheritance , for example). This may enable predictions of future research directions that take the highest potential impact into consideration, potentially accelerating the evolution of individual scientific knowledge sinatra2016quantifying ; barabasi2018formula .
Surprisingness – In this work, we place pairs of concepts in an abstract high-dimensional space and identify outliers that have unique and potentially valuable properties. It would be interesting to apply more, and different measures of surprisingness. An interesting example is the information-based Bayesian surprise function, which has been introduced in the context of human attention itti2006bayesian and successfully applied to the subfield of computational creativity CompCreat ; pinel2015culinary . In order to achieve further progress, it would be important to further explore and genuinely understand what human scientists consider as surprising and creative.
VI Discussion
We show how to create a semantic network in the field of quantum physics, demonstrate its use in predicting future trends in the field, and show how it can be used to suggest pairs of concepts which are not yet investigated jointly but have distinct network properties. We show how to filter the suggestions for the research agenda of an individual scientist. The approach presented here is independent of the discipline of science. As such, it can be applied to other fields of research.
This can be interpreted as one potential road towards computer-inspired science, in the following sense: we imagine cases (which we believe are possible) where SEMNET produces seeds or inspirations of unusual ideas or directions of thought that a researcher alone might not have thought of. The subsequent, successful interpretation and scientific execution of the suggestions fully remains the task of a creative, human scientist.
Acknowledgements
MK thanks James A. Evans and Sasha Belikov for exciting discussions of metaknowledge research and automation of science, and Jacob G. Foster for a short but influential conversation at the International Symposium on Science of Science 2016. Furthermore, we would like to acknowledge Nora Tischler, Armin Hochrainer, Robert Fickler, Radek Lapkiewicz, Manuel Erhard and Philipp Haslinger for many interesting discussions on related topics. The authors also thank the APS (American Physical Society) for providing access to the database of all published articles in APS journals. The authors thank Xuemei Gu for the illustrations of Figure 1 and 4. This work was supported by the Austrian Academy of Sciences (ÖAW), University of Vienna via the project QUESS and the Austrian Science Fund (FWF) with SFB F40 (FOQUS) and the Erwin Schrödinger fellowship No. J4309.
VII Network theoretical properties used for predictions
The neural network receives 17 network theoretical properties from SEMNET, which we detail here. For a pair of concepts $c_i$ and $c_j$, the vector $p_{i,j}$ corresponds to 17 real-valued numbers. SEMNET of a specific year $y$ corresponds to an adjacency matrix, which we denote as $A_y$.
- $p^{1}_{i,j} \in [0,1]$: normalized degree centrality of the first concept $c_i$ (normalized by the largest degree centrality in the concept list), i.e. the number of other concepts to which $c_i$ is connected, divided by the number of connections of the concept with the most neighboring concepts.
- $p^{2}_{i,j} \in [0,1]$: normalized degree centrality of the second concept $c_j$.
- $p^{3}_{i,j} \in [0,1]$: number of titles and abstracts in which concept $c_i$ occurs (normalized by the count for the concept that occurs in the most articles).
- $p^{4}_{i,j} \in [0,1]$: number of titles and abstracts in which concept $c_j$ occurs (normalized by the count for the concept that occurs in the most articles).
- $p^{5}_{i,j} \in [0,1]$: ratio of common neighbors, also known as cosine similarity.
- $p^{6}_{i,j} \in [0,1]$: paths of length 2 between $c_i$ and $c_j$, normalized by the pair with the largest number of paths, at year $y$.
- $p^{7}_{i,j} \in [0,1]$: paths of length 2 between $c_i$ and $c_j$, normalized by the pair with the largest number of paths, at year $y-1$.
- $p^{8}_{i,j} \in [0,1]$: paths of length 2 between $c_i$ and $c_j$, normalized by the pair with the largest number of paths, at year $y-2$.
- $p^{9}_{i,j} \in [0,1]$: paths of length 3 between $c_i$ and $c_j$, normalized by the pair with the largest number of paths, at year $y$.
- $p^{10}_{i,j} \in [0,1]$: paths of length 3 between $c_i$ and $c_j$, normalized by the pair with the largest number of paths, at year $y-1$.
- $p^{11}_{i,j} \in [0,1]$: paths of length 3 between $c_i$ and $c_j$, normalized by the pair with the largest number of paths, at year $y-2$.
- $p^{12}_{i,j} \in [0,1]$: paths of length 4 between $c_i$ and $c_j$, normalized by the pair with the largest number of paths, at year $y$.
- $p^{13}_{i,j} \in [0,1]$: paths of length 4 between $c_i$ and $c_j$, normalized by the pair with the largest number of paths, at year $y-1$.
- $p^{14}_{i,j} \in [0,1]$: paths of length 4 between $c_i$ and $c_j$, normalized by the pair with the largest number of paths, at year $y-2$.
- $p^{15}_{i,j}$: network distance between $c_i$ and $c_j$.
- $p^{16}_{i,j} \in [0,1]$: weighted network distance between $c_i$ and $c_j$ (normalized by the largest value of all pairs). Intuition: the more connections there are along certain edges, the easier it is to transition from one concept to the other.
- $p^{17}_{i,j} \in [0,1]$: a different normalized weighted network distance between $c_i$ and $c_j$. Intuition: the more connections there are along certain edges, the easier it is to transition from one concept to the other.
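The unweighted network distance in the list above is the length of the shortest path between two concepts, which a breadth-first search finds directly. The sketch below uses a hypothetical helper `network_distance` and placeholder concept names; it does not cover the two weighted distances, whose exact definitions are not given here.

```python
from collections import deque


def network_distance(adj, source, target):
    """Shortest-path length between two nodes, or None if unreachable.

    `adj` maps each node to the set of its neighbors.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adj.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Placeholder network: "a" and "b" are not linked but share one neighbor.
adj = {
    "a": {"shared"},
    "b": {"shared"},
    "shared": {"a", "b"},
    "isolated": set(),
}
dist_ab = network_distance(adj, "a", "b")
dist_direct = network_distance(adj, "a", "shared")
dist_none = network_distance(adj, "a", "isolated")
```

Any unconnected pair with at least one common neighbor has distance 2, which is why pairs with a nonzero cosine similarity always take this value.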
VIII Future suggestions from SEMNET
Here we show a number of future suggestions with different parameter settings. These pairs of concepts are network-theoretically distinguished, and they could be inspirations for the creative, human scientist. The concept list used here is unrestricted, meaning not tailored for a specific scientist’s research interest.
VIII.1 General Concepts
Unrestricted; Highest predicted values:
1. hybrid system, classical communication (cosS: 0.30407, deg: 0.22924, pred: 1)
2. back action, classical communication (cosS: 0.34642, deg: 0.23012, pred: 0.98235)
3. spin orbit interaction, quantum sensing (cosS: 0.31003, deg: 0.23375, pred: 0.95525)
4. conformal field theory, classical communication (cosS: 0.28176, deg: 0.23493, pred: 0.94893)
5. spin orbit coupling, quantum sensing (cosS: 0.33201, deg: 0.25839, pred: 0.94077)
6. light matter interaction, classical communication (cosS: 0.28623, deg: 0.24769, pred: 0.93416)
7. classical mechanic, classical communication (cosS: 0.3182, deg: 0.24956, pred: 0.92603)
8. universality, weyl semimetal (cosS: 0.44731, deg: 0.30365, pred: 0.90986)
9. many body physic, classical communication (cosS: 0.29946, deg: 0.23414, pred: 0.9079)
10. propagator, weyl semimetal (cosS: 0.44141, deg: 0.30493, pred: 0.88731)
cosS < 0.15; Highest predicted values:
1. molecule, stanene (cosS: 0.14975, deg: 0.38553, pred: 0.87155)
2. wave function, stanene (cosS: 0.14554, deg: 0.41675, pred: 0.85192)
3. ground state, laser printing (cosS: 0.080176, deg: 0.43108, pred: 0.79129)
4. laser, stanene (cosS: 0.14711, deg: 0.39918, pred: 0.73576)
5. spin state, rarita schwinger equation (cosS: 0.10752, deg: 0.25182, pred: 0.73427)
6. two level atom, ultracold atom gas (cosS: 0.14962, deg: 0.20833, pred: 0.71826)
7. correlation, laser printing (cosS: 0.076358, deg: 0.47497, pred: 0.71787)
8. optical lattice, electromagnetically induced grating (cosS: 0.12275, deg: 0.24917, pred: 0.71311)
9. polarization, laser printing (cosS: 0.083372, deg: 0.42666, pred: 0.71008)
10. wave function, laser printing (cosS: 0.082139, deg: 0.41086, pred: 0.70284)
deg < 0.05; Highest predicted values:
1. seesaw mechanism, dark photon (cosS: 0.42051, deg: 0.046927, pred: 0.52255)
2. majoron, tribimaximal mixing (cosS: 0.43699, deg: 0.026998, pred: 0.4697)
3. matrix product operator, multi scale entanglement renormalization ansatz (cosS: 0.367, deg: 0.044375, pred: 0.45618)
4. electron neutrino, tribimaximal mixing (cosS: 0.32507, deg: 0.047222, pred: 0.45098)
5. valleytronic, spin transistor (cosS: 0.39342, deg: 0.043687, pred: 0.43787)
6. fair sampling, bell test experiment (cosS: 0.38788, deg: 0.018751, pred: 0.4309)
7. dark photon, little hierarchy problem (cosS: 0.4419, deg: 0.026311, pred: 0.4296)
8. wiggler, smith purcell effect (cosS: 0.26696, deg: 0.042411, pred: 0.42564)
9. valleytronic, spatial inversion (cosS: 0.34483, deg: 0.043982, pred: 0.41915)
10. quantum key, continuous variable quantum cryptography (cosS: 0.28986, deg: 0.044375, pred: 0.41585)
cosS < 0.15, deg < 0.05; Highest predicted values:
1. self pulsing, laser printing (cosS: 0.13666, deg: 0.028176, pred: 0.22185)
2. photosynthesis, laser printing (cosS: 0.14425, deg: 0.033772, pred: 0.21813)
3. neutron capture nucleosynthesis, european spallation source (cosS: 0.14137, deg: 0.044866, pred: 0.21189)
4. apparent violation, eberhard inequality (cosS: 0.13047, deg: 0.043491, pred: 0.2102)
5. copenhagen interpretation, spekkens toy model (cosS: 0.14746, deg: 0.043393, pred: 0.20579)
6. shared entanglement, generalized coherence (cosS: 0.1419, deg: 0.035833, pred: 0.20522)
7. quantum search algorithm, oracle query (cosS: 0.14003, deg: 0.043197, pred: 0.20485)
8. photon counter, photonic orbital angular momentum (cosS: 0.14217, deg: 0.04192, pred: 0.20478)
9. copenhagen interpretation, quasi set theory (cosS: 0.1326, deg: 0.040349, pred: 0.20417)
10. optical amplifier, laser printing (cosS: 0.14551, deg: 0.042509, pred: 0.20308)
Unrestricted; Lowest predicted values:
1. transverse mode, pseudogap (cosS: 0.47207, deg: 0.22227, pred: -1)
2. nonlinear regime, pseudogap (cosS: 0.48811, deg: 0.21971, pred: -0.99384)
3. langevin equation, pseudogap (cosS: 0.48992, deg: 0.24897, pred: -0.99167)
4. numerical computation, pseudogap (cosS: 0.51088, deg: 0.24357, pred: -0.98443)
5. diffusion process, pseudogap (cosS: 0.51135, deg: 0.21971, pred: -0.98135)
6. interaction hamiltonian, pseudogap (cosS: 0.483, deg: 0.24789, pred: -0.98065)
7. holography, pseudogap (cosS: 0.4797, deg: 0.22413, pred: -0.97841)
8. many particle system, inelastic neutron scattering (cosS: 0.46252, deg: 0.20253, pred: -0.97628)
9. damping rate, pseudogap (cosS: 0.49515, deg: 0.21814, pred: -0.97625)
10. early universe, pseudogap (cosS: 0.42681, deg: 0.21716, pred: -0.9754)
cosS < 0.15; Lowest predicted values:
1. laser, large helical device (cosS: 0.093823, deg: 0.39378, pred: -0.72391)
2. distribution, pionium (cosS: 0.11835, deg: 0.50461, pred: -0.62882)
3. laser, diffuse serie (cosS: 0.10166, deg: 0.39476, pred: -0.61814)
4. resolution, moseleys law (cosS: 0.075495, deg: 0.38111, pred: -0.60875)
5. charge, franck hertz experiment (cosS: 0.085768, deg: 0.44365, pred: -0.55765)
6. charge, selected area diffraction (cosS: 0.10018, deg: 0.44502, pred: -0.55725)
7. hamiltonian, zero field nmr (cosS: 0.14845, deg: 0.4462, pred: -0.55318)
8. molecule, atom transition (cosS: 0.1266, deg: 0.38386, pred: -0.55074)
9. electron, atom bose einstein condensate (cosS: 0.1139, deg: 0.49146, pred: -0.54915)
10. electron, ultracold atom gas (cosS: 0.12406, deg: 0.49224, pred: -0.54876)
Unrestricted; maximal outlier (cosS, deg, pred):
1. quantum information, scattering amplitude (cosS: 0.49361, deg: 0.5376, pred: -0.95502)
2. s process, quantum spin (cosS: 0.59655, deg: 0.48164, pred: -0.95498)
3. electrostatic, spin system (cosS: 0.58982, deg: 0.45376, pred: -0.95086)
4. hilbert space, raman scattering (cosS: 0.48201, deg: 0.47477, pred: -0.95554)
5. interference effect, mean field theory (cosS: 0.58245, deg: 0.38131, pred: -0.95981)
6. space time, carbon nanotube (cosS: 0.51336, deg: 0.42284, pred: -0.95861)
7. quantum optic, random phase approximation (cosS: 0.48734, deg: 0.43, pred: -0.95878)
8. quantum information, brillouin zone (cosS: 0.50927, deg: 0.52562, pred: -0.86694)
9. two level system, charge density (cosS: 0.51577, deg: 0.41223, pred: -0.95105)
10. path integral, raman scattering (cosS: 0.53407, deg: 0.41331, pred: -0.93953)
Unrestricted; maximal outlier (cosS, deg):
1. hilbert space, plasma (cosS: 0.5505, deg: 0.57157, pred: -0.458)
2. divergence, quantum computation (cosS: 0.56671, deg: 0.53466, pred: 0.063652)
3. wave packet, free energy (cosS: 0.60923, deg: 0.50884, pred: -0.55609)
4. quantum information, wave number (cosS: 0.52858, deg: 0.54683, pred: 0.087118)
5. atom, yang mills theory (cosS: 0.39169, deg: 0.58777, pred: 0.019855)
6. entangled state, conductivity (cosS: 0.50832, deg: 0.54752, pred: -0.45379)
7. density matrix, domain wall (cosS: 0.58721, deg: 0.5105, pred: 0.10296)
8. qubit, diffusion coefficient (cosS: 0.52962, deg: 0.53642, pred: 0.11948)
9. entanglement, vector potential (cosS: 0.50925, deg: 0.54271, pred: 0.11929)
10. decoherence, electromagnetic wave (cosS: 0.5603, deg: 0.51885, pred: 0.095746)
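The lists in this section differ only in which pairs are admitted (by cosS and deg) and in the sort direction on pred. As a rough illustration, and not the authors' code, the selection could be sketched as follows. The names `Suggestion` and `top_suggestions` are hypothetical, the thresholds are read as upper bounds, and the sample rows are copied from the lists above.

```python
from typing import NamedTuple, Optional

class Suggestion(NamedTuple):
    concept_a: str
    concept_b: str
    cos_s: float   # cosine similarity of the pair
    deg: float     # degree value reported for the pair
    pred: float    # neural network prediction

def top_suggestions(rows, max_cos_s: Optional[float] = None,
                    max_deg: Optional[float] = None,
                    n: int = 10, highest: bool = True):
    """Keep pairs below the given thresholds and rank them by prediction."""
    kept = [
        r for r in rows
        if (max_cos_s is None or r.cos_s < max_cos_s)
        and (max_deg is None or r.deg < max_deg)
    ]
    return sorted(kept, key=lambda r: r.pred, reverse=highest)[:n]

# One row from each of five lists above.
rows = [
    Suggestion("hybrid system", "classical communication", 0.30407, 0.22924, 1.0),
    Suggestion("molecule", "stanene", 0.14975, 0.38553, 0.87155),
    Suggestion("seesaw mechanism", "dark photon", 0.42051, 0.046927, 0.52255),
    Suggestion("self pulsing", "laser printing", 0.13666, 0.028176, 0.22185),
    Suggestion("transverse mode", "pseudogap", 0.47207, 0.22227, -1.0),
]

unrestricted = top_suggestions(rows)
low_similarity = top_suggestions(rows, max_cos_s=0.15)
low_both = top_suggestions(rows, max_cos_s=0.15, max_deg=0.05)
lowest = top_suggestions(rows, highest=False)
```

Tightening the thresholds trades prediction strength for surprise: the admitted pairs are less similar or less well connected, and their top predicted values are correspondingly lower.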
