TL;DR
This survey reviews embedding models for knowledge graph completion, summarizing recent experimental results and highlighting future research directions to improve link prediction accuracy.
Contribution
It provides a comprehensive overview of current embedding models for entities and relationships in knowledge graphs, including experimental comparisons and future research insights.
Findings
Embedding models vary in effectiveness across datasets
Recent models achieve higher accuracy in link prediction
Future research should focus on model scalability and interpretability
Abstract
Knowledge graphs (KGs) of real-world facts about entities and their relationships are useful resources for a variety of natural language processing tasks. However, because knowledge graphs are typically incomplete, it is useful to perform knowledge graph completion or link prediction, i.e. predict whether a relationship not in the knowledge graph is likely to be true. This paper serves as a comprehensive survey of embedding models of entities and relationships for knowledge graph completion, summarizing up-to-date experimental results on standard benchmark datasets and pointing out potential future research directions.
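Table 1: Score functions $f(h, r, t)$ of prominent embedding models. Entities $h$ and $t$ are represented by vectors $v_h, v_t \in \mathbb{R}^k$. In ConvE, $\bar{v}_h$ and $\bar{v}_r$ denote a 2D reshaping of $v_h$ and $v_r$; in ConvE and ConvKB, $\Omega$ denotes a set of filters, $\ast$ a convolution operator and $g$ a non-linear activation function.

| Category | Model | Score function $f(h, r, t)$ |
|---|---|---|
| Translation | Unstructured | $-\lVert v_h - v_t \rVert_{\ell_{1/2}}$ |
| Translation | SE | $-\lVert W_{r,1} v_h - W_{r,2} v_t \rVert_{\ell_{1/2}}$ where $W_{r,1}, W_{r,2} \in \mathbb{R}^{k \times k}$ |
| Translation | TransE | $-\lVert v_h + v_r - v_t \rVert_{\ell_{1/2}}$ where $v_r \in \mathbb{R}^{k}$ |
| Translation | TransH | $-\lVert (I - r_p r_p^\top) v_h + v_r - (I - r_p r_p^\top) v_t \rVert_{\ell_{1/2}}$ where $r_p, v_r \in \mathbb{R}^{k}$, $I$ denotes an identity matrix of size $k \times k$ |
| Translation | TransR | $-\lVert W_r v_h + v_r - W_r v_t \rVert_{\ell_{1/2}}$ where $W_r \in \mathbb{R}^{n \times k}$, $v_r \in \mathbb{R}^{n}$ |
| Translation | STransE | $-\lVert W_{r,1} v_h + v_r - W_{r,2} v_t \rVert_{\ell_{1/2}}$ where $W_{r,1}, W_{r,2} \in \mathbb{R}^{k \times k}$, $v_r \in \mathbb{R}^{k}$ |
| Translation | TranSparse | $-\lVert W_r^h(\theta_r^h) v_h + v_r - W_r^t(\theta_r^t) v_t \rVert_{\ell_{1/2}}$ where $W_r^h, W_r^t \in \mathbb{R}^{n \times k}$; $\theta_r^h, \theta_r^t \in \mathbb{R}$; $v_r \in \mathbb{R}^{n}$ |
| Translation | TransD | $-\lVert (I + r_p h_p^\top) v_h + v_r - (I + r_p t_p^\top) v_t \rVert_{\ell_{1/2}}$ where $r_p, v_r \in \mathbb{R}^{n}$, $h_p, t_p \in \mathbb{R}^{k}$ |
| Translation | lppTransD | $-\lVert (I + r_{p,1} h_p^\top) v_h + v_r - (I + r_{p,2} t_p^\top) v_t \rVert_{\ell_{1/2}}$ where $r_{p,1}, r_{p,2}, v_r \in \mathbb{R}^{n}$, $h_p, t_p \in \mathbb{R}^{k}$ |
| Bilinear & Tensor | Bilinear | $v_h^\top W_r v_t$ where $W_r \in \mathbb{R}^{k \times k}$ |
| Bilinear & Tensor | DISTMULT | $v_h^\top W_r v_t$ where $W_r \in \mathbb{R}^{k \times k}$ is a diagonal matrix |
| Bilinear & Tensor | SimplE | $\frac{1}{2}\left(v_{h,1}^\top W_r v_{t,2} + v_{t,1}^\top W_{r^{-1}} v_{h,2}\right)$ where $v_{h,1}, v_{h,2}, v_{t,1}, v_{t,2} \in \mathbb{R}^{k}$; $W_r$ and $W_{r^{-1}}$ are diagonal matrices |
| Bilinear & Tensor | SME(bilinear) | Separately combines the pairs $(h, r)$ and $(r, t)$ via tensor products, then matches the two combinations |
| Bilinear & Tensor | TuckER | $\mathcal{W} \times_1 v_h \times_2 v_r \times_3 v_t$ where $v_r \in \mathbb{R}^{n}$, $\mathcal{W} \in \mathbb{R}^{k \times n \times k}$; $\times_d$ denotes the tensor product along the $d$-th mode |
| Bilinear & Tensor | HolE | $\mathrm{sigmoid}\left(v_r^\top (v_h \star v_t)\right)$ where $v_r \in \mathbb{R}^{k}$; $\star$ denotes circular correlation |
| Neural network | NTN | $v_r^\top \tanh\left(v_h^\top M_r v_t + W_{r,1} v_h + W_{r,2} v_t + b_r\right)$ where $v_r, b_r \in \mathbb{R}^{n}$; $M_r \in \mathbb{R}^{k \times k \times n}$; $W_{r,1}, W_{r,2} \in \mathbb{R}^{n \times k}$ |
| Neural network | ER-MLP | $\mathrm{sigmoid}\left(w^\top \tanh\left(W\,\mathrm{concat}(v_h, v_r, v_t)\right)\right)$ |
| Neural network | ConvE | $v_t^\top g\left(\mathrm{vec}\left(g\left(\mathrm{concat}(\bar{v}_h, \bar{v}_r) \ast \Omega\right)\right) W\right)$ |
| Neural network | ConvKB | $w^\top \mathrm{concat}\left(g\left([v_h, v_r, v_t] \ast \Omega\right)\right)$ |
| Complex vector | ComplEx | $\mathrm{Re}\left(c_h^\top W_r \bar{c}_t\right)$ where $\mathrm{Re}(c)$ denotes the real part of the complex value $c$; $c_h, c_t \in \mathbb{C}^{k}$; $W_r \in \mathbb{C}^{k \times k}$ is a diagonal matrix; $\bar{c}_t$ is the conjugate of $c_t$ |
| Complex vector | RotatE | $-\lVert c_h \circ c_r - c_t \rVert$ where $c_h, c_r, c_t \in \mathbb{C}^{k}$; $\circ$ denotes the element-wise product |
| Complex vector | QuatE | $q_h \otimes q_r^{\triangleleft} \bullet q_t$ where $q_h, q_r, q_t \in \mathbb{H}^{k}$ and $q_r^{\triangleleft}$ is the normalized relation quaternion; $\otimes$ and $\bullet$ denote Hamilton and quaternion inner products, respectively |
| Path | TransE-comp | $-\lVert v_h + v_{r_1} + v_{r_2} + \dots + v_{r_m} - v_t \rVert_{\ell_{1/2}}$ where $v_{r_1}, \dots, v_{r_m} \in \mathbb{R}^{k}$ |
| Path | Bilinear-comp | $v_h^\top W_{r_1} W_{r_2} \cdots W_{r_m} v_t$ where $W_{r_1}, \dots, W_{r_m} \in \mathbb{R}^{k \times k}$ |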
Table 2: Benchmark datasets for KG completion evaluation.

| Dataset | #Entities | #Relations | #Train triples | #Valid triples | #Test triples |
|---|---|---|---|---|---|
| FB15k [Bordes et al., 2013] | 14,951 | 1,345 | 483,142 | 50,000 | 59,071 |
| WN18 [Bordes et al., 2013] | 40,943 | 18 | 141,442 | 5,000 | 5,000 |
| FB15k-237 [Toutanova and Chen, 2015] | 14,541 | 237 | 272,115 | 17,535 | 20,466 |
| WN18RR [Dettmers et al., 2018] | 40,943 | 11 | 86,835 | 3,034 | 3,134 |
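
Table 3: Entity prediction results on FB15k and WN18 under the “Filtered” setting. MR: mean rank; @10: Hits@10 (in %); MRR: mean reciprocal rank.

| Method | FB15k MR | FB15k @10 | FB15k MRR | WN18 MR | WN18 @10 | WN18 MRR |
|---|---|---|---|---|---|---|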
| TransH [Wang et al., 2014] | 87 | 64.4 | - | 303 | 86.7 | - |
| TransR [Lin et al., 2015b] | 77 | 68.7 | - | 225 | 92.0 | - |
| CTransR [Lin et al., 2015b] | 75 | 70.2 | - | 218 | 92.3 | - |
| KG2E [He et al., 2015] | 59 | 74.0 | - | 331 | 92.8 | - |
| TransD [Ji et al., 2015] | 91 | 77.3 | - | 212 | 92.2 | - |
| lppTransD [Yoon et al., 2016] | 78 | 78.7 | - | 270 | 94.3 | - |
| TransG [Xiao et al., 2016] | 98 | 79.8 | - | 470 | 93.3 | - |
| TranSparse [Ji et al., 2016] | 82 | 79.5 | - | 211 | 93.2 | - |
| TranSparse-DT [Chang et al., 2017] | 79 | 80.2 | - | 221 | 94.3 | - |
| ITransF [Xie et al., 2017] | 65 | 81.0 | - | 205 | 94.2 | - |
| NTN [Socher et al., 2013] | - | 41.4 | 0.25 | - | 66.1 | 0.53 |
| TransE [Bordes et al., 2013] | - | 74.9 | 0.463 | - | 94.3 | 0.495 |
| HolE [Nickel et al., 2016b] | - | 73.9 | 0.524 | - | 94.9 | 0.938 |
| ComplEx [Trouillon et al., 2016] | - | 84.0 | 0.692 | - | 94.7 | 0.941 |
| ANALOGY [Liu et al., 2017] | - | 85.4 | 0.725 | - | 94.7 | 0.942 |
| SimplE [Kazemi and Poole, 2018] | - | 83.8 | 0.727 | - | 94.7 | 0.942 |
| TorusE [Ebisu and Ichise, 2018] | - | 83.2 | 0.733 | - | 95.4 | 0.947 |
| STransE [Nguyen et al., 2016a] | 69 | 79.7 | 0.543 | 206 | 93.4 | 0.657 |
| ER-MLP [Dong et al., 2014] | 81 | 80.1 | 0.570 | 299 | 94.2 | 0.895 |
| DISTMULT [Yang et al., 2015] | 42 | 89.3 | 0.798 | 655 | 94.6 | 0.797 |
| ConvE [Dettmers et al., 2018] | 64 | 87.3 | 0.745 | 504 | 95.5 | 0.942 |
| HypER [Balažević et al., 2019] | 44 | 88.5 | 0.790 | 431 | 95.8 | 0.951 |
| RotatE [Sun et al., 2019] | 40 | 88.4 | 0.797 | 309 | 95.9 | 0.949 |
| QuatE [Zhang et al., 2019] | 17 | 90.0 | 0.782 | 162 | 95.9 | 0.950 |
| ComplEx-N3 [Lacroix et al., 2018] | - | 91 | 0.86 | - | 96 | 0.95 |
| TuckER [Balazevic et al., 2019] | - | 89.2 | 0.795 | - | 95.8 | 0.953 |
| IRN [Shen et al., 2017] | 38 | 92.7 | - | 249 | 95.3 | - |
| ProjE [Shi and Weninger, 2017] | 34 | 88.4 | - | - | - | - |
| rTransE [García-Durán et al., 2015] | 50 | 76.2 | - | - | - | - |
| PTransE-ADD [Lin et al., 2015a] | 58 | 84.6 | - | - | - | - |
| PTransE-RNN [Lin et al., 2015a] | 92 | 82.2 | - | - | - | - |
| GAKE [Feng et al., 2016b] | 119 | 64.8 | - | - | - | - |
| Gaifman [Niepert, 2016] | 75 | 84.2 | - | 352 | 93.9 | - |
| Hiri [Liu et al., 2016] | - | 70.3 | 0.603 | - | 90.8 | 0.691 |
| Neural LP [Yang et al., 2017] | - | 83.7 | 0.76 | - | 94.5 | 0.94 |
| R-GCN+ [Schlichtkrull et al., 2018] | - | 84.2 | 0.696 | - | 96.4 | 0.819 |
| KBLRN [Durán and Niepert, 2018] | 44 | 87.5 | 0.794 | - | - | - |
| TEKE_H [Wang and Li, 2016] | 108 | 73.0 | - | 114 | 92.9 | - |
| SSP [Xiao et al., 2017] | 82 | 79.0 | - | 156 | 93.2 | - |
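
Table 4: Entity prediction results on FB15k-237 and WN18RR under the “Filtered” setting. MR: mean rank; @10: Hits@10 (in %); MRR: mean reciprocal rank.

| Method | FB15k-237 MR | FB15k-237 @10 | FB15k-237 MRR | WN18RR MR | WN18RR @10 | WN18RR MRR |
|---|---|---|---|---|---|---|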
| IRN [Shen et al., 2017] | 211 | 46.4 | - | - | - | - |
| KBGAN [Cai and Wang, 2018] | - | 45.8 | 0.278 | - | 48.1 | 0.213 |
| DISTMULT [Yang et al., 2015] | 254 | 41.9 | 0.241 | 5110 | 49 | 0.43 |
| ComplEx [Trouillon et al., 2016] | 339 | 42.8 | 0.247 | 5261 | 51 | 0.44 |
| ConvE [Dettmers et al., 2018] | 246 | 49.1 | 0.316 | 5277 | 48 | 0.46 |
| ER-MLP [Dong et al., 2014] | 219 | 54.0 | 0.342 | 4798 | 41.9 | 0.366 |
| HypER [Balažević et al., 2019] | 250 | 52.0 | 0.341 | 5798 | 52.2 | 0.465 |
| TransE [Bordes et al., 2013] | 347 | 46.5 | 0.294 | 743 | 56.0 | 0.245 |
| ConvKB [Nguyen et al., 2018] | 254 | 53.2 | 0.418 | 763 | 56.7 | 0.253 |
| CapsE [Nguyen et al., 2019b] | 303 | 59.3 | 0.523 | 719 | 56.0 | 0.415 |
| InteractE [Vashishth et al., 2020] | 172 | 53.5 | 0.354 | 5202 | 52.8 | 0.463 |
| RotatE [Sun et al., 2019] | 177 | 53.3 | 0.338 | 3340 | 57.1 | 0.476 |
| QuatE [Zhang et al., 2019] | 87 | 55.0 | 0.348 | 2314 | 58.2 | 0.488 |
| ComplEx-N3 [Lacroix et al., 2018] | - | 56 | 0.37 | - | 57 | 0.48 |
| Conv-TransE [Shang et al., 2019] | - | 51 | 0.33 | - | 52 | 0.46 |
| TuckER [Balazevic et al., 2019] | - | 54.4 | 0.358 | - | 52.6 | 0.470 |
| Neural LP [Yang et al., 2017] | - | 36.2 | 0.24 | - | - | - |
| R-GCN+ [Schlichtkrull et al., 2018] | - | 41.7 | 0.249 | - | - | - |
| KBLRN [Durán and Niepert, 2018] | 209 | 49.3 | 0.309 | - | - | - |
| KBGAT [Nathani et al., 2019] | 210 | 62.6 | 0.518 | 1940 | 58.1 | 0.440 |
| ReInceptionE [Xie et al., 2020] | 173 | 52.8 | 0.349 | 1894 | 58.2 | 0.483 |
| SACN [Shang et al., 2019] | - | 54 | 0.35 | - | 54 | 0.47 |

| Dataset | #Entities | #Relations | #Train triples | #Valid triples | #Test triples |
|---|---|---|---|---|---|
| FB13 [Socher et al., 2013] | 75,043 | 13 | 316,232 | 5,908 | 23,733 |
| WN11 [Socher et al., 2013] | 38,696 | 11 | 112,581 | 2,609 | 10,544 |

| Method | WN11 | FB13 | Avg. |
|---|---|---|---|
| CTransR [Lin et al., 2015b] | 85.7 | - | - |
| TransR [Lin et al., 2015b] | 85.9 | 82.5 | 84.2 |
| TransD [Ji et al., 2015] | 86.4 | 89.1 | 87.8 |
| TEKE_H [Wang and Li, 2016] | 84.8 | 84.2 | 84.5 |
| TranSparse-S [Ji et al., 2016] | 86.4 | 88.2 | 87.3 |
| TranSparse-US [Ji et al., 2016] | 86.8 | 87.5 | 87.2 |
| ConvKB [Nguyen et al., 2018] [*] | 87.6 | 88.8 | 88.2 |
| TransE-HRS [Zhang et al., 2018] | 86.8 | 88.4 | 87.6 |
| DISTMULT-HRS [Zhang et al., 2018] | 88.9 | 89.0 | 89.0 |
| NTN [Socher et al., 2013] | 70.6 | 87.2 | 78.9 |
| TransH [Wang et al., 2014] | 78.8 | 83.3 | 81.1 |
| SLogAn [Liang and Forbus, 2015] | 75.3 | 85.3 | 80.3 |
| KG2E [He et al., 2015] | 85.4 | 85.3 | 85.4 |
| Bilinear-comp [Guu et al., 2015] | 77.6 | 86.1 | 81.9 |
| TransE-comp [Guu et al., 2015] | 80.3 | 87.6 | 84.0 |
| TransR-FT [Feng et al., 2016a] | 86.6 | 82.9 | 84.8 |
| TransG [Xiao et al., 2016] | 87.4 | 87.3 | 87.4 |
| lppTransD [Yoon et al., 2016] | 86.2 | 88.6 | 87.4 |
| TransE [Bordes et al., 2013] [*] | 86.5 | 87.5 | 87.0 |
| TransE-NMM [Nguyen et al., 2016b] | 86.8 | 88.6 | 87.7 |
| TranSparse-DT [Chang et al., 2017] | 87.1 | 87.9 | 87.5 |
A survey of embedding models of entities and relationships
for knowledge graph completion
Dat Quoc Nguyen
VinAI Research, Vietnam
Keywords: Knowledge graph completion, Link prediction, Embedding model, Entity prediction.
1 Introduction
Let us revisit the classic Word2Vec example of a “royal” relationship between “king” and “man”, and between “queen” and “woman”. As illustrated in this example: $v_{king} - v_{man} \approx v_{queen} - v_{woman}$, word vectors learned from a large corpus can model relational similarities or linguistic regularities between pairs of words as translations in the projected vector space [Mikolov et al., 2013, Pennington et al., 2014]. Figure 2 shows another example of a relational similarity between word pairs of countries and capital cities:
[TABLE]
Assume that we consider the country and capital pairs in Figure 2 to be pairs of entities rather than word types. That is, we now represent country and capital entities by low-dimensional and dense vectors. The relational similarity between word pairs presumably captures a capital-of relationship between country and capital entities. We also represent this relationship by a translation vector in the entity vector space. Thus, we expect:
[TABLE]
This intuition inspired the TransE model—a well-known embedding model for KG completion or link prediction in KGs [Bordes et al., 2013].
Knowledge graphs are collections of real-world triples, where each triple or fact $(h, r, t)$ in KGs represents some relation $r$ between a head entity $h$ and a tail entity $t$. KGs can thus be formalized as directed multi-relational graphs, where nodes correspond to entities and edges linking the nodes encode various kinds of relationships [García-Durán et al., 2016, Nickel et al., 2016a]. Here entities are real-world things or objects such as persons, places, organizations, music tracks or movies. Each relation type defines a certain relationship between entities. For example, as illustrated in Figure 2, one relation type relates person entities with each other, while another relates person entities with place entities. KG examples include the domain-specific KG GeneOntology and popular generic KGs such as WordNet [Fellbaum, 1998], YAGO [Suchanek et al., 2007], Freebase [Bollacker et al., 2008], NELL [Carlson et al., 2010] and DBpedia [Lehmann et al., 2015], as well as commercial KGs such as Google’s Knowledge Graph, Microsoft’s Satori and Facebook’s Open Graph. Nowadays, KGs are used in a number of commercial applications including search engines such as Google, Microsoft’s Bing and Facebook’s Graph search. They are also useful resources for many natural language processing tasks such as question answering [Ferrucci, 2012, Fader et al., 2014], word sense disambiguation [Navigli and Velardi, 2005, Agirre et al., 2013], semantic parsing [Krishnamurthy and Mitchell, 2012, Berant et al., 2013] and co-reference resolution [Ponzetto and Strube, 2006, Dutta and Weikum, 2015].
A main issue is that even very large KGs, such as Freebase and DBpedia, which contain billions of fact triples about the world, are still far from complete. In particular, in English DBpedia 2014, 60% of person entities lack a place of birth and 58% of the scientists do not have a fact about what they are known for [Krompaß et al., 2015]. In Freebase, 71% of 3 million person entities lack a place of birth, 75% do not have a nationality, and 94% have no facts about their parents [West et al., 2014]. So, in terms of a specific application, question answering systems based on incomplete KGs would not provide a correct answer even given a correctly interpreted question. For example, given the incomplete KG in Figure 2, it would be impossible to answer the question “where was Jane born?”, although the question completely matches existing entity and relation type information in the KG (i.e. the entity “Jane” and a place-of-birth relation type). Consequently, much work has been devoted to knowledge graph completion, i.e. link prediction in KGs, which attempts to predict whether a relationship/triple not in the KG is likely to be true, i.e. to add new triples by leveraging existing triples in the KG [Lao and Cohen, 2010, Bordes et al., 2012, Gardner et al., 2014, García-Durán et al., 2016]. For example, we would like to predict the missing tail entity in an incomplete triple $(h, r, ?)$ or predict whether a candidate triple $(h, r, t)$ is correct or not.
Embedding models for KG completion have been proven to give state-of-the-art link prediction performances, in which entities are represented by latent feature vectors while relation types are represented by latent feature vectors and/or matrices and/or third-order tensors [Bordes et al., 2013, Socher et al., 2013]. This paper: (1) surveys the embedding models for KG completion, then (2) summarizes up-to-date experimental results on the standard evaluation task of entity prediction—which is also referred to as the link prediction task [Bordes et al., 2013], and (3) points out potential future research directions.
2 A General Approach of Embedding Models for KG Completion
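Let $\mathcal{E}$ denote the set of entities and $\mathcal{R}$ the set of relation types. Denote by $\mathcal{G}$ the knowledge graph consisting of a set of correct triples $(h, r, t)$, such that $h, t \in \mathcal{E}$ and $r \in \mathcal{R}$. For each triple $(h, r, t)$, the embedding models define a score function $f(h, r, t)$ of its plausibility. Their goal here is to:
Choose $f$ such that the score $f(h, r, t)$ of a correct triple $(h, r, t)$ is higher than the score $f(h', r', t')$ of an incorrect triple $(h', r', t')$.
For example, TransE defines a score function $f(h, r, t) = -\lVert v_h + v_r - v_t \rVert_{\ell_{1/2}}$, where $h$, $r$ and $t$ are represented by low-dimensional vectors $v_h$, $v_r$ and $v_t$, respectively. Given a correct triple $(h, r, t)$ and incorrect triples $(h', r, t)$ and $(h, r, t')$, we would have: $f(h, r, t) > f(h', r, t)$ and $f(h, r, t) > f(h, r, t')$. Table 1 in Section 3 summarizes different prominent score functions $f(h, r, t)$.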
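As a minimal sketch of the TransE-style score just described, assuming the negated L1 distance so that a higher score means a more plausible triple. The function name and the toy vectors are illustrative assumptions, not learned embeddings.

```python
def transe_score(v_h, v_r, v_t):
    """Negated L1 distance between (v_h + v_r) and v_t; higher means more plausible."""
    return -sum(abs(h + r - t) for h, r, t in zip(v_h, v_r, v_t))

# Toy 3-dimensional embeddings (made-up values for illustration only).
v_h = [0.5, 0.1, -0.2]
v_r = [0.25, 0.4, 0.2]
v_t_good = [0.75, 0.5, 0.0]   # equals v_h + v_r, so the translation fits
v_t_bad = [-0.5, 0.0, 1.0]    # far from v_h + v_r

score_good = transe_score(v_h, v_r, v_t_good)
score_bad = transe_score(v_h, v_r, v_t_bad)
print(score_good > score_bad)
```

The tail that matches the translated head receives the higher score, which is the ordering the training objective tries to enforce.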
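To learn model parameters (i.e. entity vectors, relation vectors or matrices), the embedding models minimize an objective loss $\mathcal{L}$. A conventional objective loss is the margin-based pairwise ranking loss [Bordes et al., 2013]:
$$\mathcal{L} = \sum_{(h, r, t) \in \mathcal{G}} \; \sum_{(h', r, t') \in \mathcal{G}'_{(h, r, t)}} \left[\gamma - f(h, r, t) + f(h', r, t')\right]_+$$
where $[x]_+ = \max(0, x)$; $\gamma$ is the margin hyper-parameter; and $\mathcal{G}'_{(h, r, t)}$ is the set of incorrect triples generated by corrupting the correct triple $(h, r, t) \in \mathcal{G}$.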
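The margin-based pairwise ranking loss can be sketched as follows, assuming the "higher score is better" convention used in this section. The pairing of each correct score with one corrupted score, and all numbers, are illustrative assumptions.

```python
def margin_ranking_loss(pos_scores, neg_scores, margin=1.0):
    """Sum of max(0, margin - f(correct) + f(corrupted)) over paired scores."""
    return sum(max(0.0, margin - p + n) for p, n in zip(pos_scores, neg_scores))

# Made-up scores: the first pair is already separated by more than the margin,
# the second pair is not and therefore contributes to the loss.
pos = [-0.5, -2.0]
neg = [-3.0, -2.5]
loss = margin_ranking_loss(pos, neg, margin=1.0)
print(loss)
```

Only pairs whose correct triple does not outscore the corrupted one by at least the margin contribute, so well-separated pairs stop producing gradients.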
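Also, the negative log-likelihood (NLL) of softmax regression [Toutanova and Chen, 2015] and the NLL of logistic regression [Trouillon et al., 2016] are commonly used in recent KG completion research (all the losses can also include an L2 regularization on the model parameters, which is not shown for simplicity):
$$\mathcal{L} = -\sum_{(h, r, t) \in \mathcal{G}} \left[ \log \frac{\exp\left(f(h, r, t)\right)}{\sum_{h' \in \mathcal{E}} \exp\left(f(h', r, t)\right)} + \log \frac{\exp\left(f(h, r, t)\right)}{\sum_{t' \in \mathcal{E}} \exp\left(f(h, r, t')\right)} \right]$$
$$\mathcal{L} = \sum_{(h, r, t) \in \mathcal{G} \cup \mathcal{G}'} \log\left(1 + \exp\left(-l_{(h, r, t)} \cdot f(h, r, t)\right)\right), \qquad l_{(h, r, t)} = \begin{cases} 1 & \text{for } (h, r, t) \in \mathcal{G} \\ -1 & \text{for } (h, r, t) \in \mathcal{G}' \end{cases}$$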
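The two NLL losses named above can be sketched for a handful of scores as follows. This is a minimal illustration under assumed conventions: labels are +1 for correct and -1 for incorrect triples, and the softmax variant normalizes over a given list of candidate scores; the function names are hypothetical.

```python
import math

def logistic_nll(scores, labels):
    """Sum of log(1 + exp(-l * f)) with labels l in {+1, -1}."""
    return sum(math.log1p(math.exp(-l * s)) for s, l in zip(scores, labels))

def softmax_nll(candidate_scores, target_index):
    """Negative log of the softmax probability of the correct candidate."""
    m = max(candidate_scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in candidate_scores))
    return log_z - candidate_scores[target_index]

# Made-up scores: one correct triple scored 2.0, one incorrect triple scored -1.0.
print(logistic_nll([2.0, -1.0], [1, -1]))
# Made-up scores over three candidate tails; the correct tail is at index 0.
print(softmax_nll([3.0, 0.5, -1.0], 0))
```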
To corrupt the head or tail entities, a common strategy is to uniformly replace the entities when sampling incorrect triples [Bordes et al., 2013]; however, this results in many false negative labels [Wang et al., 2014]. Domain sampling [Krompaß et al., 2015, Xie et al., 2017] generates corrupted triples by sampling entities from the same domain or from the set of relation-dependent entities. The “Bernoulli” trick [Wang et al., 2014] is widely used to set different probabilities for generating head or tail entities: for each relation type $r$, we calculate the averaged number $a_{r,1}$ of heads $h$ for a pair $(r, t)$ and the averaged number $a_{r,2}$ of tails $t$ for a pair $(h, r)$. We then define a Bernoulli distribution with success probability $\lambda_r = \frac{a_{r,1}}{a_{r,1} + a_{r,2}}$ for sampling: given a correct triple $(h, r, t)$, we corrupt this triple by replacing the head entity with probability $(1 - \lambda_r)$ while replacing the tail entity with probability $\lambda_r$.
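The “Bernoulli” trick can be sketched as below. This is an illustration under a stated assumption: the tail is replaced with probability a1 / (a1 + a2), where a1 is the average number of heads per (relation, tail) pair and a2 the average number of tails per (head, relation) pair, so that Many-to-1 relations mostly corrupt the tail. The toy triples and function names are made up.

```python
import random
from collections import defaultdict

def tail_corruption_probs(triples):
    """Per relation, probability of corrupting the tail: a1 / (a1 + a2)."""
    heads = defaultdict(set)   # (r, t) -> set of heads
    tails = defaultdict(set)   # (h, r) -> set of tails
    for h, r, t in triples:
        heads[(r, t)].add(h)
        tails[(h, r)].add(t)
    head_counts, tail_counts = defaultdict(list), defaultdict(list)
    for (r, _t), hs in heads.items():
        head_counts[r].append(len(hs))
    for (_h, r), ts in tails.items():
        tail_counts[r].append(len(ts))
    probs = {}
    for r in head_counts:
        a1 = sum(head_counts[r]) / len(head_counts[r])  # avg heads per (r, t)
        a2 = sum(tail_counts[r]) / len(tail_counts[r])  # avg tails per (h, r)
        probs[r] = a1 / (a1 + a2)
    return probs

def corrupt(triple, entities, probs, rng):
    """Replace the tail with probability probs[r], otherwise replace the head."""
    h, r, t = triple
    if rng.random() < probs[r]:
        return (h, r, rng.choice([e for e in entities if e != t]))
    return (rng.choice([e for e in entities if e != h]), r, t)

# Toy Many-to-1 relation: three people born in the same city.
triples = [("ann", "born_in", "paris"), ("bob", "born_in", "paris"),
           ("cat", "born_in", "paris")]
entities = ["ann", "bob", "cat", "paris", "rome"]
probs = tail_corruption_probs(triples)
negative = corrupt(triples[0], entities, probs, random.Random(0))
print(probs, negative)
```

This sketch does not check whether the corrupted triple happens to be a known correct triple; a fuller sampler might resample in that case.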
Recently, adversarial learning-based strategies have also been proposed for sampling incorrect triples. However, these studies did not provide a comparison between the adversarial learning-based strategies and the “Bernoulli” trick.
3 Specific Models
3.1 Triple-based Embedding Models
Translation-based models:
The Unstructured model [Bordes et al., 2012] assumes that the head and tail entity vectors are similar. As the Unstructured model does not take the relationship into account, it cannot distinguish different relation types. The Structured Embedding (SE) model [Bordes et al., 2011] assumes that the head and tail entities are similar only in a relation-dependent subspace, where each relation is represented by two different matrices. TransE [Bordes et al., 2013] is inspired by models such as the Word2Vec Skip-gram model [Mikolov et al., 2013] where relationships between words often correspond to translations in latent feature space. In particular, TransE learns low-dimensional and dense vectors for every entity and relation type, so that each relation type corresponds to a translation vector operating on the vectors representing the entities, i.e. $v_h + v_r \approx v_t$ for each fact triple $(h, r, t)$. TransE is thus suitable for 1-to-1 relationships, where a head entity is linked to at most one tail entity given a relation type. Because it uses only one translation vector to represent each relation type, TransE is not well-suited for Many-to-1, 1-to-Many and Many-to-Many relationships. (A relation type $r$ is classified Many-to-1 if multiple head entities can be connected by $r$ to at most one tail entity, 1-to-Many if multiple tail entities can be linked by $r$ from at most one head entity, and Many-to-Many if multiple head entities can be connected by $r$ to a tail entity and vice versa.) For example in Figure 2, a single vector representing a relation type cannot capture both the translating direction between one entity pair and the inverse direction needed for another entity pair linked by the same relation type.
To overcome those issues of TransE, TransH [Wang et al., 2014] associates each relation with a relation-specific hyperplane and uses a projection vector to project entity vectors onto that hyperplane. TransD [Ji et al., 2015] and TransR/CTransR [Lin et al., 2015b] extend TransH by using two projection vectors and a matrix to project entity vectors into a relation-specific space, respectively. Similar to TransR, TransR-FT [Feng et al., 2016a] also uses a matrix to project head and tail entity vectors. TEKE_H [Wang and Li, 2016] extends TransH to incorporate rich context information in an external text corpus. lppTransD [Yoon et al., 2016] extends TransD to additionally use two projection vectors for representing each relation. STransE [Nguyen et al., 2016a] and TranSparse [Ji et al., 2016] can be viewed as direct extensions of TransR, where head and tail entities are associated with their own projection matrices. Unlike STransE, TranSparse uses adaptive sparse matrices, whose sparse degrees are defined based on the number of entities linked by relations. TranSparse-DT [Chang et al., 2017] is an extension of TranSparse with a dynamic translation. ITransF [Xie et al., 2017] can be considered as a generalization of STransE, which allows the sharing of statistical regularities between relation projection matrices and alleviates the data sparsity issue. Furthermore, TorusE [Ebisu and Ichise, 2018] embeds entities and relations on a torus to handle TransE’s regularization problem, which forces entity embeddings to be on a sphere in the embedding vector space.
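To illustrate why a relation-specific hyperplane helps with Many-to-1 relations, here is a minimal sketch of a TransH-style score. It assumes a unit-length normal vector for the hyperplane and a negated L1 distance; names and values are illustrative assumptions.

```python
def project_onto_hyperplane(v, w):
    """Remove from v its component along the unit normal vector w."""
    dot = sum(a * b for a, b in zip(v, w))
    return [a - dot * b for a, b in zip(v, w)]

def transh_score(v_h, v_r, v_t, w_r):
    """Translate between the projections of head and tail on the hyperplane."""
    h_p = project_onto_hyperplane(v_h, w_r)
    t_p = project_onto_hyperplane(v_t, w_r)
    return -sum(abs(a + r - b) for a, r, b in zip(h_p, v_r, t_p))

w_r = [0.0, 0.0, 1.0]   # unit normal of the relation's hyperplane
v_r = [1.0, 0.0, 0.0]   # translation vector lying in the hyperplane
v_t = [1.0, 0.0, 2.0]
v_h1 = [0.0, 0.0, 5.0]  # two different heads that differ only along w_r
v_h2 = [0.0, 0.0, -3.0]

s1 = transh_score(v_h1, v_r, v_t, w_r)
s2 = transh_score(v_h2, v_r, v_t, w_r)
print(s1, s2)
```

Both heads project to the same point on the hyperplane, so both triples fit the translation perfectly even though the head vectors themselves are distinct.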
Bilinear- & Tensor-based models:
DISTMULT [Yang et al., 2015] is based on the Bilinear model [Nickel et al., 2011, Jenatton et al., 2012] where each relation is represented by a diagonal matrix rather than a full matrix. SimplE [Kazemi and Poole, 2018] extends DISTMULT to allow two embeddings of each entity to be learned dependently. Such quadratic forms are also used to model entities and relations in KG2E [He et al., 2015], TATEC [García-Durán et al., 2016], TransG [Xiao et al., 2016], RSTE [Tay et al., 2017], ANALOGY [Liu et al., 2017] and Dihedral [Xu and Li, 2019]. SME-bilinear [Bordes et al., 2012] is proposed to first separately combine entity-relation pairs $(h, r)$ and $(r, t)$ and then semantically match these combinations, using a tensor product. HolE [Nickel et al., 2016b] uses circular correlation, a compositional operator which can be interpreted as a compression of the tensor product. In addition, TuckER [Balazevic et al., 2019] is a linear model based on the Tucker tensor decomposition of the binary tensor representation of KG triples.
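The difference between a full and a diagonal relation matrix can be sketched as follows. The toy vectors and matrices are illustrative assumptions; the example shows that the diagonal form scores $(h, r, t)$ and $(t, r, h)$ identically, while a full matrix need not.

```python
def bilinear_score(v_h, W_r, v_t):
    """v_h^T W_r v_t with a full k x k relation matrix."""
    return sum(v_h[i] * W_r[i][j] * v_t[j]
               for i in range(len(v_h)) for j in range(len(v_t)))

def distmult_score(v_h, w_r, v_t):
    """Same bilinear form with W_r restricted to diag(w_r)."""
    return sum(h * w * t for h, w, t in zip(v_h, w_r, v_t))

v_h, v_t = [1.0, 2.0], [3.0, -1.0]
w_r = [0.5, 2.0]                      # diagonal entries
W_diag = [[0.5, 0.0], [0.0, 2.0]]     # the same relation as a full matrix
W_full = [[0.5, 1.0], [0.0, 2.0]]     # a non-symmetric full matrix

print(distmult_score(v_h, w_r, v_t), distmult_score(v_t, w_r, v_h))
print(bilinear_score(v_h, W_full, v_t), bilinear_score(v_t, W_full, v_h))
```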
Neural network-based models:
The neural tensor network (NTN) model [Socher et al., 2013] also uses a bilinear tensor operator to represent each relation, while ProjE [Shi and Weninger, 2017] can be viewed as a simplified version of NTN. The ER-MLP model [Dong et al., 2014] represents each triple by a vector obtained from concatenating head, relation and tail embeddings, then feeds this vector into a single-layer MLP with a one-node output layer. ConvE [Dettmers et al., 2018] and ConvKB [Nguyen et al., 2018] are based on convolutional neural networks. ConvE uses a convolution layer directly over 2D reshaping of head-entity and relation embeddings, while ConvKB applies a convolution layer over the embedding triples (here each triple is represented as a 3-column matrix where each column vector represents a triple element). HypER [Balažević et al., 2019] simplifies ConvE by using a hypernetwork to produce 1D convolutional filters for each relation, then extracts relation-specific features from head entity embeddings. Conv-TransE [Shang et al., 2019] extends ConvE to keep the translational characteristic between entities and relations. InteractE [Vashishth et al., 2020] uses a circular convolution operator and a checkered reshaping function instead of the standard convolution operator and 2D stack reshaping function in ConvE. The CapsE model [Nguyen et al., 2019b] extends ConvKB by stacking a capsule network layer [Sabour et al., 2017] on top of the convolution layer.
Complex vector-based models:
Instead of embedding entities and relations in the real-valued vector space, ComplEx [Trouillon et al., 2016] is an extension of DISTMULT in the complex vector space. ComplEx-N3 [Lacroix et al., 2018] extends ComplEx with weighted nuclear 3-norm. Also in the complex vector space, RotatE [Sun et al., 2019] defines each relation as a rotation from the head entity to the tail entity. QuatE [Zhang et al., 2019] represents entities by quaternion embeddings (i.e. hypercomplex-valued embeddings) and models relations as rotations in the quaternion space by employing the Hamilton and quaternion-inner products.
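A minimal sketch of the two complex-valued scoring ideas above, using Python's built-in complex numbers. It assumes a ComplEx-style score equal to the real part of the sum of $c_h \cdot w_r \cdot \bar{c}_t$, and a RotatE-style score equal to the negated sum of moduli after rotating the head by unit-modulus relation phases; names and values are illustrative assumptions.

```python
import cmath

def complex_score(c_h, w_r, c_t):
    """Real part of sum_i c_h[i] * w_r[i] * conj(c_t[i])."""
    return sum(h * w * t.conjugate() for h, w, t in zip(c_h, w_r, c_t)).real

def rotate_score(c_h, thetas, c_t):
    """Negated sum of |c_h[i] * exp(i * theta_i) - c_t[i]|."""
    return -sum(abs(h * cmath.exp(1j * th) - t)
                for h, th, t in zip(c_h, thetas, c_t))

c_h = [1 + 0j, 0 + 2j]
thetas = [cmath.pi / 2, cmath.pi]     # relation as per-dimension rotation angles
c_t = [h * cmath.exp(1j * th) for h, th in zip(c_h, thetas)]  # head rotated by the relation
w_r = [0 + 1j, 1 + 0j]                # complex diagonal relation for the ComplEx-style score

rot_fwd = rotate_score(c_h, thetas, c_t)   # tail is exactly the rotated head
rot_rev = rotate_score(c_t, thetas, c_h)   # swapping head and tail no longer fits
cx_fwd = complex_score(c_h, w_r, c_t)
cx_rev = complex_score(c_t, w_r, c_h)
print(rot_fwd, rot_rev, cx_fwd, cx_rev)
```

Unlike the diagonal real-valued form, both scores change when head and tail are swapped, which is what lets these models represent asymmetric relations.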
3.2 Relation Path-based Embedding Models
All embedding models mentioned above in Section 3.1 only take triples into account. Thus, these models ignore potentially useful information implicitly presented by the structure of the KG. For example, a relation path $h \xrightarrow{r_1} e \xrightarrow{r_2} t$ should indicate a direct relationship between the $h$ and $t$ entities. Also, neighborhood information of entities could be useful for predicting the relationship between two entities as well. For example, in the KG NELL [Carlson et al., 2010], we have information such as if a person works for an organization and this person also leads that organization, then it is likely that this person is the CEO of that organization.
Recent research has also shown that relation paths between entities in KGs provide richer context information and improve the performance of embedding models for KG completion [Luo et al., 2015, Liang and Forbus, 2015, García-Durán et al., 2015, Guu et al., 2015, Toutanova et al., 2016, Durán and Niepert, 2018, Takahashi et al., 2018, Chen et al., 2018]. In particular, one approach constructed relation paths between entities and, viewing entities and relations in the path as pseudo-words, applied Word2Vec [Mikolov et al., 2013] to produce pre-trained vectors for these pseudo-words; using these pre-trained vectors for initialization was shown to improve the performance of the models TransE [Bordes et al., 2013], SME [Bordes et al., 2012] and SE [Bordes et al., 2011]. Another approach used the plausibility score produced by SME to compute the weights of relation paths.
PTransE-RNN [Lin et al., 2015a] models relation paths by using a recurrent neural network (RNN). In addition, ROPs [Yin et al., 2018] and a related model also apply an RNN to model the path between an entity pair; however, in contrast to PTransE-RNN, they additionally take the intermediate entities present in the path into account. IRN [Shen et al., 2017] uses a shared memory and an RNN-based controller to implicitly model multi-step structured relationships. rTransE [García-Durán et al., 2015], PTransE-ADD [Lin et al., 2015a] and TransE-comp [Guu et al., 2015] extend TransE to represent a relation path by a vector which is the sum of the vectors of all relations in the path. In Bilinear-comp [Guu et al., 2015] and pruned-paths [Toutanova et al., 2016], each relation is a matrix, so a relation path is represented by matrix multiplication. Durán and Niepert [2018] proposed the KBLRN framework to combine relational paths with latent and numerical features.
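The two composition schemes described above (summing relation vectors, multiplying relation matrices) can be sketched as follows. The function names, the negated L1 distance and all toy values are illustrative assumptions.

```python
def transe_comp_score(v_h, path_vectors, v_t):
    """Negated L1 distance after adding every relation vector on the path."""
    composed = [sum(dims) for dims in zip(*path_vectors)]
    return -sum(abs(h + p - t) for h, p, t in zip(v_h, composed, v_t))

def matmul(A, B):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def bilinear_comp_score(v_h, path_matrices, v_t):
    """v_h^T W_1 W_2 ... W_m v_t for the relation matrices on the path."""
    composed = path_matrices[0]
    for W in path_matrices[1:]:
        composed = matmul(composed, W)
    return sum(v_h[i] * composed[i][j] * v_t[j]
               for i in range(len(v_h)) for j in range(len(v_t)))

v_h, v_t = [1.0, 0.0], [2.0, 3.0]
r1, r2 = [0.5, 1.0], [0.5, 2.0]      # relation vectors on a two-step path
W1 = [[0.0, 1.0], [1.0, 0.0]]        # relation matrix that swaps coordinates
W2 = [[2.0, 0.0], [0.0, 1.0]]        # relation matrix that scales the first coordinate

add_score = transe_comp_score(v_h, [r1, r2], v_t)
mul_score = bilinear_comp_score(v_h, [W1, W2], v_t)
print(add_score, mul_score)
```

Note that vector addition is order-independent, whereas the matrix product depends on the order of relations along the path.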
The neighborhood mixture model TransE-NMM [Nguyen et al., 2016b] can be also viewed as a three-relation path model because it takes into account the neighborhood entity and relation information of both head and tail entities in each triple. ReInceptionE [Xie et al., 2020] employs the Inception network [Szegedy et al., 2016] to increase the interactions between head and relation embeddings for obtaining better representations of the head and relation pairs and then uses a relation-aware attention mechanism to enrich these pair representations with the local neighborhood and global entity information. Neighborhood information is also exploited in R-GCN [Schlichtkrull et al., 2018], SACN [Shang et al., 2019] and KBGAT [Nathani et al., 2019], which generalize graph convolutional networks [Kipf and Welling, 2017] and graph attention networks [Veličković et al., 2018] for dealing with highly multi-relational data, e.g. KGs. For computing the final representation of an entity, they make use of layer-wise propagation to accumulate linearly-transformed embeddings of its neighboring entities through a normalized sum with different relational weights. For link prediction, R-GCN, SACN and KBGAT apply DISTMULT, Conv-TransE and ConvKB to compute triple scores, respectively.
3.3 Other KG Completion Models
The Path Ranking Algorithm (PRA) [Lao and Cohen, 2010] is a random walk inference technique which was proposed to predict a new relationship between two entities in KGs. PRA has been used to estimate the probability of an unseen triple as a combination of weighted random walks that follow different paths linking the head entity and tail entity in the KG. Later work made use of an external text corpus to increase the connectivity of the KG used as the input to PRA, improved PRA with a subgraph feature extraction technique that makes the generation of random walks in KGs more efficient and expressive, and extended PRA to couple the path ranking of multiple relations. PRA can also be used in conjunction with first-order logic in the discriminative Gaifman model [Niepert, 2016]. In addition, an RNN has been used to learn vector representations of PRA-style relation paths between entities in the KG. Other random-walk based learning algorithms for KG completion have also been proposed.
Yang et al. [2017] proposed a Neural Logic Programming (LP) framework to learn probabilistic first-order logical rules for KG reasoning, producing competitive link prediction performance. Another approach generates sentences from triples via hand-crafted templates, and then uses the likelihoods produced by the pre-trained BERT [Devlin et al., 2019] for these generated sentences to score the plausibility of the corresponding triples. Other methods for learning from KGs and multi-relational data are described elsewhere in the literature.
4 Evaluation Task
The standard evaluation task of entity prediction, i.e. the link prediction task [Bordes et al., 2013], is proposed to evaluate embedding models for KG completion. (Another evaluation task for KG completion is triple classification [Socher et al., 2013]; however, it is not as widely used as the link prediction task. See the Appendix for a summary of state-of-the-art triple classification results.)
Datasets:
Information about benchmark datasets for KG completion evaluation is given in Table 2. FB15k and WN18 are derived from the large real-world KG Freebase [Bollacker et al., 2008] and the large lexical KG WordNet [Miller, 1995], respectively. However, FB15k and WN18 have been noted not to be challenging datasets because they contain many reversible triples. As a concrete example, a test triple $(h, r, t)$ can be mapped to a training triple $(t, r', h)$; thus knowing that $r$ and $r'$ are reversible allows us to easily predict the majority of test triples. So, the datasets FB15k-237 [Toutanova and Chen, 2015] and WN18RR [Dettmers et al., 2018] were created to serve as realistic KG completion datasets which represent a more challenging learning setting. FB15k-237 and WN18RR are subsets of FB15k and WN18, respectively.
4.1 Task Description
The entity prediction task, i.e. link prediction [Bordes et al., 2013], predicts the head or the tail entity given the relation type and the other entity, i.e. predicting $h$ given $(?, r, t)$ or predicting $t$ given $(h, r, ?)$, where $?$ denotes the missing element. The results are evaluated using a ranking induced by the function $f(h, r, t)$ on test triples.
Each correct test triple $(h, r, t)$ is corrupted by replacing either its head or tail entity by each of the possible entities in turn, and then these candidates are ranked in descending order of their plausibility score. The “Filtered” setting protocol, described in Bordes et al. [2013], filters out, before ranking, any corrupted triples that appear in the KG. Ranking a corrupted triple appearing in the KG (i.e. a correct triple) higher than the original test triple is also correct, so the “Filtered” setting provides a clearer view of the ranking performance.
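The ranking protocol above can be sketched for the tail side as follows. This is a simplified illustration: the score function, entity names and the set of known triples are made-up assumptions, ties are resolved in favor of the test triple, and a full evaluation would also rank corrupted heads.

```python
def filtered_tail_rank(score_fn, test_triple, entities, known_triples):
    """Rank of the correct tail among candidate tails, skipping other known triples."""
    h, r, t = test_triple
    target = score_fn(h, r, t)
    rank = 1
    for e in entities:
        if e == t or (h, r, e) in known_triples:
            continue  # skip the target itself and any other known correct triple
        if score_fn(h, r, e) > target:
            rank += 1
    return rank

# Toy scores that depend only on the candidate tail.
toy_scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}

def score_fn(h, r, t):
    return toy_scores[t]

entities = ["a", "b", "c", "d"]
known = {("x", "r", "a"), ("x", "r", "c")}   # (x, r, a) is also a correct triple

raw_rank = filtered_tail_rank(score_fn, ("x", "r", "c"), entities, set())
filt_rank = filtered_tail_rank(score_fn, ("x", "r", "c"), entities, known)
print(raw_rank, filt_rank)
```

Without filtering, the other correct tail "a" pushes the test triple down one position; with filtering, it is ignored.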
In addition to the mean rank and the Hits@10 (i.e. the proportion of test triples for which the target entity is ranked in the top 10 predictions), which were originally used in the entity prediction task [Bordes et al., 2013], recent work also reports the mean reciprocal rank (MRR). Some recent work additionally reported Hits@1 (i.e. the proportion of test triples for which the target entity is ranked first). However, the formulas of MRR and Hits@1 show a strong correlation between these two scores, so using Hits@1 might not reveal any additional insight. Mean rank is always greater than or equal to 1, and a lower mean rank indicates better entity prediction performance, while MRR and Hits@10 scores always range from 0.0 to 1.0, and a higher score reflects a better prediction result.
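Given the rank of the target entity for each test triple, the three metrics can be computed as in this minimal sketch. The list of ranks is made up for illustration, and Hits@k is returned as a fraction rather than a percentage.

```python
def mean_rank(ranks):
    """Average rank of the target entity (lower is better, minimum 1)."""
    return sum(ranks) / len(ranks)

def mean_reciprocal_rank(ranks):
    """Average of 1 / rank (higher is better, maximum 1.0)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k=10):
    """Fraction of test triples whose target entity is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

ranks = [1, 2, 20, 5]   # made-up ranks for four test triples
mr = mean_rank(ranks)
mrr = mean_reciprocal_rank(ranks)
h10 = hits_at_k(ranks, k=10)
h1 = hits_at_k(ranks, k=1)
print(mr, mrr, h10, h1)
```

A single badly ranked triple (rank 20 here) moves the mean rank substantially but has little effect on MRR, which is one reason the two are reported together.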
4.2 Main Results
Tables 3 and 4 list recent entity prediction results of KG completion models on FB15k and WN18 and on FB15k-237 and WN18RR, respectively. In Table 3, the first 28 rows report the performance of triple-based models that directly optimize a score function for the triples in a KG, i.e. they do not exploit information about alternative paths between head and tail entities. The next 9 rows report results of models that exploit information about relation paths or neighborhood information. The last 2 rows present results for models which make use of textual mentions derived from a large external corpus. In Table 4, the last 5 rows report results of models that exploit the path or neighborhood information.
In general, Tables 3 and 4 show that the models using external corpus information or employing path information achieve better scores than the triple-based models that do not use such information. Among the models that do not exploit path or external information, the complex vector-based models (e.g. QuatE, ComplEx-N3 and RotatE) produce the strongest evaluation scores, followed by the neural network-based models (e.g. CapsE, InteractE and HypER). [Footnote 5: CapsE uses pre-trained word embeddings for entity vector initialization on WN18RR. It is not surprising that CapsE produces the best MR on WN18RR, as many entity names in WordNet are lexically meaningful. It is possible for all other embedding models to utilize pre-trained word vectors as well. However, averaging the pre-trained word embeddings to initialize entity vectors is an open problem, and it is not always useful since entity names in many domain-specific KGs are not lexically meaningful [Wang et al., 2014, Guu et al., 2015].] Tables 3 and 4 also show that TransE and DISTMULT, despite their simplicity, can produce very competitive results (i.e. when a careful grid search of hyper-parameters is performed).
5 Discussion and Conclusion
The reasons why much work has been devoted towards developing triple-based models are: (1) additional information sources might not be available, e.g., for KGs for specialized domains, (2) models that do not exploit path information or external resources are simpler and thus typically much faster to train than the more complex models using path or external information, and (3) the more complex models that exploit path or external information are typically extensions of these simpler models, and are often initialized with parameters estimated by such simpler models, so improvements to the simpler models should yield corresponding improvements to the more complex models as well [Nguyen et al., 2016a].
It is worth further exploring these KG completion embedding models in new applications whose data can be formulated as triples. For example, in Web search engines, we observe user-oriented relationships between submitted queries and documents returned by the search engines. That is, we have triple representations (query, user, document) in which each user-oriented relationship involves many queries and documents, resulting in a large number of Many-to-Many relationships. Inspired by this observation, ?) applied STransE [Nguyen et al., 2016a] for search personalization to re-rank the search documents returned by a search engine for users’ submitted queries. Other application examples can also be found in recommender systems [Zhang et al., 2016, He et al., 2017, Cao et al., 2019], social relation extraction [Tu et al., 2017] and visual relation detection [Zhang et al., 2017].
Future research directions might also include: (i) combining logical rules, which contain rich background information, with KG triples in a unified KG completion framework, e.g. jointly embedding KGs and logical rules [Guo et al., 2016, Yang et al., 2017]; (ii) exploring open-world KG completion models that connect unseen entities to existing KGs [Shi and Weninger, 2018], since recent embedding models for KG completion hold a closed-world assumption where the KGs are fixed (i.e. new entities might not be added easily); and (iii) investigating efficient approaches that can be applied to large-scale KGs of millions of entities and relations [Zhang et al., 2020].
In this paper, we have presented a comprehensive survey of embedding models of entities and relationships for knowledge graph completion. This paper also provides up-to-date experimental results of the embedding models for the entity prediction (i.e. link prediction) task on the benchmark datasets FB15k, WN18, FB15k-237 and WN18RR. We hope that this paper serves its purpose by providing a concrete foundation for future research and applications on the topic.
Appendix
Triple Classification—Task Description
The triple classification task was first introduced by ?), and since then it has been used to evaluate various embedding models. The aim of this task is to predict whether a triple $(h, r, t)$ is correct or not. For classification, a relation-specific threshold $\theta_r$ is set for each relation type $r$. If the plausibility score of an unseen test triple $(h, r, t)$ is higher than $\theta_r$, then the triple is classified as correct, otherwise as incorrect. Following ?), the relation-specific thresholds are determined by maximizing the micro-averaged accuracy, which is a per-triple average, on the validation set.
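The threshold selection described above can be sketched as follows. This is only an illustration under stated assumptions: the names `fit_thresholds` and `classify`, the `(relation id, score, label)` input format, and the choice of candidate thresholds (the observed validation scores plus one value below all of them) are not taken from any surveyed paper. Since micro-averaged accuracy counts correct decisions per triple, maximizing the number of correct decisions for each relation separately also maximizes the overall micro-averaged accuracy.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (relation id, plausibility score, is_correct label); assumed input format
Example = Tuple[int, float, bool]


def fit_thresholds(validation: Iterable[Example]) -> Dict[int, float]:
    """Pick, per relation, the threshold with the most correct validation decisions."""
    by_relation: Dict[int, List[Tuple[float, bool]]] = defaultdict(list)
    for r, s, y in validation:
        by_relation[r].append((s, y))

    thresholds: Dict[int, float] = {}
    for r, items in by_relation.items():
        scores = sorted({s for s, _ in items})
        # One candidate below all scores (everything classified as correct),
        # then each observed score (only strictly higher scores are correct).
        candidates = [scores[0] - 1.0] + scores
        best_theta, best_correct = candidates[0], -1
        for theta in candidates:
            correct = sum((s > theta) == y for s, y in items)
            if correct > best_correct:
                best_theta, best_correct = theta, correct
        thresholds[r] = best_theta
    return thresholds


def classify(r: int, s: float, thresholds: Dict[int, float]) -> bool:
    """A triple with relation r is classified as correct iff its score exceeds theta_r."""
    return s > thresholds[r]


validation = [
    (0, 0.9, True), (0, 0.8, True), (0, 0.3, False), (0, 0.1, False),
    (1, -1.0, True), (1, -2.0, False),
]
thresholds = fit_thresholds(validation)
```

In this sketch a relation that never occurs in the validation set has no threshold, so `classify` raises a `KeyError` for it; a real evaluation would need a fallback for that case.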
Triple Classification—Datasets
Information about benchmark datasets for the triple classification task is given in Table 5. FB13 and WN11 [Socher et al., 2013] are derived from the large real-world KG Freebase [Bollacker et al., 2008] and the large lexical KG WordNet [Miller, 1995], respectively. Note that when creating the FB13 and WN11 datasets, ?) already filtered out triples from the test set if either or both of their head and tail entities also appear in the training set in a different relation type or order.
Triple Classification—Main Results
Table 6 presents the triple classification results of KG completion models on the WN11 and FB13 datasets. The first 9 rows report the performance of models that use TransE/DISTMULT to initialize the entity and relation vectors. The last 12 rows present the accuracy of models with randomly initialized parameters. Note that higher triple classification results are obtained for NTN, Bilinear-comp and TransE-comp when entity vectors are initialized by averaging the pre-trained GloVe word vectors [Pennington et al., 2014]. This is not surprising, because many entity names in WordNet and Freebase are lexically meaningful. However, this is not always the case for domain-specific KGs.
References
- [Agirre et al., 2013] Eneko Agirre, Oier López de Lacalle, and Aitor Soroa. 2013. Random Walks for Knowledge-Based Word Sense Disambiguation. Computational Linguistics, 40(1):57–84.
- [Baeza-Yates and Ribeiro-Neto, 2011] Ricardo A. Baeza-Yates and Berthier A. Ribeiro-Neto. 2011. Modern Information Retrieval - the concepts and technology behind search, Second edition. Pearson Education Ltd., Harlow, England.
- [Balažević et al., 2019] Ivana Balažević, Carl Allen, and Timothy M Hospedales. 2019. Hypernetwork knowledge graph embeddings. In ICANN, pages 553–565.
- [Balazevic et al., 2019] Ivana Balazevic, Carl Allen, and Timothy Hospedales. 2019. TuckER: Tensor factorization for knowledge graph completion. In EMNLP-IJCNLP, pages 5185–5194.
- [Berant et al., 2013] Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic Parsing on Freebase from Question-Answer Pairs. In EMNLP, pages 1533–1544.
- [Bollacker et al., 2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge. In SIGMOD, pages 1247–1250.
- [Bordes et al., 2011] Antoine Bordes, Jason Weston, Ronan Collobert, and Yoshua Bengio. 2011. Learning Structured Embeddings of Knowledge Bases. In AAAI, pages 301–306.
- [Bordes et al., 2012] Antoine Bordes, Xavier Glorot, Jason Weston, and Yoshua Bengio. 2012. A Semantic Matching Energy Function for Learning with Multi-relational Data. Machine Learning, 94(2):233–259.
