A survey of embedding models of entities and relationships for knowledge graph completion

Dat Quoc Nguyen

arXiv:1703.08098·cs.CL·October 28, 2020


TL;DR

This survey reviews embedding models for knowledge graph completion, summarizing recent experimental results and highlighting future research directions to improve link prediction accuracy.

Contribution

It provides a comprehensive overview of current embedding models for entities and relationships in knowledge graphs, including experimental comparisons and future research insights.

Findings

1. Embedding models vary in effectiveness across datasets
2. Recent models achieve higher accuracy in link prediction
3. Future research should focus on model scalability and interpretability

Abstract

Knowledge graphs (KGs) of real-world facts about entities and their relationships are useful resources for a variety of natural language processing tasks. However, because knowledge graphs are typically incomplete, it is useful to perform knowledge graph completion or link prediction, i.e. predict whether a relationship not in the knowledge graph is likely to be true. This paper serves as a comprehensive survey of embedding models of entities and relationships for knowledge graph completion, summarizing up-to-date experimental results on standard benchmark datasets and pointing out potential future research directions.

Tables

Table 1: The score functions $f(h,r,t)$ of several prominent embedding models for KG completion. In these models, the entities $h$ and $t$ are represented by vectors $\boldsymbol{v}_{h}$ and $\boldsymbol{v}_{t}\in\mathbb{R}^{k}$, respectively. $\ell_{1/2}$ denotes either the $L_1$-norm or the squared $L_2$-norm. In ConvE, $\overline{\boldsymbol{v}}_{h}$ and $\overline{\boldsymbol{v}}_{r}$ denote a 2D reshaping of $\boldsymbol{v}_{h}$ and $\boldsymbol{v}_{r}$, respectively. In both ConvE and ConvKB models, $\ast$ and $\mathbb{\Omega}$ denote a convolution operator and a set of filters, respectively.

Category	Model	Score function $f (h, r, t)$
Translation	Unstructured	$- {‖ 𝒗_{h} - 𝒗_{t} ‖}_{ℓ_{1 / 2}}$
	SE	$- {‖ W_{r, 1} 𝒗_{h} - W_{r, 2} 𝒗_{t} ‖}_{ℓ_{1 / 2}}$ where $W_{r, 1}$ , $W_{r, 2}$ $\in$ $ℝ^{k \times k}$
	TransE	$- {‖ 𝒗_{h} + 𝒗_{r} - 𝒗_{t} ‖}_{ℓ_{1 / 2}}$ where $𝒗_{r} \in ℝ^{k}$
	TransH	$- {‖ (I - 𝒓_{p} 𝒓_{p}^{⊤}) 𝒗_{h} + 𝒗_{r} - (I - 𝒓_{p} 𝒓_{p}^{⊤}) 𝒗_{t} ‖}_{ℓ_{1 / 2}}$ where $𝒓_{p}$ , $𝒗_{r} \in$ $ℝ^{k}$ , I denotes an identity matrix of size $k \times k$
	TransR	$- {‖ W_{r} 𝒗_{h} + 𝒗_{r} - W_{r} 𝒗_{t} ‖}_{ℓ_{1 / 2}}$ where $W_{r}$ $\in$ $ℝ^{n \times k}$ , $𝒗_{r}$ $\in$ $ℝ^{n}$
	STransE	$- {‖ W_{r, 1} 𝒗_{h} + 𝒗_{r} - W_{r, 2} 𝒗_{t} ‖}_{ℓ_{1 / 2}}$ where $W_{r, 1}$ , $W_{r, 2}$ $\in$ $ℝ^{k \times k}$ , $𝒗_{r} \in ℝ^{k}$
	TranSparse	$- {‖ W_{r, 1} (θ_{r, 1}) 𝒗_{h} + 𝒗_{r} - W_{r, 2} (θ_{r, 2}) 𝒗_{t} ‖}_{ℓ_{1 / 2}}$ where $W_{r, 1}$ , $W_{r, 2}$ $\in$ $ℝ^{n \times k}$ ; $θ_{r, 1}$ , $θ_{r, 2} \in ℝ$ ; $𝒗_{r}$ $\in$ $ℝ^{n}$
	TransD	$- {‖ (I + 𝒓_{p} 𝒉_{p}^{⊤}) 𝒗_{h} + 𝒗_{r} - (I + 𝒓_{p} 𝒕_{p}^{⊤}) 𝒗_{t} ‖}_{ℓ_{1 / 2}}$ where $𝒓_{p}$ , $𝒗_{r}$ , $𝒉_{p}, 𝒕_{p}$ $\in$ $ℝ^{k}$
	lppTransD	$- {‖ (I + 𝒓_{p, 1} 𝒉_{p}^{⊤}) 𝒗_{h} + 𝒗_{r} - (I + 𝒓_{p, 2} 𝒕_{p}^{⊤}) 𝒗_{t} ‖}_{ℓ_{1 / 2}}$ where $𝒓_{p, 1}$ , $𝒓_{p, 2}$ , $𝒗_{r}$ , $𝒉_{p}, 𝒕_{p}$ $\in$ $ℝ^{k}$
Bilinear & Tensor	Bilinear	$𝒗_{h}^{⊤} W_{r} 𝒗_{t}$ where $W_{r}$ $\in$ $ℝ^{k \times k}$
	DISTMULT	$𝒗_{h}^{⊤} W_{r} 𝒗_{t}$ where $W_{r}$ is a diagonal matrix $\in$ $ℝ^{k \times k}$
	SimplE	$\frac{1}{2}$ ( $𝒗_{h, 1}^{⊤} W_{r} 𝒗_{t, 2}$ + $𝒗_{t, 1}^{⊤} W_{r^{- 1}} 𝒗_{h, 2}$ ) where $𝒗_{h, 1}, 𝒗_{h, 2}, 𝒗_{t, 1}, 𝒗_{t, 2} \in ℝ^{k}$ ; $W_{r}$ and $W_{r^{- 1}}$ are diagonal matrices $\in$ $ℝ^{k \times k}$
	SME(bilinear)	$𝒗_{h}^{⊤} {(M_{1} \times_{3} 𝒗_{r})}^{⊤} (M_{2} \times_{3} 𝒗_{r}) 𝒗_{t}$ where $𝒗_{r} \in ℝ^{k}$ ; $M_{1}, M_{2} \in ℝ^{n \times k \times k}$
	TuckER	$M \times_{1} 𝒗_{h} \times_{2} 𝒗_{r} \times_{3} 𝒗_{t}$ where $𝒗_{r} \in ℝ^{n}$ , $M \in ℝ^{k \times n \times k}$ ; $\times_{d}$ denotes the tensor product along the $d$ -th mode
	HolE	$𝗌𝗂𝗀𝗆𝗈𝗂𝖽 (𝒗_{r}^{⊤} (𝒗_{h} ⋆ 𝒗_{t}))$ where $⋆$ denotes circular correlation
Neural network	NTN	$𝒗_{r}^{⊤} 𝗍𝖺𝗇𝗁 (𝒗_{h}^{⊤} M_{r} 𝒗_{t} + W_{r, 1} 𝒗_{h} + W_{r, 2} 𝒗_{t} + b_{r})$ where $𝒗_{r}, b_{r} \in ℝ^{n}$ ; $M_{r} \in ℝ^{k \times k \times n}$ ; $W_{r, 1}$ , $W_{r, 2} \in ℝ^{n \times k}$
	ER-MLP	$𝗌𝗂𝗀𝗆𝗈𝗂𝖽 (𝕨^{⊤} 𝗍𝖺𝗇𝗁 (𝑾 𝖼𝗈𝗇𝖼𝖺𝗍 (𝒗_{h}, 𝒗_{r}, 𝒗_{t})))$
	ConvE	$𝒗_{t}^{⊤} 𝖱𝖾𝖫𝖴 (𝑾 𝗏𝖾𝖼 (𝖱𝖾𝖫𝖴 (𝖼𝗈𝗇𝖼𝖺𝗍 ({\bar{𝒗}}_{h}, {\bar{𝒗}}_{r}) * Ω)))$
	ConvKB	$𝕨^{⊤} 𝖼𝗈𝗇𝖼𝖺𝗍 (𝖱𝖾𝖫𝖴 ([𝒗_{h}, 𝒗_{r}, 𝒗_{t}] * Ω))$
Complex vector	ComplEx	$𝖱𝖾 (𝒄_{h}^{⊤} C_{r} {\hat{𝒄}}_{t})$ where $𝖱𝖾 (c)$ denotes the real part of the complex value $c \in ℂ$ ; $𝒄_{h}, 𝒄_{t} \in ℂ^{k}$ ; $C_{r} \in ℂ^{k \times k}$ is a diagonal matrix ; ${\hat{𝒄}}_{t}$ is the conjugate of $𝒄_{t}$
	RotatE	$- {‖ 𝒄_{h} \circ 𝒄_{r} - 𝒄_{t} ‖}_{ℓ_{1 / 2}}$ where $𝒄_{h}, 𝒄_{r}, 𝒄_{t} \in ℂ^{k}$ ; $\circ$ denotes the element-wise product
	QuatE	$𝒒_{h} \otimes \frac{𝒒_{r}}{\| 𝒒_{r} \|} ∙ 𝒒_{t}$ where $𝒒_{h}, 𝒒_{r}, 𝒒_{t} \in ℍ^{k}$ ; $\otimes$ and $∙$ denote Hamilton and quaternion inner products, respectively
Path	TransE-comp	$- {‖ 𝒗_{h} + 𝒗_{r_{1}} + 𝒗_{r_{2}} + \dots + 𝒗_{r_{m}} - 𝒗_{t} ‖}_{ℓ_{1 / 2}}$ where $𝒗_{r_{1}}, 𝒗_{r_{2}}, \dots, 𝒗_{r_{m}} \in ℝ^{k}$
	Bilinear-comp	$𝒗_{h}^{⊤} W_{r_{1}} W_{r_{2}} \dots W_{r_{m}} 𝒗_{t}$ where $W_{r_{1}}, W_{r_{2}}, \dots, W_{r_{m}} \in ℝ^{k \times k}$

Table 2: Statistics of benchmark experimental datasets.

Dataset	$∣ ℰ ∣$	$∣ ℛ ∣$	#Train	#Valid	#Test
FB15k [Bordes et al., 2013]	14,951	1,345	483,142	50,000	59,071
WN18 [Bordes et al., 2013]	40,943	18	141,442	5,000	5,000
FB15k-237 [Toutanova and Chen, 2015]	14,541	237	272,115	17,535	20,466
WN18RR [Dettmers et al., 2018]	40,943	11	86,835	3,034	3,134

Table 3: Entity prediction results on WN18 and FB15k, which are taken from the corresponding papers. MR, @10 and MRR denote the metrics mean rank, Hits@10 (in %) and mean reciprocal rank, respectively. [◆], [■], [♠] and [♣] denote results taken from Yang et al. (2015), Nickel et al. (2016b), Ravishankar et al. (2017) and Kadlec et al. (2017), respectively.

Method	Filtered
	FB15k			WN18
	MR	@10	MRR	MR	@10	MRR
TransH [Wang et al., 2014]	87	64.4	-	303	86.7	-
TransR [Lin et al., 2015b]	77	68.7	-	225	92.0	-
CTransR [Lin et al., 2015b]	75	70.2	-	218	92.3	-
KG2E [He et al., 2015]	59	74.0	-	331	92.8	-
TransD [Ji et al., 2015]	91	77.3	-	212	92.2	-
lppTransD [Yoon et al., 2016]	78	78.7	-	270	94.3	-
TransG [Xiao et al., 2016]	98	79.8	-	470	93.3	-
TranSparse [Ji et al., 2016]	82	79.5	-	211	93.2	-
TranSparse-DT [Chang et al., 2017]	79	80.2	-	221	94.3	-
ITransF [Xie et al., 2017]	65	81.0	-	205	94.2	-
NTN [Socher et al., 2013] [ $◆$ ]	-	41.4	0.25	-	66.1	0.53
TransE [Bordes et al., 2013] [ $■$ ]	-	74.9	0.463	-	94.3	0.495
HolE [Nickel et al., 2016b]	-	73.9	0.524	-	94.9	0.938
ComplEx [Trouillon et al., 2016]	-	84.0	0.692	-	94.7	0.941
ANALOGY [Liu et al., 2017]	-	85.4	0.725	-	94.7	0.942
SimplE [Kazemi and Poole, 2018]	-	83.8	0.727	-	94.7	0.942
TorusE [Ebisu and Ichise, 2018]	-	83.2	0.733	-	95.4	0.947
STransE [Nguyen et al., 2016a]	69	79.7	0.543	206	93.4	0.657
ER-MLP [Dong et al., 2014] [ $♠$ ]	81	80.1	0.570	299	94.2	0.895
DISTMULT [Yang et al., 2015] [ $♣$ ]	42	89.3	0.798	655	94.6	0.797
ConvE [Dettmers et al., 2018]	64	87.3	0.745	504	95.5	0.942
HypER [Balažević et al., 2019]	44	88.5	0.790	431	95.8	0.951
RotatE [Sun et al., 2019]	40	88.4	0.797	309	95.9	0.949
QuatE [Zhang et al., 2019]	17	90.0	0.782	162	95.9	0.950
ComplEx-N3 [Lacroix et al., 2018]	-	91	0.86	-	96	0.95
TuckER [Balazevic et al., 2019]	-	89.2	0.795	-	95.8	0.953
IRN [Shen et al., 2017]	38	92.7	-	249	95.3	-
ProjE [Shi and Weninger, 2017]	34	88.4	-	-	-	-
rTransE [García-Durán et al., 2015]	50	76.2	-	-	-	-
PTransE-ADD [Lin et al., 2015a]	58	84.6	-	-	-	-
PTransE-RNN [Lin et al., 2015a]	92	82.2	-	-	-	-
GAKE [Feng et al., 2016b]	119	64.8	-	-	-	-
Gaifman [Niepert, 2016]	75	84.2	-	352	93.9	-
Hiri [Liu et al., 2016]	-	70.3	0.603	-	90.8	0.691
Neural LP [Yang et al., 2017]	-	83.7	0.76	-	94.5	0.94
R-GCN+ [Schlichtkrull et al., 2018]	-	84.2	0.696	-	96.4	0.819
KB_LRN [Durán and Niepert, 2018]	44	87.5	0.794	-	-	-
TEKE_H [Wang and Li, 2016]	108	73.0	-	114	92.9	-
SSP [Xiao et al., 2017]	82	79.0	-	156	93.2	-

Table 4: Entity prediction results on WN18RR and FB15k-237, which are taken from the corresponding papers. [◆], [♠] and [■] denote results taken from Dettmers et al. (2018), Ravishankar et al. (2017) and Nguyen et al. (2019), respectively.

Method	Filtered
	FB15k-237			WN18RR
	MR	@10	MRR	MR	@10	MRR
IRN [Shen et al., 2017]	211	46.4	-	-	-	-
KBGAN [Cai and Wang, 2018]	-	45.8	0.278	-	48.1	0.213
DISTMULT [Yang et al., 2015] [ $◆$ ]	254	41.9	0.241	5110	49	0.43
ComplEx [Trouillon et al., 2016] [ $◆$ ]	339	42.8	0.247	5261	51	0.44
ConvE [Dettmers et al., 2018]	246	49.1	0.316	5277	48	0.46
ER-MLP [Dong et al., 2014] [ $♠$ ]	219	54.0	0.342	4798	41.9	0.366
HypER [Balažević et al., 2019]	250	52.0	0.341	5798	52.2	0.465
TransE [Bordes et al., 2013] [ $■$ ]	347	46.5	0.294	743	56.0	0.245
ConvKB [Nguyen et al., 2018] [ $■$ ]	254	53.2	0.418	763	56.7	0.253
CapsE [Nguyen et al., 2019b]	303	59.3	0.523	719	56.0	0.415
InteractE [Vashishth et al., 2020]	172	53.5	0.354	5202	52.8	0.463
RotatE [Sun et al., 2019]	177	53.3	0.338	3340	57.1	0.476
QuatE [Zhang et al., 2019]	87	55.0	0.348	2314	58.2	0.488
ComplEx-N3 [Lacroix et al., 2018]	-	56	0.37	-	57	0.48
Conv-TransE [Shang et al., 2019]	-	51	0.33	-	52	0.46
TuckER [Balazevic et al., 2019]	-	54.4	0.358	-	52.6	0.470
Neural LP [Yang et al., 2017]	-	36.2	0.24	-	-	-
R-GCN+ [Schlichtkrull et al., 2018]	-	41.7	0.249	-	-	-
KB_LRN [Durán and Niepert, 2018]	209	49.3	0.309	-	-	-
KBGAT [Nathani et al., 2019]	210	62.6	0.518	1940	58.1	0.440
ReInceptionE [Xie et al., 2020]	173	52.8	0.349	1894	58.2	0.483
SACN [Shang et al., 2019]	-	54	0.35	-	54	0.47

Table 5: Statistics of the benchmark datasets for triple classification. In both WN11 and FB13, each validation and test set also contains the same number of incorrect triples as the number of correct triples.

Dataset	$∣ ℰ ∣$	$∣ ℛ ∣$	#Train	#Valid	#Test
FB13 [Socher et al., 2013]	75,043	13	316,232	5,908	23,733
WN11 [Socher et al., 2013]	38,696	11	112,581	2,609	10,544

Table 6: Accuracy results (in %) for triple classification on the WN11 (labeled as W11) and FB13 (labeled as F13) test sets, which are taken from the corresponding papers. “Avg.” denotes the averaged accuracy. [*] denotes that scores are taken from ?).

Method	W11	F13	Avg.
CTransR [Lin et al., 2015b]	85.7	-	-
TransR [Lin et al., 2015b]	85.9	82.5	84.2
TransD [Ji et al., 2015]	86.4	89.1	87.8
TEKE_H [Wang and Li, 2016]	84.8	84.2	84.5
TranSparse-S [Ji et al., 2016]	86.4	88.2	87.3
TranSparse-US [Ji et al., 2016]	86.8	87.5	87.2
ConvKB [Nguyen et al., 2018] [*]	87.6	88.8	88.2
TransE-HRS [Zhang et al., 2018]	86.8	88.4	87.6
DISTMULT-HRS [Zhang et al., 2018]	88.9	89.0	89.0
NTN [Socher et al., 2013]	70.6	87.2	78.9
TransH [Wang et al., 2014]	78.8	83.3	81.1
SLogAn [Liang and Forbus, 2015]	75.3	85.3	80.3
KG2E [He et al., 2015]	85.4	85.3	85.4
Bilinear-comp [Guu et al., 2015]	77.6	86.1	81.9
TransE-comp [Guu et al., 2015]	80.3	87.6	84.0
TransR-FT [Feng et al., 2016a]	86.6	82.9	84.8
TransG [Xiao et al., 2016]	87.4	87.3	87.4
lppTransD [Yoon et al., 2016]	86.2	88.6	87.4
TransE [Bordes et al., 2013] [*]	86.5	87.5	87.0
TransE-NMM [Nguyen et al., 2016b]	86.8	88.6	87.7
TranSparse-DT [Chang et al., 2017]	87.1	87.9	87.5

Equations

$\boldsymbol{v}_{Japan}-\boldsymbol{v}_{Tokyo}\approx\boldsymbol{v}_{Germany}-\boldsymbol{v}_{Berlin}$

$\boldsymbol{v}_{Tokyo}+\boldsymbol{v}_{{is\_capital\_of}}-\boldsymbol{v}_{Japan}\approx\boldsymbol{v}_{Berlin}+\boldsymbol{v}_{{is\_capital\_of}}-\boldsymbol{v}_{Germany}\approx\boldsymbol{v}_{Lisbon}+\boldsymbol{v}_{{is\_capital\_of}}-\boldsymbol{v}_{Portugal}$

$\mathcal{L}_{Margin}=\sum_{(h,r,t)\in\mathcal{G}}\ \sum_{(h^{\prime},r,t^{\prime})\in\mathcal{G}^{\prime}_{(h,r,t)}}\left[\gamma-f(h,r,t)+f(h^{\prime},r,t^{\prime})\right]_{+}$

$\mathcal{L}_{Softmax}$

$\mathcal{L}_{Logistic}$


Full text

Dat Quoc Nguyen

VinAI Research, Vietnam


Keywords: Knowledge graph completion, Link prediction, Embedding model, Entity prediction.

1 Introduction

Let us revisit the classic Word2Vec example of a “royal” relationship between “$\mathsf{king}$” and “$\mathsf{man}$”, and between “$\mathsf{queen}$” and “$\mathsf{woman}$”. As illustrated by $\boldsymbol{v}_{king}-\boldsymbol{v}_{man}\approx\boldsymbol{v}_{queen}-\boldsymbol{v}_{woman}$, word vectors learned from a large corpus can model relational similarities or linguistic regularities between pairs of words as translations in the projected vector space [Mikolov et al., 2013, Pennington et al., 2014]. Figure 2 shows another example of a relational similarity between word pairs of countries and capital cities:

$\boldsymbol{v}_{Japan}-\boldsymbol{v}_{Tokyo}\approx\boldsymbol{v}_{Germany}-\boldsymbol{v}_{Berlin}$

Assume that we consider the country and capital pairs in Figure 2 to be pairs of entities rather than word types. That is, we now represent country and capital entities by low-dimensional and dense vectors. The relational similarity between word pairs presumably captures an “$\mathsf{is\_capital\_of}$” relationship between country and capital entities. We also represent this relationship by a translation vector $\boldsymbol{v}_{{is\_capital\_of}}$ in the entity vector space. Thus, we expect:

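$\boldsymbol{v}_{Tokyo}+\boldsymbol{v}_{{is\_capital\_of}}-\boldsymbol{v}_{Japan}\approx\boldsymbol{v}_{Berlin}+\boldsymbol{v}_{{is\_capital\_of}}-\boldsymbol{v}_{Germany}\approx\boldsymbol{v}_{Lisbon}+\boldsymbol{v}_{{is\_capital\_of}}-\boldsymbol{v}_{Portugal}$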

This intuition inspired the TransE model—a well-known embedding model for KG completion or link prediction in KGs [Bordes et al., 2013].

Knowledge graphs are collections of real-world triples, where each triple or fact $(h,r,t)$ represents some relation $r$ between a head entity $h$ and a tail entity $t$. KGs can thus be formalized as directed multi-relational graphs, where nodes correspond to entities and edges linking the nodes encode various kinds of relationships [García-Durán et al., 2016, Nickel et al., 2016a]. Here, entities are real-world things or objects such as persons, places, organizations, music tracks or movies. Each relation type defines a certain relationship between entities. For example, as illustrated in Figure 2, the relation type “$\mathsf{child\_of}$” relates person entities with each other, while the relation type “$\mathsf{born\_in}$” relates person entities with place entities. Examples of KGs include the domain-specific KG GeneOntology, popular generic KGs such as WordNet [Fellbaum, 1998], YAGO [Suchanek et al., 2007], Freebase [Bollacker et al., 2008], NELL [Carlson et al., 2010] and DBpedia [Lehmann et al., 2015], as well as commercial KGs such as Google’s Knowledge Graph, Microsoft’s Satori and Facebook’s Open Graph. Nowadays, KGs are used in a number of commercial applications, including search engines such as Google, Microsoft’s Bing and Facebook’s Graph Search. They are also useful resources for many natural language processing tasks such as question answering [Ferrucci, 2012, Fader et al., 2014], word sense disambiguation [Navigli and Velardi, 2005, Agirre et al., 2013], semantic parsing [Krishnamurthy and Mitchell, 2012, Berant et al., 2013] and co-reference resolution [Ponzetto and Strube, 2006, Dutta and Weikum, 2015].

A main issue is that even very large KGs, such as Freebase and DBpedia, which contain billions of fact triples about the world, are still far from complete. In particular, in English DBpedia 2014, 60% of person entities lack a place of birth and 58% of the scientists lack a fact about what they are known for [Krompaß et al., 2015]. In Freebase, 71% of 3 million person entities lack a place of birth, 75% have no nationality, and 94% have no facts about their parents [West et al., 2014]. Consequently, a question answering system built on an incomplete KG may fail to provide a correct answer even when the question is correctly interpreted. For example, given the incomplete KG in Figure 2, it would be impossible to answer the question “Where was Jane born?”, although the question fully matches existing entity and relation type information (i.e. “$\mathsf{Jane}$” and “$\mathsf{born\_in}$”) in the KG. Therefore, much work has been devoted to knowledge graph completion, i.e. link prediction in KGs, which aims to predict whether a relationship/triple not in the KG is likely to be true, thereby adding new triples by leveraging existing triples in the KG [Lao and Cohen, 2010, Bordes et al., 2012, Gardner et al., 2014, García-Durán et al., 2016]. For example, we would like to predict the missing tail entity in the incomplete triple $\mathsf{(Jane,born\_in,?)}$ or predict whether the triple $\mathsf{(Jane,born\_in,Miami)}$ is correct.
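To make the triple view concrete, here is a minimal Python sketch that stores a KG as a set of $(h,r,t)$ triples and answers $(h,r,?)$ queries by lookup. The toy triples and the helper `answer_tail_query` are hypothetical and only mimic the running example; a missing fact simply yields an empty answer, which is the gap that KG completion tries to fill.

```python
from collections import defaultdict

# Hypothetical toy KG; these triples are illustrative, not copied from the paper's figure.
triples = {
    ("Patti", "born_in", "Miami"),
    ("Mom", "born_in", "Austin"),
    ("Jane", "child_of", "Patti"),
}

# Index (head, relation) -> set of tails so that (h, r, ?) queries are dictionary lookups.
tails_by_head_rel = defaultdict(set)
for h, r, t in triples:
    tails_by_head_rel[(h, r)].add(t)

def answer_tail_query(head, relation):
    """Return the sorted tails stored in the KG for the query (head, relation, ?)."""
    return sorted(tails_by_head_rel.get((head, relation), set()))

print(answer_tail_query("Patti", "born_in"))  # ['Miami']
print(answer_tail_query("Jane", "born_in"))   # []: the fact is missing from the KG
```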

Embedding models for KG completion have been shown to achieve state-of-the-art link prediction performance; in these models, entities are represented by latent feature vectors, while relation types are represented by latent feature vectors, matrices and/or third-order tensors [Bordes et al., 2013, Socher et al., 2013]. This paper (1) surveys the embedding models for KG completion, (2) summarizes up-to-date experimental results on the standard evaluation task of entity prediction, which is also referred to as the link prediction task [Bordes et al., 2013], and (3) points out potential future research directions.

2 A General Approach of Embedding Models for KG Completion

Let $\mathcal{E}$ denote the set of entities and $\mathcal{R}$ the set of relation types. Denote by $\mathcal{G}$ the knowledge graph consisting of a set of correct triples $(h,r,t)$ , such that $h,t\in\mathcal{E}$ and $r\in\mathcal{R}$ . For each triple $(h,r,t)$ , the embedding models define a score function $f(h,r,t)$ of its plausibility. Their goal here is to:

Choose $f$ such that the score $f(h,r,t)$ of a correct triple $(h,r,t)$ is higher than the score $f(h^{\prime},r^{\prime},t^{\prime})$ of an incorrect triple $(h^{\prime},r^{\prime},t^{\prime})$ .

For example, TransE defines a score function of $f_{\text{TransE}}(h,r,t)=-\|\boldsymbol{v}_{h}+\boldsymbol{v}_{r}-\boldsymbol{v}_{t}\|$ , where $h$ , $r$ and $t$ are represented by low dimensional vectors $\boldsymbol{v}_{h}$ , $\boldsymbol{v}_{r}$ and $\boldsymbol{v}_{t}$ , respectively. As $\mathsf{(Tokyo,is\_capital\_of,Japan)}$ is a correct triple, while $\mathsf{(Tokyo,is\_capital\_of,Portugal)}$ and $\mathsf{(Lisbon,is\_capital\_of,Japan)}$ are incorrect ones, we would have: $-\|\boldsymbol{v}_{Tokyo}+\boldsymbol{v}_{{is\_capital\_of}}-\boldsymbol{v}_{Japan}\|>-\|\boldsymbol{v}_{Tokyo}+\boldsymbol{v}_{{is\_capital\_of}}-\boldsymbol{v}_{Portugal}\|$ , and $-\|\boldsymbol{v}_{Tokyo}+\boldsymbol{v}_{{is\_capital\_of}}-\boldsymbol{v}_{Japan}\|>-\|\boldsymbol{v}_{Lisbon}+\boldsymbol{v}_{{is\_capital\_of}}-\boldsymbol{v}_{Japan}\|$ . Table 1 in Section 3 summarizes different prominent score functions $f(h,r,t)$ .
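The TransE inequalities above can be checked numerically. The following is a minimal sketch, assuming hand-picked 2-dimensional vectors (hypothetical values, not learned embeddings) and the $L_1$-norm, which is one of the two options listed in Table 1.

```python
def transe_score(v_h, v_r, v_t, norm="l1"):
    """TransE plausibility f(h, r, t) = -||v_h + v_r - v_t||.

    norm="l1" uses the L1-norm; norm="l2sq" uses the squared L2-norm.
    """
    diff = [a + b - c for a, b, c in zip(v_h, v_r, v_t)]
    if norm == "l1":
        return -sum(abs(d) for d in diff)
    if norm == "l2sq":
        return -sum(d * d for d in diff)
    raise ValueError(f"unknown norm: {norm}")

# Hypothetical embeddings chosen so that v_Tokyo + v_is_capital_of == v_Japan exactly.
emb = {
    "Tokyo": [0.0, 1.0],
    "Japan": [1.0, 1.0],
    "Lisbon": [0.0, -1.0],
    "Portugal": [1.0, -1.0],
    "is_capital_of": [1.0, 0.0],
}

correct = transe_score(emb["Tokyo"], emb["is_capital_of"], emb["Japan"])
wrong_tail = transe_score(emb["Tokyo"], emb["is_capital_of"], emb["Portugal"])
wrong_head = transe_score(emb["Lisbon"], emb["is_capital_of"], emb["Japan"])
print(correct > wrong_tail, correct > wrong_head)  # True True
```

In practice the vectors are learned by minimizing one of the losses below, so that these inequalities hold for as many training triples as possible.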

To learn model parameters (i.e. entity vectors, relation vectors or matrices), the embedding models minimize an objective loss $\mathcal{L}$ . A conventional objective loss is the margin-based pairwise ranking loss [Bordes et al., 2013]:

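$\mathcal{L}_{Margin}=\sum_{(h,r,t)\in\mathcal{G}}\ \sum_{(h^{\prime},r,t^{\prime})\in\mathcal{G}^{\prime}_{(h,r,t)}}\left[\gamma-f(h,r,t)+f(h^{\prime},r,t^{\prime})\right]_{+}$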

where $[x]_{+}=\max(0,x)$ ; $\gamma$ is the margin hyper-parameter; and $\mathcal{G}^{\prime}_{(h,r,t)}$ is the set of incorrect triples generated by corrupting the correct triple $(h,r,t)\in\mathcal{G}$ .
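A minimal sketch of the margin-based loss, assuming the scores $f(\cdot)$ have already been computed; `pos_scores` and `neg_scores` are hypothetical argument names, and each correct triple is paired with its own list of corrupted triples from $\mathcal{G}^{\prime}_{(h,r,t)}$.

```python
def margin_ranking_loss(pos_scores, neg_scores, gamma=1.0):
    """Sum of [gamma - f(h, r, t) + f(h', r, t')]_+ over correct/corrupted pairs.

    pos_scores[i] is f(h, r, t) for the i-th correct triple; neg_scores[i] is the
    list of scores f(h', r, t') of the corrupted triples generated from it.
    """
    loss = 0.0
    for f_pos, f_negs in zip(pos_scores, neg_scores):
        for f_neg in f_negs:
            loss += max(0.0, gamma - f_pos + f_neg)  # hinge: [x]_+ = max(0, x)
    return loss

print(margin_ranking_loss([-0.5, -2.0], [[-3.0, -1.0], [-2.5]], gamma=1.0))  # 1.0
```

Only pairs whose score gap is smaller than the margin $\gamma$ contribute; in the example, the pair with $f(h^{\prime},r,t^{\prime})=-3.0$ is already separated by more than $\gamma$ and adds nothing.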

Also, the negative log-likelihood (NLL) of softmax regression [Toutanova and Chen, 2015] and the NLL of logistic regression [Trouillon et al., 2016] are commonly used in recent KG completion research (all of these losses can also include an L2 regularization term on the model parameters, which is omitted here for simplicity):

[TABLE]
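As a hedged sketch, the code below uses common textbook forms of these two losses (an assumption, not a transcription of the paper's formulas): the softmax NLL normalizes the score of a correct triple over that triple together with its corrupted triples, and the logistic NLL is $\sum\log(1+\exp(-l\cdot f(h,r,t)))$ with label $l=+1$ for correct and $l=-1$ for incorrect triples. Function names are hypothetical.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def softmax_nll(f_pos, f_negs):
    """-log(exp f(pos) / sum of exp f over {pos} and its corrupted triples)."""
    scores = [f_pos] + list(f_negs)
    m = max(scores)  # shift by the max score to avoid overflow in exp
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - f_pos

def logistic_nll(scores, labels):
    """Sum of log(1 + exp(-l * f)) with l = +1 (correct) or -1 (incorrect)."""
    return sum(softplus(-l * f) for f, l in zip(scores, labels))
```

Both losses decrease as correct triples receive higher scores than incorrect ones, but unlike the margin loss they never become exactly zero.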

To corrupt the head or tail entities, a common strategy is to replace them uniformly at random when sampling incorrect triples [Bordes et al., 2013]; however, this results in many false-negative labels [Wang et al., 2014]. Domain sampling [Krompaß et al., 2015, Xie et al., 2017] generates corrupted triples by sampling entities from the same domain or from the set of relation-dependent entities. The “Bernoulli” trick [Wang et al., 2014] is widely used to set different probabilities for replacing head or tail entities: for each relation type $r$, we calculate the average number $a_{r,1}$ of heads $h$ per pair $(r,t)$ and the average number $a_{r,2}$ of tails $t$ per pair $(h,r)$. We then define a Bernoulli distribution with success probability $\lambda_{r}=\dfrac{a_{r,2}}{a_{r,1}+a_{r,2}}$ for sampling: given a correct triple $(h,r,t)$, we corrupt this triple by replacing the head entity with probability $\lambda_{r}$ and the tail entity with probability $(1-\lambda_{r})$. Thus, for a Many-to-1 relation such as “$\mathsf{born\_in}$” ($a_{r,1}>a_{r,2}$), the tail is corrupted more often, because a randomly chosen new head is more likely to form another correct triple.
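A minimal sketch of Bernoulli corruption, following the formulation of Wang et al. (2014): the head is replaced with probability $\mathrm{tph}/(\mathrm{tph}+\mathrm{hpt})$, where tph is the average number of tails per $(h,r)$ pair and hpt the average number of heads per $(r,t)$ pair. The function names and toy triples are hypothetical, and, as with uniform sampling, the corrupted triple is not checked against the KG, so false negatives remain possible.

```python
import random
from collections import defaultdict

def bernoulli_head_probs(triples):
    """Map each relation r to the probability of corrupting the head (Wang et al., 2014)."""
    heads = defaultdict(lambda: defaultdict(set))  # r -> t -> set of heads
    tails = defaultdict(lambda: defaultdict(set))  # r -> h -> set of tails
    for h, r, t in triples:
        heads[r][t].add(h)
        tails[r][h].add(t)
    probs = {}
    for r in heads:
        hpt = sum(len(s) for s in heads[r].values()) / len(heads[r])  # avg heads per (r, t)
        tph = sum(len(s) for s in tails[r].values()) / len(tails[r])  # avg tails per (h, r)
        probs[r] = tph / (tph + hpt)
    return probs

def corrupt(triple, entities, head_prob, rng=random):
    """Replace the head with probability head_prob, otherwise the tail,
    by an entity sampled uniformly from the remaining candidates."""
    h, r, t = triple
    if rng.random() < head_prob:
        return (rng.choice([e for e in entities if e != h]), r, t)
    return (h, r, rng.choice([e for e in entities if e != t]))

kg = [("A", "born_in", "Miami"), ("B", "born_in", "Miami"),
      ("C", "born_in", "Miami"), ("D", "born_in", "Austin")]
probs = bernoulli_head_probs(kg)
print(probs["born_in"])  # about 0.333: this Many-to-1 relation mostly corrupts tails
```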

Recently, ?) and ?) proposed adversarial learning-based strategies for sampling incorrect triples. However, they did not provide a comparison between the adversarial learning-based strategies and the “Bernoulli” trick.

3 Specific Models

3.1 Triple-based Embedding Models

Translation-based models:

The Unstructured model [Bordes et al., 2012] assumes that the head and tail entity vectors are similar. As the Unstructured model does not take the relationship into account, it cannot distinguish different relation types. The Structured Embedding (SE) model [Bordes et al., 2011] assumes that the head and tail entities are similar only in a relation-dependent subspace, where each relation is represented by two different matrices. TransE [Bordes et al., 2013] is inspired by models such as the Word2Vec Skip-gram model [Mikolov et al., 2013], where relationships between words often correspond to translations in latent feature space. In particular, TransE learns low-dimensional and dense vectors for every entity and relation type, so that each relation type corresponds to a translation vector operating on the vectors representing the entities, i.e. $\boldsymbol{v}_{h}+\boldsymbol{v}_{r}\approx\boldsymbol{v}_{t}$ for each fact triple $(h,r,t)$. TransE is thus suitable for 1-to-1 relationships, such as “$\mathsf{is\_capital\_of}$”, where a head entity is linked to at most one tail entity given a relation type. Because it uses only one translation vector to represent each relation type, TransE is not well-suited for Many-to-1, 1-to-Many and Many-to-Many relationships, such as the relation types “$\mathsf{born\_in}$”, “$\mathsf{place\_of\_birth}$” and “$\mathsf{research\_fields}$”. (A relation type $r$ is classified as Many-to-1 if multiple head entities can be connected by $r$ to at most one tail entity, as 1-to-Many if multiple tail entities can be linked by $r$ from at most one head entity, and as Many-to-Many if multiple head entities can be connected by $r$ to a tail entity and vice versa.) Indeed, if several heads $h_1,\dots,h_n$ are linked by $r$ to the same tail $t$, then $\boldsymbol{v}_{h_i}+\boldsymbol{v}_{r}\approx\boldsymbol{v}_{t}$ forces all $\boldsymbol{v}_{h_i}$ to be nearly identical. In Figure 2, for example, using one vector representing the relation type “$\mathsf{born\_in}$” cannot capture both the translating direction from “$\mathsf{Patti}$” to “$\mathsf{Miami}$” and its inverse direction from “$\mathsf{Mom}$” to “$\mathsf{Austin}$”.
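The relation categories used above can be estimated from data. The sketch below follows the heuristic of Bordes et al. (2013), who label a side as “1” when the average number of heads per tail (or tails per head) is below 1.5 and “Many” otherwise; the function name and toy triples are hypothetical.

```python
from collections import defaultdict

def relation_categories(triples, threshold=1.5):
    """Label each relation as 1-to-1, 1-to-Many, Many-to-1 or Many-to-Many.

    A side is "1" when its average count (heads per tail, or tails per head)
    is below `threshold`, and "Many" otherwise.
    """
    heads_per_tail = defaultdict(lambda: defaultdict(set))  # r -> t -> {h}
    tails_per_head = defaultdict(lambda: defaultdict(set))  # r -> h -> {t}
    for h, r, t in triples:
        heads_per_tail[r][t].add(h)
        tails_per_head[r][h].add(t)
    labels = {}
    for r in heads_per_tail:
        hpt = sum(map(len, heads_per_tail[r].values())) / len(heads_per_tail[r])
        tph = sum(map(len, tails_per_head[r].values())) / len(tails_per_head[r])
        head_side = "1" if hpt < threshold else "Many"
        tail_side = "1" if tph < threshold else "Many"
        labels[r] = f"{head_side}-to-{tail_side}"
    return labels

kg = [("Tokyo", "is_capital_of", "Japan"), ("Lisbon", "is_capital_of", "Portugal"),
      ("A", "born_in", "Miami"), ("B", "born_in", "Miami")]
print(relation_categories(kg))  # {'is_capital_of': '1-to-1', 'born_in': 'Many-to-1'}
```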

To overcome those issues of TransE, TransH [Wang et al., 2014] associates each relation with a relation-specific hyperplane and uses a projection vector to project entity vectors onto that hyperplane. TransD [Ji et al., 2015] and TransR/CTransR [Lin et al., 2015b] extend TransH by using two projection vectors and a matrix, respectively, to project entity vectors into a relation-specific space. Similar to TransR, TransR-FT [Feng et al., 2016a] also uses a matrix to project head and tail entity vectors. TEKE_H [Wang and Li, 2016] extends TransH to incorporate rich context information from an external text corpus. lppTransD [Yoon et al., 2016] extends TransD to additionally use two projection vectors for representing each relation. STransE [Nguyen et al., 2016a] and TranSparse [Ji et al., 2016] can be viewed as direct extensions of TransR, where head and tail entities are associated with their own projection matrices. Unlike STransE, TranSparse uses adaptive sparse matrices, whose sparsity degrees are defined based on the number of entities linked by relations. TranSparse-DT [Chang et al., 2017] is an extension of TranSparse with a dynamic translation. ITransF [Xie et al., 2017] can be considered a generalization of STransE, which allows the sharing of statistical regularities between relation projection matrices and alleviates the data sparsity issue. Furthermore, TorusE [Ebisu and Ichise, 2018] embeds entities and relations on a torus to handle TransE’s regularization problem, which forces entity embeddings to lie on a sphere in the embedding vector space.
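A minimal sketch of two of the projection-based score functions in Table 1, using the $L_1$-norm. As an assumption, the TransH normal vector $\boldsymbol{r}_{p}$ is rescaled to unit length so that $(I-\boldsymbol{r}_{p}\boldsymbol{r}_{p}^{\top})$ is a true projection. The toy check illustrates why projection helps with Many-to-1 relations: two different heads that differ only along $\boldsymbol{r}_{p}$ can both score perfectly against the same tail.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def l1(v):
    return sum(abs(x) for x in v)

def transh_score(v_h, v_r, r_p, v_t):
    """-||(I - w w^T) v_h + v_r - (I - w w^T) v_t||_1 with w = r_p / ||r_p||."""
    norm = sum(x * x for x in r_p) ** 0.5
    w = [x / norm for x in r_p]

    def project(v):  # remove the component of v along the hyperplane normal w
        dot = sum(a * b for a, b in zip(w, v))
        return [a - dot * b for a, b in zip(v, w)]

    p_h, p_t = project(v_h), project(v_t)
    return -l1([a + b - c for a, b, c in zip(p_h, v_r, p_t)])

def stranse_score(v_h, v_r, v_t, W1, W2):
    """-||W_{r,1} v_h + v_r - W_{r,2} v_t||_1 as in the STransE row of Table 1."""
    a, c = matvec(W1, v_h), matvec(W2, v_t)
    return -l1([x + y - z for x, y, z in zip(a, v_r, c)])

# Two distinct heads that differ only along r_p = [0, 1] both translate exactly onto the tail.
print(transh_score([0.0, 5.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]) == 0)   # True
print(transh_score([0.0, -3.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]) == 0)  # True
```

With identity matrices, `stranse_score` reduces to the TransE score, which matches the view of STransE as a generalization of TransE and TransR.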

Bilinear- & Tensor-based models:

DISTMULT [Yang et al., 2015] is based on the Bilinear model [Nickel et al., 2011, Jenatton et al., 2012], where each relation is represented by a diagonal matrix rather than a full matrix. SimplE [Kazemi and Poole, 2018] extends DISTMULT to allow the two embeddings of each entity to be learned dependently. Such quadratic forms are also used to model entities and relations in KG2E [He et al., 2015], TATEC [García-Durán et al., 2016], TransG [Xiao et al., 2016], RSTE [Tay et al., 2017], ANALOGY [Liu et al., 2017] and Dihedral [Xu and Li, 2019]. SME (bilinear) [Bordes et al., 2012] first separately combines the entity-relation pairs $(h,r)$ and $(r,t)$ and then semantically matches these combinations using a tensor product. HolE [Nickel et al., 2016b] uses circular correlation, a compositional operator that can be interpreted as a compression of the tensor product. In addition, TuckER [Balazevic et al., 2019] is a linear model based on the Tucker decomposition of the binary tensor representation of KG triples.
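A minimal sketch contrasting DISTMULT with HolE, written from their usual definitions: HolE here follows the form $\mathsf{sigmoid}(\boldsymbol{v}_{r}^{\top}(\boldsymbol{v}_{h}\star\boldsymbol{v}_{t}))$ of Nickel et al. (2016b), with $[\boldsymbol{a}\star\boldsymbol{b}]_j=\sum_{i}a_i\,b_{(i+j)\bmod d}$ for $d$-dimensional vectors. Because a diagonal bilinear form is symmetric in $h$ and $t$, DISTMULT gives $(h,r,t)$ and $(t,r,h)$ the same score, whereas circular correlation is not commutative, so HolE can score them differently. The toy vectors are hypothetical.

```python
import math

def distmult_score(v_h, w_r, v_t):
    """v_h^T diag(w_r) v_t = sum_i h_i * r_i * t_i (symmetric in h and t)."""
    return sum(a * b * c for a, b, c in zip(v_h, w_r, v_t))

def circular_correlation(a, b):
    """(a ⋆ b)_k = sum_i a_i * b_{(i + k) mod d}, the compositional operator of HolE."""
    d = len(a)
    return [sum(a[i] * b[(i + k) % d] for i in range(d)) for k in range(d)]

def hole_score(v_h, v_r, v_t):
    """sigmoid(v_r^T (v_h ⋆ v_t))."""
    x = sum(r * c for r, c in zip(v_r, circular_correlation(v_h, v_t)))
    return 1.0 / (1.0 + math.exp(-x))

h, r, t = [1.0, 2.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 3.0]
print(distmult_score(h, r, t) == distmult_score(t, r, h))  # True
print(hole_score(h, r, t) == hole_score(t, r, h))          # False
```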

Neural network-based models:

The neural tensor network (NTN) model [Socher et al., 2013] also uses a bilinear tensor operator to represent each relation, while ProjE [Shi and Weninger, 2017] can be viewed as a simplified version of NTN. The ER-MLP model [Dong et al., 2014] represents each triple by a vector obtained by concatenating the head, relation and tail embeddings, then feeds this vector into a single-layer MLP with a one-node output layer. ConvE [Dettmers et al., 2018] and ConvKB [Nguyen et al., 2018] are based on convolutional neural networks. ConvE applies a convolution layer directly over the 2D reshaping of the head-entity and relation embeddings, while ConvKB applies a convolution layer over the embedding triples (here each triple $(h,r,t)$ is represented as a 3-column matrix where each column vector represents a triple element). HypER [Balažević et al., 2019] simplifies ConvE by using a hypernetwork to produce 1D convolutional filters for each relation, which then extract relation-specific features from head entity embeddings. Conv-TransE [Shang et al., 2019] extends ConvE to keep the translational characteristic between entities and relations. InteractE [Vashishth et al., 2020] uses a circular convolution operator and a checkered reshaping function instead of the standard convolution operator and the 2D stack reshaping function in ConvE. The CapsE model [Nguyen et al., 2019b] extends ConvKB by stacking a capsule network layer [Sabour et al., 2017] on top of the convolution layer.
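A minimal sketch of the ConvKB score from Table 1, assuming $1\times 3$ filters slid over the rows of the $k\times 3$ matrix $[\boldsymbol{v}_{h},\boldsymbol{v}_{r},\boldsymbol{v}_{t}]$ and omitting the convolution bias for brevity. With the single filter $[1,1,-1]$, each feature is $\mathsf{ReLU}(v_{h,i}+v_{r,i}-v_{t,i})$, which hints at how ConvKB can capture TransE-like translational patterns. Names and values are hypothetical.

```python
def relu(x):
    return x if x > 0 else 0.0

def convkb_score(v_h, v_r, v_t, filters, w):
    """w^T concat(ReLU([v_h, v_r, v_t] * Omega)), without a convolution bias.

    The k x 3 input matrix has columns v_h, v_r and v_t; each filter in `filters`
    is a length-3 vector slid over the k rows, producing a k-dimensional feature map.
    """
    rows = list(zip(v_h, v_r, v_t))  # row i is (v_h[i], v_r[i], v_t[i])
    features = []
    for omega in filters:
        features.extend(relu(sum(o * x for o, x in zip(omega, row))) for row in rows)
    if len(features) != len(w):
        raise ValueError("w must have length k * len(filters)")
    return sum(a * b for a, b in zip(w, features))

# With the filter [1, 1, -1], each feature equals ReLU(v_h[i] + v_r[i] - v_t[i]).
print(convkb_score([1.0, 2.0], [0.5, 0.5], [1.0, 1.0], [[1.0, 1.0, -1.0]], [1.0, 1.0]))  # 2.0
```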

Complex vector-based models:

Instead of embedding entities and relations in the real-valued vector space, ComplEx [Trouillon et al., 2016] extends DISTMULT to the complex vector space. ComplEx-N3 [Lacroix et al., 2018] extends ComplEx with a weighted nuclear 3-norm regularizer. Also in the complex vector space, RotatE [Sun et al., 2019] defines each relation as a rotation from the head entity to the tail entity. QuatE [Zhang et al., 2019] represents entities by quaternion embeddings (i.e. hypercomplex-valued embeddings) and models relations as rotations in the quaternion space by employing the Hamilton product and the quaternion inner product.
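A minimal sketch of ComplEx and RotatE using Python's built-in complex numbers, with the diagonal $C_r$ stored as a vector. As an assumption drawn from the rotation view, the RotatE relation is parameterized by phases so that every $|c_{r,i}|=1$, and the $L_1$ variant (sum of element moduli) is used. The check shows that, unlike the real-valued DISTMULT, ComplEx can score $(h,r,t)$ and $(t,r,h)$ differently.

```python
import cmath

def complex_score(c_h, c_r, c_t):
    """Re(sum_i c_h[i] * c_r[i] * conj(c_t[i])): ComplEx with a diagonal C_r."""
    return sum(h * r * t.conjugate() for h, r, t in zip(c_h, c_r, c_t)).real

def rotate_score(c_h, phases, c_t):
    """-sum_i |c_h[i] * c_r[i] - c_t[i]| with c_r[i] = exp(1j * phases[i])."""
    c_r = [cmath.exp(1j * p) for p in phases]
    return -sum(abs(h * r - t) for h, r, t in zip(c_h, c_r, c_t))

print(complex_score([1 + 0j], [1j], [1j]))   # 1.0
print(complex_score([1j], [1j], [1 + 0j]))   # -1.0
# A quarter-turn rotation maps 1 onto 1j, so the score is (numerically) zero.
print(abs(rotate_score([1 + 0j], [cmath.pi / 2], [1j])) < 1e-12)  # True
```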

3.2 Relation Path-based Embedding Models

All embedding models mentioned above in Section 3.1 only take triples into account. Thus, these models ignore potentially useful information implicit in the structure of the KG. For example, the relation path $h\xrightarrow{\mathsf{born\_in\_city}}e\xrightarrow{\mathsf{city\_in\_country}}t$ suggests a “$\mathsf{nationality}$” relationship between the entities $h$ and $t$. Neighborhood information of entities can also be useful for predicting the relationship between two entities. For example, in the KG NELL [Carlson et al., 2010], we have information such as: if a person works for an organization and also leads it, then this person is likely the CEO of that organization.

Recent research has also shown that relation paths between entities in KGs provide richer context information and improve the performance of embedding models for KG completion [Luo et al., 2015, Liang and Forbus, 2015, García-Durán et al., 2015, Guu et al., 2015, Toutanova et al., 2016, Durán and Niepert, 2018, Takahashi et al., 2018, Chen et al., 2018]. In particular, ?) constructed relation paths between entities and, viewing the entities and relations in each path as pseudo-words, applied Word2Vec [Mikolov et al., 2013] to produce pre-trained vectors for these pseudo-words. ?) showed that using these pre-trained vectors for initialization helps to improve the performance of TransE [Bordes et al., 2013], SME [Bordes et al., 2012] and SE [Bordes et al., 2011]. ?) used the plausibility score produced by SME to compute the weights of relation paths.

PTransE-RNN [Lin et al., 2015a] models relation paths by using a recurrent neural network (RNN). ?)’s model and ROPs [Yin et al., 2018] also apply an RNN to model the path between an entity pair; however, in contrast to PTransE-RNN, they additionally take the intermediate entities present in the path into account. IRN [Shen et al., 2017] uses a shared memory and an RNN-based controller to implicitly model multi-step structured relationships. rTransE [García-Durán et al., 2015], PTransE-ADD [Lin et al., 2015a] and TransE-comp [Guu et al., 2015] extend TransE to represent a relation path by a vector which is the sum of the vectors of all relations in the path. In Bilinear-comp [Guu et al., 2015] and pruned-paths [Toutanova et al., 2016], each relation is represented by a matrix, so a relation path is represented by the product of the relation matrices along the path. ?) proposed the KBLRN framework to combine relational paths with latent and numerical features.
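A minimal sketch of the two composition rules in the Path rows of Table 1: TransE-comp sums the relation vectors along a path, while Bilinear-comp multiplies the relation matrices. The toy two-step path (for instance $\mathsf{born\_in\_city}$ followed by $\mathsf{city\_in\_country}$) and all values are hypothetical; the composed representation is then scored exactly like a single relation.

```python
def transe_comp_score(v_h, rel_vectors, v_t):
    """-||v_h + v_r1 + ... + v_rm - v_t||_1: a path is the sum of its relation vectors."""
    path = [sum(xs) for xs in zip(*rel_vectors)]
    return -sum(abs(a + p - c) for a, p, c in zip(v_h, path, v_t))

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def bilinear_comp_score(v_h, rel_matrices, v_t):
    """v_h^T W_r1 W_r2 ... W_rm v_t: a path is the product of its relation matrices."""
    m = rel_matrices[0]
    for w in rel_matrices[1:]:
        m = matmul(m, w)
    m_vt = [sum(x * y for x, y in zip(row, v_t)) for row in m]
    return sum(a * b for a, b in zip(v_h, m_vt))

# Hypothetical 2-step path with relation vectors / matrices for each step.
print(transe_comp_score([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0]) == 0)  # True
swap = [[0.0, 1.0], [1.0, 0.0]]
ident = [[1.0, 0.0], [0.0, 1.0]]
print(bilinear_comp_score([1.0, 0.0], [swap, ident], [0.0, 1.0]))  # 1.0
```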

The neighborhood mixture model TransE-NMM [Nguyen et al., 2016b] can also be viewed as a three-relation path model because it takes into account the neighborhood entity and relation information of both head and tail entities in each triple. ReInceptionE [Xie et al., 2020] employs the Inception network [Szegedy et al., 2016] to increase the interactions between head and relation embeddings, obtaining better representations of the head-relation pairs, and then uses a relation-aware attention mechanism to enrich these pair representations with local neighborhood and global entity information. Neighborhood information is also exploited in R-GCN [Schlichtkrull et al., 2018], SACN [Shang et al., 2019] and KBGAT [Nathani et al., 2019], which generalize graph convolutional networks [Kipf and Welling, 2017] and graph attention networks [Veličković et al., 2018] to highly multi-relational data such as KGs. To compute the final representation of an entity, they use layer-wise propagation to accumulate linearly transformed embeddings of the entity’s neighbors through a normalized sum with different relational weights. For link prediction, R-GCN, SACN and KBGAT apply DISTMULT, Conv-TransE and ConvKB, respectively, to compute triple scores.
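A minimal sketch of one R-GCN-style propagation step, assuming the common form $\boldsymbol{h}_i^{\prime}=\mathsf{ReLU}\big(W_0\boldsymbol{h}_i+\sum_{r}\sum_{j\in\mathcal{N}_i^r}\frac{1}{|\mathcal{N}_i^r|}W_r\boldsymbol{h}_j\big)$: a relation-specific linear transform of each neighbor, a per-relation normalized sum, and a self-connection. Inverse edges, basis decomposition and the decoders used by R-GCN, SACN and KBGAT are omitted; all names are hypothetical.

```python
from collections import defaultdict

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rgcn_layer(h, edges, W_rel, W_self):
    """One propagation step over a multi-relational graph.

    h: dict entity -> vector; edges: (src, rel, dst) triples, messages flow src -> dst;
    W_rel: dict rel -> matrix W_r; W_self: matrix W_0 for the self-connection.
    """
    incoming = defaultdict(lambda: defaultdict(list))  # dst -> rel -> [src, ...]
    for src, rel, dst in edges:
        incoming[dst][rel].append(src)
    out = {}
    for i, h_i in h.items():
        acc = matvec(W_self, h_i)
        for rel, srcs in incoming.get(i, {}).items():
            for j in srcs:
                msg = matvec(W_rel[rel], h[j])
                # normalize by the number of neighbors of i under this relation
                acc = [a + m / len(srcs) for a, m in zip(acc, msg)]
        out[i] = [max(0.0, a) for a in acc]  # ReLU
    return out

h = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
edges = [("a", "r", "c"), ("b", "r", "c")]
identity = [[1.0, 0.0], [0.0, 1.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]
print(rgcn_layer(h, edges, {"r": identity}, zero)["c"])  # [0.5, 0.5]
```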

3.3 Other KG Completion Models

The Path Ranking Algorithm (PRA) [Lao and Cohen, 2010] is a random-walk inference technique that was proposed to predict a new relationship between two entities in KGs. ?) used PRA to estimate the probability of an unseen triple as a combination of weighted random walks that follow different paths linking the head entity and the tail entity in the KG. ?) made use of an external text corpus to increase the connectivity of the KG used as the input to PRA. ?) improved PRA by proposing a subgraph feature extraction technique that makes the generation of random walks in KGs more efficient and expressive, while ?) extended PRA to couple the path ranking of multiple relations. PRA can also be used in conjunction with first-order logic in the discriminative Gaifman model [Niepert, 2016]. In addition, ?) used an RNN to learn vector representations of PRA-style relation paths between entities in the KG. Other random-walk-based learning algorithms for KG completion can also be found in ?), ?), ?), ?) and ?).
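The core PRA feature can be sketched as follows: for a relation path $P = r_1 \ldots r_l$, the feature value is the probability that a random walk starting at $h$ and following $P$ (choosing uniformly among the outgoing edges of the required relation at each step) ends at $t$, and a triple is scored by a logistic combination of such path features. In this minimal sketch the path weights are set by hand for illustration, whereas PRA learns them for each target relation; the path set, function names and toy triples are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]
Path = Tuple[str, ...]


def build_graph(triples: Sequence[Triple]) -> Dict[str, Dict[str, List[str]]]:
    """Adjacency lists: graph[h][r] = all tails t with (h, r, t) in the KG."""
    graph: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for h, r, t in triples:
        graph[h][r].append(t)
    return graph


def path_distribution(graph, source: str, path: Path) -> Dict[str, float]:
    """End-node distribution of a random walk from `source` that follows `path`.
    Walks reaching a node without a matching outgoing edge lose their mass."""
    dist: Dict[str, float] = {source: 1.0}
    for rel in path:
        nxt: Dict[str, float] = defaultdict(float)
        for node, prob in dist.items():
            successors = graph.get(node, {}).get(rel, [])
            for succ in successors:  # uniform choice among outgoing `rel` edges
                nxt[succ] += prob / len(successors)
        dist = dict(nxt)
    return dist


def pra_score(graph, head: str, tail: str,
              weighted_paths: Sequence[Tuple[Path, float]]) -> float:
    """Logistic combination of path features; the weights are assumed given."""
    z = sum(w * path_distribution(graph, head, p).get(tail, 0.0)
            for p, w in weighted_paths)
    return 1.0 / (1.0 + math.exp(-z))


kg = build_graph([
    ("alice", "born_in", "paris"),
    ("alice", "lives_in", "paris"),
    ("alice", "lives_in", "london"),
    ("paris", "city_of", "france"),
    ("london", "city_of", "uk"),
])
features = path_distribution(kg, "alice", ("lives_in", "city_of"))
p = pra_score(kg, "alice", "france",
              [(("born_in", "city_of"), 2.0), (("lives_in", "city_of"), 1.0)])
```

Here the walk along lives_in then city_of splits its mass evenly between france and uk, so that path contributes weaker evidence for (alice, ?, france) than the deterministic born_in then city_of path.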

?) proposed a Neural Logic Programming (LP) framework to learn probabilistic first-order logical rules for KG reasoning, producing competitive link prediction performance. ?) presented an approach that generates sentences from triples via hand-crafted templates and then uses the likelihoods that the pre-trained BERT [Devlin et al., 2019] assigns to these generated sentences to score the plausibility of the corresponding triples. Other methods for learning from KGs and multi-relational data can be found in ?) and ?).

4 Evaluation Task

The standard evaluation task of entity prediction, i.e. the link prediction task [Bordes et al., 2013], was proposed to evaluate embedding models for KG completion. (Footnote 3: Another evaluation task for KG completion is triple classification [Socher et al., 2013]; however, it is not as widely used as the link prediction task. See the Appendix for a summary of state-of-the-art triple classification results.)

Datasets:

Information about benchmark datasets for KG completion evaluation is given in Table 2. FB15k and WN18 are derived from the large real-world KG Freebase [Bollacker et al., 2008] and the large lexical KG WordNet [Miller, 1995], respectively. ?) noted that FB15k and WN18 are not challenging datasets because they contain many reversible triples. ?) showed a concrete example: a test triple ( $\mathsf{feline,hyponym,cat}$ ) can be mapped to a training triple ( $\mathsf{cat,hypernym,feline}$ ); thus, knowing that “ $\mathsf{hyponym}$ ” and “ $\mathsf{hypernym}$ ” are reversible allows us to easily predict the majority of test triples. Therefore, the FB15k-237 [Toutanova and Chen, 2015] and WN18RR [Dettmers et al., 2018] datasets were created to serve as more realistic KG completion datasets that represent a more challenging learning setting. FB15k-237 and WN18RR are subsets of FB15k and WN18, respectively.
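The reversible-triple leakage described above is straightforward to check for. The sketch below is a hypothetical diagnostic, not the procedure used to construct FB15k-237 or WN18RR: it counts the test triples $(h,r,t)$ whose entity pair appears reversed, as $(t,r',h)$, in the training set, and tallies which $(r, r')$ relation pairs are responsible, so that pairs such as (hyponym, hypernym) surface as likely inverses. The toy triples are made up.

```python
from collections import Counter, defaultdict
from typing import Sequence, Tuple

Triple = Tuple[str, str, str]


def reverse_leakage(train: Sequence[Triple], test: Sequence[Triple]):
    """Return (fraction of test triples with a reversed training counterpart,
    Counter of (test relation, training relation) pairs explaining the matches)."""
    # Index training relations by the reversed entity pair: (t, h) -> {r}
    rels_by_reversed_pair = defaultdict(set)
    for h, r, t in train:
        rels_by_reversed_pair[(t, h)].add(r)

    leaked = 0
    inverse_pairs = Counter()
    for h, r, t in test:
        train_rels = rels_by_reversed_pair.get((h, t))
        if train_rels:
            leaked += 1
            for r_train in train_rels:
                inverse_pairs[(r, r_train)] += 1
    return leaked / len(test), inverse_pairs


train = [("cat", "hypernym", "feline"), ("dog", "hypernym", "canine"),
         ("lion", "hypernym", "feline")]
test = [("feline", "hyponym", "cat"), ("canine", "hyponym", "dog"),
        ("feline", "hyponym", "tiger")]
frac, pairs = reverse_leakage(train, test)
```

In this toy data two of the three test triples can be answered by simply reversing a training triple, which is the kind of shortcut the filtered datasets are meant to remove.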

4.1 Task Description

The entity prediction task, i.e. link prediction [Bordes et al., 2013], predicts the head or the tail entity given the relation type and the other entity, i.e. predicting $h$ given $(?,r,t)$ or predicting $t$ given $(h,r,?)$ where $?$ denotes the missing element. The results are evaluated using a ranking induced by the function $f(h,r,t)$ on test triples.

Each correct test triple $(h,r,t)$ is corrupted by replacing either its head or its tail entity with each of the possible entities in turn, and these candidate triples are then ranked in descending order of their plausibility scores. The “Filtered” setting protocol, described in ?), removes any corrupted triple that appears in the KG before ranking. Because ranking a corrupted triple that appears in the KG (i.e. a correct triple) higher than the original test triple is not an error, the “Filtered” setting gives a clearer view of ranking performance.
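The raw and “Filtered” protocols differ only in whether known triples are skipped when counting competitors. A minimal sketch, assuming a generic score function in which higher means more plausible and an in-memory set of all known triples (training, validation and test); ties are counted in the model's favour here, which is only one of several tie-breaking conventions in use, and all names and scores are hypothetical.

```python
from typing import Callable, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def rank_of_target(
    score: Callable[[str, str, str], float],
    test_triple: Triple,
    entities: Iterable[str],
    known_triples: Set[Triple],
    corrupt: str = "tail",
) -> int:
    """Rank of the correct entity among all corruptions of `test_triple`.

    Pass known_triples=set() for the raw setting, or the set of all triples
    in the KG for the "Filtered" setting.
    """
    h, r, t = test_triple
    target = t if corrupt == "tail" else h
    target_score = score(h, r, t)
    rank = 1
    for e in entities:
        if e == target:
            continue
        candidate = (h, r, e) if corrupt == "tail" else (e, r, t)
        if candidate in known_triples:  # Filtered: other correct triples are not errors
            continue
        if score(*candidate) > target_score:  # a strictly higher score pushes the target down
            rank += 1
    return rank


toy_scores = {("a", "r", "a"): 0.1, ("a", "r", "b"): 0.5,
              ("a", "r", "c"): 0.9, ("a", "r", "d"): 0.7}


def score_fn(h: str, r: str, t: str) -> float:
    return toy_scores.get((h, r, t), 0.0)


entities = ["a", "b", "c", "d"]
kg = {("a", "r", "b"), ("a", "r", "c")}
raw = rank_of_target(score_fn, ("a", "r", "b"), entities, set())  # 3
filtered = rank_of_target(score_fn, ("a", "r", "b"), entities, kg)  # 2
```

The correct triple (a, r, c) outranks the test triple (a, r, b) in the raw setting, but the “Filtered” setting does not penalize the model for it.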

In addition to the mean rank (MR) and Hits@10 (i.e. the proportion of test triples for which the target entity is ranked in the top 10 predictions), which were originally used in the entity prediction task [Bordes et al., 2013], recent work also reports the mean reciprocal rank (MRR). (Footnote 4: See ?) for definitions of the mean rank, Hits@10 and MRR.) Some recent work additionally reports Hits@1 (i.e. the proportion of test triples for which the target entity is ranked first). However, as their formulas show, MRR and Hits@1 are strongly correlated, so Hits@1 might not reveal any additional insight. The mean rank is always greater than or equal to 1, and a lower mean rank indicates better entity prediction performance, while MRR and Hits@10 scores range from 0.0 to 1.0, and a higher score reflects a better prediction result.
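Given the rank of the correct entity for every test query, the metrics reduce to simple averages: MR is the mean rank, MRR the mean of the reciprocal ranks, and Hits@k the fraction of ranks no greater than k. A minimal sketch; the function name and the toy ranks are hypothetical.

```python
from typing import Dict, Sequence


def ranking_metrics(ranks: Sequence[int], ks: Sequence[int] = (1, 10)) -> Dict[str, float]:
    """Mean rank (MR), mean reciprocal rank (MRR) and Hits@k from 1-based ranks."""
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,                    # >= 1, lower is better
        "MRR": sum(1.0 / r for r in ranks) / n,  # in (0, 1], higher is better
    }
    for k in ks:
        metrics[f"Hits@{k}"] = sum(1 for r in ranks if r <= k) / n  # in [0, 1]
    return metrics


m = ranking_metrics([1, 2, 5, 20])
```

For the toy ranks [1, 2, 5, 20], MR is 7.0, Hits@1 is 0.25, Hits@10 is 0.75 and MRR is (1 + 1/2 + 1/5 + 1/20)/4 = 0.4375.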

4.2 Main Results

Tables 3 and 4 list recent entity prediction results of KG completion models on FB15k and WN18 and on FB15k-237 and WN18RR, respectively. In Table 3, the first 28 rows report the performance of triple-based models that directly optimize a score function for the triples in a KG, i.e. they do not exploit information about alternative paths between head and tail entities. The next 9 rows report results of models that exploit information about relation paths or neighborhood information. The last 2 rows present results for models which make use of textual mentions derived from a large external corpus. In Table 4, the last 5 rows report results of models that exploit the path or neighborhood information.

In general, Tables 3 and 4 show that the models using external corpus information or employing path information achieve better scores than the triple-based models that do not use such information. Among models not exploiting path or external information, the complex vector-based models (e.g. QuatE, ComplEx-N3 and RotatE) produce the strongest evaluation scores, followed by the neural network-based models (e.g. CapsE, InteractE and HypER). (Footnote 5: CapsE uses pre-trained word embeddings for entity vector initialization on WN18RR. It is not surprising that CapsE produces the best MR on WN18RR, as many entity names in WordNet are lexically meaningful. All other embedding models could utilize pre-trained word vectors as well. However, averaging pre-trained word embeddings to initialize entity vectors remains an open problem, and it is not always useful since entity names in many domain-specific KGs are not lexically meaningful [Wang et al., 2014, Guu et al., 2015].) Tables 3 and 4 also show that TransE and DISTMULT, despite their simplicity, can produce very competitive results when their hyper-parameters are tuned with a careful grid search.
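For reference, the score functions of the two simple baselines and two of the complex vector-based models mentioned above can be written compactly. The sketch follows their standard published forms but, as an assumption of this sketch, negates the distance-based ones (TransE, RotatE) so that a higher score always means a more plausible triple; it uses plain Python lists and built-in complex numbers, and omits QuatE's quaternion algebra.

```python
import cmath
from typing import Sequence

Real = Sequence[float]
Cplx = Sequence[complex]


def transe(h: Real, r: Real, t: Real, norm: str = "l1") -> float:
    """Negated TransE distance: -||h + r - t||, using L1 or squared L2."""
    diff = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    if norm == "l1":
        return -sum(abs(d) for d in diff)
    return -sum(d * d for d in diff)  # squared L2


def distmult(h: Real, r: Real, t: Real) -> float:
    """DISTMULT trilinear product: sum_i h_i * r_i * t_i."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def complex_score(h: Cplx, r: Cplx, t: Cplx) -> float:
    """ComplEx: real part of sum_i h_i * r_i * conj(t_i)."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real


def rotate(h: Cplx, phases: Real, t: Cplx) -> float:
    """Negated RotatE distance: -||h o r - t||_1 with unit-modulus r_i = exp(i * phase_i)."""
    r = [cmath.exp(1j * p) for p in phases]
    return -sum(abs(hi * ri - ti) for hi, ri, ti in zip(h, r, t))
```

Because DISTMULT's product is symmetric in $h$ and $t$, it cannot distinguish $(h,r,t)$ from $(t,r,h)$; the complex conjugate in ComplEx and the rotation in RotatE break this symmetry.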

5 Discussion and Conclusion

The reasons why much work has been devoted to developing triple-based models are: (1) additional information sources might not be available, e.g. for KGs in specialized domains, (2) models that do not exploit path information or external resources are simpler and thus typically much faster to train than the more complex models using path or external information, and (3) the more complex models that exploit path or external information are typically extensions of these simpler models, and are often initialized with parameters estimated by such simpler models, so improvements to the simpler models should yield corresponding improvements to the more complex models as well [Nguyen et al., 2016a].

It is worth further exploring KG completion embedding models for new applications whose data can be formulated as triples. For example, in Web search engines, we observe user-oriented relationships between submitted queries and the documents returned by the search engine. That is, we have triples (query, user, document) in which, for each user-oriented relationship, there are many queries and documents, resulting in many Many-to-Many relationships. Inspired by this observation, ?) applied STransE [Nguyen et al., 2016a] to search personalization, re-ranking the documents returned by a search engine for users’ submitted queries. Other application examples can also be found in recommender systems [Zhang et al., 2016, He et al., 2017, Cao et al., 2019], social relation extraction [Tu et al., 2017] and visual relation detection [Zhang et al., 2017].
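A minimal sketch of the re-ranking idea, assuming user-specific STransE parameters (two projection matrices $W_{u,1}$, $W_{u,2}$ and a translation vector $v_u$) and query and document vectors have already been learned: candidate documents are ordered by the STransE distance $\lVert W_{u,1} v_q + v_u - W_{u,2} v_d \rVert_{\ell_1}$, smallest first. The parameter names and toy values are hypothetical, and this sketch uses the distance alone rather than combining it with any other ranking signal.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
UserParams = Tuple[Matrix, Vector, Matrix]  # (W_u1, v_u, W_u2), assumed pre-trained


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product m @ v."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]


def stranse_distance(v_q: Vector, user: UserParams, v_d: Vector) -> float:
    """L1 STransE distance ||W_u1 v_q + v_u - W_u2 v_d||_1 for (query, user, document)."""
    w1, v_u, w2 = user
    projected_q = matvec(w1, v_q)
    projected_d = matvec(w2, v_d)
    return sum(abs(q + u - d) for q, u, d in zip(projected_q, v_u, projected_d))


def rerank(v_q: Vector, user: UserParams, docs: Sequence[Tuple[str, Vector]]) -> List[str]:
    """Return document ids ordered from most to least plausible for this user."""
    ranked = sorted(docs, key=lambda doc: stranse_distance(v_q, user, doc[1]))
    return [doc_id for doc_id, _ in ranked]


identity = [[1.0, 0.0], [0.0, 1.0]]
user_u = (identity, [1.0, 0.0], identity)
order = rerank([0.0, 0.0], user_u,
               [("d1", [1.0, 0.0]), ("d2", [0.0, 1.0]), ("d3", [1.0, 1.0])])
```

Since each user acts as a relation with its own projections, the same query can yield different document orderings for different users.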

Future research directions might also include: (i) combining logical rules, which contain rich background information, and KG triples in a unified KG completion framework, e.g. jointly embedding KGs and logical rules [Guo et al., 2016, Yang et al., 2017]; (ii) exploring open-world KG completion models that connect unseen entities to existing KGs [Shi and Weninger, 2018], since recent embedding models for KG completion make a closed-world assumption in which the KGs are fixed (i.e. new entities cannot easily be added); and (iii) investigating efficient approaches that can be applied to large-scale KGs with millions of entities and relations [Zhang et al., 2020].

In this paper, we have presented a comprehensive survey of embedding models of entities and relationships for knowledge graph completion. This paper also provides up-to-date experimental results of the embedding models for the entity prediction (i.e. link prediction) task on the benchmark datasets FB15k, WN18, FB15k-237 and WN18RR. We hope that this paper serves its purpose by providing a concrete foundation for future research and applications on the topic.

Appendix

Triple Classification—Task Description

The triple classification task was first introduced by ?), and since then it has been used to evaluate various embedding models. The aim of this task is to predict whether a triple $(h,r,t)$ is correct or not. For classification, a relation-specific threshold $\theta_{r}$ is set for each relation type $r$. If the plausibility score of an unseen test triple $(h,r,t)$ is higher than $\theta_{r}$, the triple is classified as correct; otherwise, it is classified as incorrect. Following ?), the relation-specific thresholds are determined by maximizing the micro-averaged accuracy, which is a per-triple average, on the validation set.
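The threshold selection can be sketched as below, assuming each validation triple comes with a model score (higher meaning more plausible) and a gold label. Because every triple belongs to exactly one relation, the micro-averaged accuracy is maximized by maximizing the number of correct decisions separately for each relation, so each $\theta_r$ can be chosen independently. Taking candidate thresholds from the observed validation scores (plus one value below all of them) is one simple search strategy among several; the function names and toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def fit_thresholds(validation: Sequence[Tuple[str, float, bool]]) -> Dict[str, float]:
    """Pick theta_r per relation maximising the number of correctly classified
    validation triples (and hence the micro-averaged accuracy overall)."""
    by_relation = defaultdict(list)
    for rel, score, label in validation:
        by_relation[rel].append((score, label))

    thresholds: Dict[str, float] = {}
    for rel, items in by_relation.items():
        scores = sorted({s for s, _ in items})
        # A theta below the minimum score predicts every triple correct; each
        # observed score as theta predicts only strictly higher scores correct.
        candidates = [scores[0] - 1.0] + scores
        best_theta, best_correct = candidates[0], -1
        for theta in candidates:
            correct = sum(1 for s, y in items if (s > theta) == y)
            if correct > best_correct:
                best_theta, best_correct = theta, correct
        thresholds[rel] = best_theta
    return thresholds


def classify(thresholds: Dict[str, float], rel: str, score: float) -> bool:
    """A triple is predicted correct iff its score exceeds its relation's threshold."""
    return score > thresholds[rel]


validation = [("r", 0.9, True), ("r", 0.8, True), ("r", 0.3, False),
              ("r", 0.5, False), ("s", 0.2, True), ("s", 0.1, False)]
th = fit_thresholds(validation)
```

A test triple whose relation never occurs in the validation set has no threshold here (the lookup raises `KeyError`); how to handle that case is left open in this sketch.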

Triple Classification—Datasets

Information about benchmark datasets for the triple classification task is given in Table 5. FB13 and WN11 [Socher et al., 2013] are derived from the large real-world KG Freebase [Bollacker et al., 2008] and the large lexical KG WordNet [Miller, 1995], respectively. Note that when creating the FB13 and WN11 datasets, ?) already filtered out triples from the test set if either or both of their head and tail entities also appear in the training set in a different relation type or order.

Triple Classification—Main Results

Table 6 presents the triple classification results of KG completion models on the WN11 and FB13 datasets. The first 9 rows report the performance of models that use TransE/DISTMULT to initialize the entity and relation vectors. The last 12 rows present the accuracy of models with randomly initialized parameters. Note that NTN, Bilinear-comp and TransE-comp obtain higher triple classification results when their entity vectors are initialized by averaging the pre-trained GloVe word vectors [Pennington et al., 2014]. This is not surprising because many entity names in WordNet and Freebase are lexically meaningful. However, this is not the case for many domain-specific KGs.
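The word-vector initialization mentioned above can be sketched as follows, assuming a dictionary that maps lowercase words to pre-trained vectors (loading actual GloVe files is not shown) and entity names whose words are separated by underscores or spaces. Falling back to a small random vector when no word of the name is covered is an assumption of this sketch, and the toy vectors and the opaque identifier are made up.

```python
import random
from typing import Dict, List

Vector = List[float]


def init_entity_vector(name: str, word_vectors: Dict[str, Vector], dim: int,
                       rng: random.Random) -> Vector:
    """Average the pre-trained vectors of the words in an entity name."""
    words = name.lower().replace("_", " ").split()
    known = [word_vectors[w] for w in words if w in word_vectors]
    if not known:
        # No lexical information (e.g. an opaque identifier): random initialisation.
        return [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


toy_vectors = {"united": [1.0, 0.0], "states": [0.0, 1.0]}
rng = random.Random(0)
v_us = init_entity_vector("United_States", toy_vectors, 2, rng)  # [0.5, 0.5]
v_id = init_entity_vector("m.0abc", toy_vectors, 2, rng)         # random fallback
```

The fallback branch illustrates why this initialization does not help for KGs whose entity names carry no lexical meaning.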

Bibliography

[Agirre et al., 2013] Eneko Agirre, Oier López de Lacalle, and Aitor Soroa. 2013. Random Walks for Knowledge-Based Word Sense Disambiguation. Computational Linguistics, 40(1):57–84.
[Baeza-Yates and Ribeiro-Neto, 2011] Ricardo A. Baeza-Yates and Berthier A. Ribeiro-Neto. 2011. Modern Information Retrieval - the concepts and technology behind search, Second edition. Pearson Education Ltd., Harlow, England.
[Balažević et al., 2019] Ivana Balažević, Carl Allen, and Timothy M. Hospedales. 2019. Hypernetwork knowledge graph embeddings. In ICANN, pages 553–565.
[Balazevic et al., 2019] Ivana Balazevic, Carl Allen, and Timothy Hospedales. 2019. TuckER: Tensor factorization for knowledge graph completion. In EMNLP-IJCNLP, pages 5185–5194.
[Berant et al., 2013] Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic Parsing on Freebase from Question-Answer Pairs. In EMNLP, pages 1533–1544.
[Bollacker et al., 2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge. In SIGMOD, pages 1247–1250.
[Bordes et al., 2011] Antoine Bordes, Jason Weston, Ronan Collobert, and Yoshua Bengio. 2011. Learning Structured Embeddings of Knowledge Bases. In AAAI, pages 301–306.
[Bordes et al., 2012] Antoine Bordes, Xavier Glorot, Jason Weston, and Yoshua Bengio. 2012. A Semantic Matching Energy Function for Learning with Multi-relational Data. Machine Learning, 94(2):233–259.