Relation Embedding with Dihedral Group in Knowledge Graph
Canran Xu, Ruijiang Li

TL;DR
This paper introduces DihEdral, a novel knowledge graph embedding model based on dihedral groups, which effectively captures relation compositions and properties, improving link prediction performance over existing bilinear models.
Contribution
The paper proposes DihEdral, a dihedral group-based embedding model that models relation compositions and properties, enhancing interpretability and predictive accuracy in knowledge graphs.
Findings
DihEdral captures symmetry, inversion, and composition properties.
Outperforms existing bilinear models in link prediction tasks.
Comparable or superior to deep learning models like ConvE.
| Component | Symmetric | Skew-Symmetric | Composition | ||
|---|---|---|---|---|---|
| Abelian | Non-Abelian | ||||
| DistMult | ✓ | ✓ | NA† | ||
| ComplEx | ✓ | NA† | |||
| ANALOGY | ✓ | NA† | |||
| SimplE | NA† | ||||
| DihEdral | both in | either in | |||

Table 2: Statistics of the datasets.
| Dataset | Entities | Relations | Train | Valid | Test |
|---|---|---|---|---|---|
| WN18 | 41k | 18 | 141k | 5k | 5k |
| WN18RR | 41k | 11 | 87k | 3k | 3k |
| FB15K | 15k | 1.3k | 483k | 50k | 59k |
| FB15K-237 | 15k | 237 | 273k | 18k | 20k |
| YAGO3-10 | 123k | 37 | 1M | 5k | 5k |

Table 3: Link prediction results on WN18 and FB15K.
| Model | WN18 HITS@1 | WN18 HITS@3 | WN18 HITS@10 | WN18 MRR | FB15K HITS@1 | FB15K HITS@3 | FB15K HITS@10 | FB15K MRR |
|---|---|---|---|---|---|---|---|---|
| TransE† Bordes et al. (2013) | 8.9 | 82.3 | 93.4 | 45.4 | 23.1 | 47.2 | 64.1 | 22.1 |
| DistMult† Yang et al. (2015) | 72.8 | 91.4 | 93.6 | 82.2 | 54.6 | 73.3 | 82.4 | 65.4 |
| ComplEx† Trouillon et al. (2016) | 93.6 | 94.5 | 94.7 | 94.1 | 59.9 | 75.9 | 84.0 | 69.2 |
| HolE Nickel et al. (2016) | 93.0 | 94.5 | 94.7 | 93.8 | 40.2 | 61.3 | 73.9 | 52.4 |
| ANALOGY Liu et al. (2017) | 93.9 | 94.4 | 94.7 | 94.2 | 64.6 | 78.5 | 85.4 | 72.5 |
| Single DistMult Kadlec et al. (2017) | - | - | 94.6 | 79.7 | - | - | 89.3 | 79.8 |
| SimplE Kazemi and Poole (2018) | 93.9 | 94.4 | 94.7 | 94.2 | 66.0 | 77.3 | 83.8 | 72.7 |
| R-GCN Schlichtkrull et al. (2018) | 69.7 | 92.9 | 96.4 | 81.9 | 60.1 | 76.0 | 84.2 | 69.6 |
| ConvE Dettmers et al. (2018) | 93.5 | 94.6 | 95.6 | 94.3 | 55.8 | 72.3 | 83.1 | 65.7 |
| D4-STE | 94.2 | 94.8 | 95.2 | 94.6 | 64.1 | 80.3 | 87.7 | 73.3 |
| D4-Gumbel | 94.2 | 94.9 | 95.4 | 94.6 | 64.8 | 78.2 | 86.4 | 72.8 |

Table 4: Link prediction results on WN18RR, FB15K-237 and YAGO3-10.
| Model | WN18RR HITS@1 | WN18RR HITS@3 | WN18RR HITS@10 | WN18RR MRR | FB15K-237 HITS@1 | FB15K-237 HITS@3 | FB15K-237 HITS@10 | FB15K-237 MRR | YAGO3-10 HITS@1 | YAGO3-10 HITS@3 | YAGO3-10 HITS@10 | YAGO3-10 MRR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DistMult† | 39.0 | 44.0 | 49.0 | 43.0 | 15.5 | 26.3 | 41.9 | 24.1 | 24.0 | 38.0 | 54.0 | 34.0 |
| ComplEx† | 41.0 | 46.0 | 51.0 | 44.0 | 15.8 | 27.5 | 42.8 | 24.7 | 26.0 | 40.0 | 55.0 | 36.0 |
| R-GCN | - | - | - | - | 15.1 | 26.4 | 41.7 | 24.8 | - | - | - | - |
| ConvE† | 40.0 | 44.0 | 52.0 | 43.0 | 23.7 | 35.6 | 50.1 | 32.5 | 35.0 | 49.0 | 62.0 | 44.0 |
| MINERVA∗ | 41.3 | 45.6 | 51.3 | 44.8 | 21.7 | 32.9 | 45.6 | 29.3 | - | - | - | - |
| D4-STE | 45.2 | 49.1 | 53.6 | 48.0 | 23.0 | 35.3 | 50.2 | 32.0 | 38.1 | 52.3 | 64.3 | 47.2 |
| D4-Gumbel | 44.2 | 50.5 | 55.7 | 48.6 | 20.4 | 33.2 | 49.6 | 30.0 | 29.4 | 43.6 | 57.3 | 38.8 |
Canran Xu, eBay Inc., [email protected]
Ruijiang Li, eBay Inc., [email protected]
Equal contribution.
Abstract
Link prediction is critical for applying incomplete knowledge graphs (KGs) in downstream tasks. As a family of effective approaches for link prediction, embedding methods try to learn low-rank representations for both entities and relations such that the bilinear form defined therein is a well-behaved scoring function. Despite their successful performance, existing bilinear forms overlook the modeling of relation compositions, resulting in a lack of interpretability for reasoning on KGs. To fill this gap, we propose a new model called DihEdral, named after the dihedral symmetry group. This new model learns knowledge graph embeddings that capture relation compositions by nature. Furthermore, our approach models the relation embeddings parametrized by discrete values, thereby decreasing the solution space drastically. Our experiments show that DihEdral is able to capture all desired properties such as (skew-) symmetry, inversion and (non-) Abelian composition, outperforms existing bilinear-form-based approaches, and is comparable to or better than deep learning models such as ConvE Dettmers et al. (2018).
1 Introduction
Large-scale knowledge graphs (KGs) play a critical role in downstream tasks such as semantic search Berant et al. (2013), dialogue management He et al. (2017) and question answering Bordes et al. (2014). In most cases, despite its large scale, a KG is incomplete because it is difficult to enumerate all facts in the real world. Predicting the missing links from the existing data has been one of the most important research topics for years. A common representation of a KG is a set of triples (head, relation, tail), and the problem of link prediction can be viewed as predicting new triples from the existing set. A popular approach is KG embedding, which maps both entities and relations in the KG to a vector space such that the scoring function of entities and relations distinguishes ground-truth facts from false ones Socher et al. (2013); Bordes et al. (2013); Yang et al. (2015). Another family of approaches explicitly models the reasoning process on the KG by synthesizing information from paths Guu et al. (2015). More recently, researchers have applied deep learning methods to KG embeddings so that non-linear interactions between entities and relations are enabled Schlichtkrull et al. (2018); Dettmers et al. (2018).
The standard task for link prediction is to answer queries (h, r, ?) or (?, r, t). In this context, recent works on KG embedding that focus on bilinear form methods Trouillon et al. (2016); Nickel et al. (2016); Liu et al. (2017); Kazemi and Poole (2018) are known to perform reasonably well. The success of this family of models resides in the fact that they are able to model relation (skew-) symmetries. Furthermore, when serving downstream tasks such as learning first-order logic rules and reasoning over the KG, the learned relation representation is expected to discover relation composition by itself. One key property of relation composition is that in many cases it is non-commutative. For example, exchanging the order of parent_of and spouse_of results in a completely different relation (parent_of as opposed to parent_in_law_of). We argue that, in order to learn relation composition within the link prediction task, this non-commutative property should be explicitly modeled.
In this paper, we propose DihEdral to model the relations in a KG with the representation of a dihedral group. The elements of a dihedral group are constructed by rotation and reflection operations over a 2D symmetric polygon. As the matrix representations of the dihedral group can be symmetric or skew-symmetric, and the multiplication of the group elements can be Abelian or non-Abelian, it is a good candidate to model relations with all the corresponding desired properties.
To the best of our knowledge, this is the first attempt to employ a finite non-Abelian group in KG embedding to account for relation compositions. Another merit of using the dihedral group is that even when the parameters are quantized or binarized, the performance in link prediction tasks can be improved over state-of-the-art bilinear form methods, due to the implicit regularization imposed by quantization.
The rest of the paper is organized as follows: in §2 we present the mathematical framework of bilinear form modeling for the link prediction task, followed by an introduction to group theory and the dihedral group. In §3 we formalize a novel model, DihEdral, to represent relations with full expressiveness. In §4 and §5 we develop two efficient ways to parametrize DihEdral and show that both approaches outperform existing bilinear form methods. In §6 we carry out extensive case studies to demonstrate the enhanced interpretability of the relation embedding space by showing that the desired properties of (skew-) symmetry, inversion and relation composition are coherent with the relation embeddings learned from DihEdral.
2 Preliminaries
2.1 Bilinear Form for KB Link Prediction
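Let $\mathcal{E}$ and $\mathcal{R}$ be the sets of entities and relations. A triple is written $(h, r, t)$, where $h, t \in \mathcal{E}$ are the head and tail entities, and $r \in \mathcal{R}$ is a relation corresponding to an edge in the KG.
In a bilinear form, the entities $h$, $t$ are represented by vectors $\mathbf{h}, \mathbf{t} \in \mathbb{R}^{M}$, where $M$ is the embedding dimension, and the relation $r$ is represented by a matrix $\mathbf{R} \in \mathbb{R}^{M \times M}$. The score for the triple is defined as $\phi(h, r, t) = \mathbf{h}^\top \mathbf{R}\,\mathbf{t}$. Good representations of the entities and relations are learned such that the scores are high for positive triples and low for negative triples.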
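As a minimal sketch of the bilinear score described above (head vector, relation matrix, tail vector), the snippet below evaluates it in plain Python. The function name `bilinear_score` and the toy values are illustrative assumptions and do not come from the paper.

```python
def bilinear_score(h, R, t):
    """Return h^T R t for vectors h, t (lists) and a square matrix R (list of rows)."""
    # First compute the matrix-vector product R t, then the inner product with h.
    Rt = [sum(r_ij * t_j for r_ij, t_j in zip(row, t)) for row in R]
    return sum(h_i * v for h_i, v in zip(h, Rt))


# Toy example with made-up numbers.
h = [1.0, 2.0]
t = [3.0, 4.0]
R = [[0.0, -1.0],
     [1.0, 0.0]]  # a 90-degree rotation matrix
score = bilinear_score(h, R, t)
```

With an identity relation matrix the score reduces to the ordinary dot product of the two entity vectors.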
2.2 Group and Dihedral Group
Let $g_i, g_j$ be two elements in a set $\mathcal{G}$, and $\odot$ be a binary operation between any two elements in $\mathcal{G}$. The set $\mathcal{G}$ forms a group when the following axioms are satisfied:
Closure
For any two elements $g_i, g_j \in \mathcal{G}$, $g_k = g_i \odot g_j$ is also an element in $\mathcal{G}$.
Associativity
For any $g_i, g_j, g_k \in \mathcal{G}$, $(g_i \odot g_j) \odot g_k = g_i \odot (g_j \odot g_k)$.
Identity
There exists an identity element $e$ in $\mathcal{G}$ such that, for every element $g$ in $\mathcal{G}$, the equation $e \odot g = g \odot e = g$ holds.
Inverse
For each element $g$, there is an inverse element $g^{-1}$ such that $g \odot g^{-1} = g^{-1} \odot g = e$.
If the number of group elements is finite, the group is called a finite group. If the group operation is commutative, i.e. $g_i \odot g_j = g_j \odot g_i$ for all $g_i$ and $g_j$, the group is called Abelian; otherwise the group is non-Abelian.
Moreover, if the group elements can be represented by matrices, with the group operation defined as matrix multiplication, the identity element is represented by the identity matrix and the inverse element is represented by the matrix inverse. In the following, we will not distinguish between a group element and its corresponding matrix representation when no confusion exists.
A dihedral group is a finite group that supports the symmetric operations of a regular polygon in two-dimensional space. Here the symmetric operations refer to the operators that preserve the polygon. For a $K$-sided ($K \in \mathbb{Z}^{+}$) polygon, the corresponding dihedral group is denoted as $\mathbb{D}_K$ and consists of $2K$ elements, of which $K$ are rotation operators and $K$ are reflection operators. A rotation operator $O_K^{(m)}$ rotates the polygon anti-clockwise around the center by an angle of $2\pi m / K$, and a reflection operator $F_K^{(m)}$ mirrors the rotation $O_K^{(m)}$ vertically.
The elements of the dihedral group $\mathbb{D}_K$ can be represented as 2D orthogonal matrices. (There is more than one 2D representation of the dihedral group $\mathbb{D}_K$; we use the orthogonal representation throughout the paper. See Steinberg 2012 for details.)
$$O_K^{(m)} = \begin{bmatrix} \cos\frac{2\pi m}{K} & -\sin\frac{2\pi m}{K} \\ \sin\frac{2\pi m}{K} & \cos\frac{2\pi m}{K} \end{bmatrix}, \qquad F_K^{(m)} = \begin{bmatrix} \cos\frac{2\pi m}{K} & \sin\frac{2\pi m}{K} \\ \sin\frac{2\pi m}{K} & -\cos\frac{2\pi m}{K} \end{bmatrix} \qquad (1)$$
where $m \in \{0, 1, \dots, K-1\}$. Correspondingly, the group operation of the dihedral group can be represented as multiplication of the representation matrices. Note that when $K$ is evenly divisible by $4$, the rotation matrices $O_K^{(K/4)}$ and $O_K^{(3K/4)}$ are skew-symmetric, and all the reflection matrices $F_K^{(m)}$ and the rotation matrices $O_K^{(0)}$, $O_K^{(K/2)}$ are symmetric. An illustration of the representation is shown in Figure 1.
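The properties claimed for the orthogonal representation can be checked numerically. The sketch below builds the $2K$ matrices using the standard 2D rotation and reflection forms (an assumption about the exact convention used in the paper), then verifies orthogonality, closure, and which rotations are skew-symmetric for $K = 4$. All function names are illustrative.

```python
import math


def rotation(m, K):
    """2x2 matrix rotating the plane anti-clockwise by 2*pi*m/K."""
    a = 2.0 * math.pi * m / K
    return ((math.cos(a), -math.sin(a)),
            (math.sin(a), math.cos(a)))


def reflection(m, K):
    """2x2 reflection matrix paired with the rotation by 2*pi*m/K."""
    a = 2.0 * math.pi * m / K
    return ((math.cos(a), math.sin(a)),
            (math.sin(a), -math.cos(a)))


def matmul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))


def transpose(A):
    return tuple(zip(*A))


def negate(A):
    return tuple(tuple(-x for x in row) for row in A)


def close(A, B, tol=1e-9):
    """Approximate equality, needed because cos/sin are floating point."""
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))


def dihedral_group(K):
    """All 2K elements: K rotations followed by K reflections."""
    return [rotation(m, K) for m in range(K)] + [reflection(m, K) for m in range(K)]


IDENTITY = ((1.0, 0.0), (0.0, 1.0))
D4 = dihedral_group(4)

# Every element is orthogonal: A A^T = I.
all_orthogonal = all(close(matmul(A, transpose(A)), IDENTITY) for A in D4)
# The set is closed under matrix multiplication.
is_closed = all(any(close(matmul(A, B), C) for C in D4) for A in D4 for B in D4)
# Indices m of the rotations that are skew-symmetric (A^T = -A) when K = 4.
skew = [m for m in range(4) if close(transpose(rotation(m, 4)), negate(rotation(m, 4)))]
```

For $K = 4$ the skew-symmetric rotations found this way are those with $m = 1$ and $m = 3$, i.e. $m = K/4$ and $m = 3K/4$.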
3 Relation Modeling with Dihedral Group and Expressiveness
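We propose to model the relations by the group elements in $\mathbb{D}_K$. Like ComplEx Trouillon et al. (2016), we assume an even number of latent dimensions $2L$. More specifically, the relation matrix takes a block-diagonal form $\mathbf{R} = \mathrm{diag}[\mathbf{R}^{(1)}, \mathbf{R}^{(2)}, \dots, \mathbf{R}^{(L)}]$ where $\mathbf{R}^{(l)} \in \mathbb{D}_K$ for $l \in \{1, 2, \dots, L\}$. The corresponding embedding vectors $\mathbf{h} \in \mathbb{R}^{2L}$ and $\mathbf{t} \in \mathbb{R}^{2L}$ take the form $[\mathbf{h}^{(1)}, \dots, \mathbf{h}^{(L)}]$ and $[\mathbf{t}^{(1)}, \dots, \mathbf{t}^{(L)}]$ respectively, where $\mathbf{h}^{(l)}, \mathbf{t}^{(l)} \in \mathbb{R}^{2}$. As a result, the score for a triple $(h, r, t)$ in bilinear form can be written as a sum over these $L$ components, $\mathbf{h}^\top \mathbf{R}\,\mathbf{t} = \sum_{l=1}^{L} \mathbf{h}^{(l)\top} \mathbf{R}^{(l)} \mathbf{t}^{(l)}$. We name the model DihEdral because each component $\mathbf{R}^{(l)}$ is a representation matrix of a dihedral group element.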
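Because the relation matrix is block diagonal, the score can be computed block by block without ever materializing the full matrix. The following sketch illustrates this decomposition; the helper names and the toy numbers are assumptions made for illustration.

```python
def component_score(h2, R2, t2):
    """Score of one 2D component: h2^T R2 t2 with a 2x2 block R2."""
    Rt = (R2[0][0] * t2[0] + R2[0][1] * t2[1],
          R2[1][0] * t2[0] + R2[1][1] * t2[1])
    return h2[0] * Rt[0] + h2[1] * Rt[1]


def dihedral_score(h, blocks, t):
    """Sum of component scores; h and t have length 2 * len(blocks)."""
    return sum(component_score(h[2 * l:2 * l + 2], R2, t[2 * l:2 * l + 2])
               for l, R2 in enumerate(blocks))


# Two components: the identity and a 90-degree rotation (both lie in D_4).
blocks = [((1.0, 0.0), (0.0, 1.0)),
          ((0.0, -1.0), (1.0, 0.0))]
h = [1.0, 2.0, 3.0, 4.0]
t = [5.0, 6.0, 7.0, 8.0]
total = dihedral_score(h, blocks, t)
```

Here the first block contributes 17 and the second contributes 4, so the total score is 21.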
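Lemma 1.
The relation matrix $\mathbf{R}$ of DihEdral is orthogonal, i.e. $\mathbf{R}\mathbf{R}^\top = \mathbf{R}^\top\mathbf{R} = \mathbf{I}$.
Lemma 2.
The score of $(h, r, t)$ satisfies $\mathbf{h}^\top \mathbf{R}\,\mathbf{t} = -\frac{1}{2}\left(\|\mathbf{R}^\top \mathbf{h} - \mathbf{t}\|^2 - \mathbf{h}^\top\mathbf{h} - \mathbf{t}^\top\mathbf{t}\right)$; consequently, maximizing the score w.r.t. $\mathbf{R}$ is equivalent to minimizing $\|\mathbf{R}^\top \mathbf{h} - \mathbf{t}\|^2$.
Theorem 1.
The relation matrices in DihEdral form a group under matrix multiplication.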
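Lemma 2 follows from the orthogonality stated in Lemma 1 by expanding the squared distance. The short derivation below is a sketch written for this purpose, with $\mathbf{h}$, $\mathbf{t}$ the entity vectors and $\mathbf{R}$ the relation matrix:

```latex
\begin{align*}
\|\mathbf{R}^\top \mathbf{h} - \mathbf{t}\|^2
  &= \mathbf{h}^\top \mathbf{R}\mathbf{R}^\top \mathbf{h}
     - 2\,\mathbf{h}^\top \mathbf{R}\,\mathbf{t}
     + \mathbf{t}^\top \mathbf{t} \\
  &= \mathbf{h}^\top \mathbf{h} + \mathbf{t}^\top \mathbf{t}
     - 2\,\mathbf{h}^\top \mathbf{R}\,\mathbf{t}
     \qquad (\text{since } \mathbf{R}\mathbf{R}^\top = \mathbf{I}).
\end{align*}
```

Since $\mathbf{h}^\top\mathbf{h}$ and $\mathbf{t}^\top\mathbf{t}$ do not depend on $\mathbf{R}$, a larger score corresponds exactly to a smaller distance between $\mathbf{R}^\top\mathbf{h}$ and $\mathbf{t}$.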
Though its relation embedding takes discrete values, DihEdral is fully expressive as it is able to model relations with the desired properties for each component by the corresponding matrices in $\mathbb{D}_K$. The properties are summarized in Table 1, with comparison to DistMult Yang et al. (2015), ComplEx Trouillon et al. (2016), ANALOGY Liu et al. (2017) and SimplE Kazemi and Poole (2018). (Note that the conditions listed in the table are sufficient but not necessary for the desired property.) The details of expressiveness are described as follows. For notational convenience, we denote by $\mathcal{T}^{+}$ all the possible true triples, and by $\mathcal{T}^{-}$ all the possible false triples.
Symmetric
A relation $r$ is symmetric iff $(h, r, t) \in \mathcal{T}^{+} \Leftrightarrow (t, r, h) \in \mathcal{T}^{+}$. Symmetric relations in the real world include synonym and similar_to.
Note that with DihEdral, the component $\mathbf{R}^{(l)}$ can be a reflection matrix, which is symmetric and off-diagonal. This is in contrast to DistMult and ComplEx, where the relation matrix has to be diagonal when it is symmetric.
Skew-Symmetric
A relation $r$ is skew-symmetric iff $(h, r, t) \in \mathcal{T}^{+} \Leftrightarrow (t, r, h) \in \mathcal{T}^{-}$. Skew-symmetric relations in the real world include father_of and member_of.
When $K$ is a multiple of $4$, pure skew-symmetric matrices in $\mathbb{D}_K$ can be chosen. As a result, the relation is guaranteed to be skew-symmetric, satisfying $\mathbf{h}^\top \mathbf{R}\,\mathbf{t} = -\mathbf{t}^\top \mathbf{R}\,\mathbf{h}$.
Inversion
$r_2$ is the inverse of $r_1$ iff $(h, r_1, t) \in \mathcal{T}^{+} \Leftrightarrow (t, r_2, h) \in \mathcal{T}^{+}$. As a real-world example, parent_of is the inverse of child_of.
The inverse of the relation $r$ is represented by $\mathbf{R}^{-1}$ in an ideal situation: for two positive triples $(h, r_1, t)$ and $(t, r_2, h)$, we have $\mathbf{R}_1^\top \mathbf{h} = \mathbf{t}$ and $\mathbf{R}_2^\top \mathbf{t} = \mathbf{h}$ in an ideal situation (cf. Lemma 2). With enough occurrences of such pairs $\{h, t\}$, we have $\mathbf{R}_2 = \mathbf{R}_1^{-1} = \mathbf{R}_1^\top$.
Composition
$r_3$ is the composition of $r_1$ and $r_2$, denoted as $r_3 = r_1 \odot r_2$, iff $(h, r_1, t) \in \mathcal{T}^{+} \wedge (t, r_2, t') \in \mathcal{T}^{+} \Leftrightarrow (h, r_3, t') \in \mathcal{T}^{+}$. An example of composition in the real world is nationality = born_in_city $\odot$ city_belong_to_nation. Depending on the commutative property, there are two cases of relation composition:
- Abelian: $r_1$ and $r_2$ are Abelian if $r_1 \odot r_2 = r_2 \odot r_1$. A real-world example is opposite_gender $\odot$ profession $=$ profession $\odot$ opposite_gender.
- Non-Abelian: $r_1$ and $r_2$ are non-Abelian if $r_1 \odot r_2 \neq r_2 \odot r_1$. A real-world example is parent_of $\odot$ spouse_of $\neq$ spouse_of $\odot$ parent_of.
In DihEdral, the relation composition operator $\odot$ corresponds to the matrix multiplication of the corresponding representations, i.e. $\mathbf{R}_3 = \mathbf{R}_1 \mathbf{R}_2$. Consider three positive triples $(h, r_1, t)$, $(t, r_2, t')$ and $(h, r_3, t')$. In the ideal situation, we have $\mathbf{R}_1^\top \mathbf{h} = \mathbf{t}$, $\mathbf{R}_2^\top \mathbf{t} = \mathbf{t}'$, $\mathbf{R}_3^\top \mathbf{h} = \mathbf{t}'$ (cf. Lemma 2), and further $\mathbf{R}_2^\top \mathbf{R}_1^\top \mathbf{h} = \mathbf{R}_3^\top \mathbf{h}$. With enough occurrences of such pairs in the training dataset, we have $\mathbf{R}_3 = \mathbf{R}_1 \mathbf{R}_2$.
Note that although all the rotation matrices form a subgroup of the dihedral group, and are hence algebraically closed, the rotation subgroup cannot model non-Abelian relations. To model non-Abelian relation compositions, at least one reflection matrix must be involved.
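The role of reflections in non-Abelian composition can be illustrated with $K = 4$. The sketch below, with illustrative helper names and the standard rotation and reflection matrices assumed, checks that two rotations commute while a rotation and a reflection do not, and that following two relations in sequence agrees with applying the product matrix once.

```python
import math


def rotation(m, K):
    a = 2.0 * math.pi * m / K
    return ((math.cos(a), -math.sin(a)), (math.sin(a), math.cos(a)))


def reflection(m, K):
    a = 2.0 * math.pi * m / K
    return ((math.cos(a), math.sin(a)), (math.sin(a), -math.cos(a)))


def matmul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))


def close(A, B, tol=1e-9):
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))


def apply_transposed(R, v):
    """Return R^T v, the ideal tail vector for head vector v under relation R."""
    return (R[0][0] * v[0] + R[1][0] * v[1],
            R[0][1] * v[0] + R[1][1] * v[1])


K = 4
O1, O2 = rotation(1, K), rotation(2, K)
F0 = reflection(0, K)

# Two rotations always commute (the rotation subgroup is Abelian).
rotations_commute = close(matmul(O1, O2), matmul(O2, O1))
# A rotation and a reflection generally do not commute.
mixed_pair_commutes = close(matmul(O1, F0), matmul(F0, O1))

# Following r1 then r2 equals applying the composed matrix R3 = R1 R2 once.
h = (1.0, 2.0)
two_hops = apply_transposed(F0, apply_transposed(O1, h))
one_hop = apply_transposed(matmul(O1, F0), h)
```

In this toy case `two_hops` and `one_hop` coincide, matching the ideal-situation argument that the composed relation is represented by the matrix product.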
4 Training
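In the standard training framework for KG embedding models, the parameters $\Theta = \Theta_{\mathcal{E}} \cup \Theta_{\mathcal{R}}$, i.e. the union of entity and relation embeddings, are learnt by stochastic optimization methods. For each minibatch of positive triples, a small number of negative triples are sampled by corrupting the head or tail of each positive triple; the related parameters in the model are then updated by minimizing the binary negative log-likelihood such that positive triples get higher scores than negative triples. Specifically, the loss function is written as follows,
$$\min_{\Theta} \sum_{(h, r, t) \in \mathcal{T}_b^{+} \cup \mathcal{T}_b^{-}} -\log \sigma\!\left(y\, \mathbf{h}^\top \mathbf{R}\,\mathbf{t}\right) + \lambda \|\Theta_{\mathcal{E}}\|^2 \qquad (2)$$
where $\lambda$ is the regularization coefficient for entity embeddings only, $\mathcal{T}_b^{+}$ and $\mathcal{T}_b^{-}$ are the sets of positive and sampled negative triples in a minibatch, and $y$ equals $1$ if $(h, r, t) \in \mathcal{T}_b^{+}$ and $-1$ otherwise. $\sigma$ is the sigmoid function defined as $\sigma(x) = 1 / (1 + \exp(-x))$.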
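The training step described above can be sketched in a few lines: corrupt the head or tail of a positive triple to obtain a negative, then evaluate the binary negative log-likelihood with a penalty on entity embeddings only. This is a minimal illustration under stated assumptions: the squared L2 penalty, the 50/50 choice between head and tail, and the names `corrupt` and `minibatch_loss` are not taken from the paper.

```python
import math
import random


def corrupt(triple, entities, rng):
    """Return a negative triple by replacing the head or the tail at random."""
    h, r, t = triple
    for _ in range(100):  # bounded retries; the bound is arbitrary
        e = rng.choice(entities)
        candidate = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if candidate != triple:
            return candidate
    raise ValueError("could not build a negative triple")


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def minibatch_loss(pos_scores, neg_scores, entity_vectors, lam):
    """Binary negative log-likelihood plus a penalty on entity vectors only."""
    # -log(sigmoid(y * s)) equals softplus(-y * s), with y = +1 for positives
    # and y = -1 for negatives.
    nll = sum(softplus(-s) for s in pos_scores) + sum(softplus(s) for s in neg_scores)
    reg = lam * sum(x * x for vec in entity_vectors for x in vec)
    return nll + reg


rng = random.Random(0)
positive = ("alice", "parent_of", "bob")
negative = corrupt(positive, ["alice", "bob", "carol"], rng)
loss = minibatch_loss([2.0], [-1.0], [[1.0, 2.0]], lam=0.1)
```

Relation parameters are deliberately left out of the penalty term, mirroring the statement that the regularization applies to entity embeddings only.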
Special treatment of the relation representations is required as they take discrete values. In the next subsections we describe a reparametrization method for general $K$, followed by a simple approach when $K$ takes small integer values. With these treatments, DihEdral can be trained within the standard framework.
4.1 Gumbel-Softmax Approach
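Each relation component $\mathbf{R}^{(l)}$ can be parametrized with a one-hot variable $\mathbf{c}^{(l)} \in \{0, 1\}^{2K}$ encoding the $2K$ choices of matrices in $\mathbb{D}_K$: $\mathbf{R}^{(l)} = \sum_{k=1}^{2K} c_k^{(l)} \mathbf{D}_k$, where $\mathbf{D}_k$ enumerates the elements of $\mathbb{D}_K$. The number of parameters for each relation is $2KL$ in this approach.
The one-hot variable $\mathbf{c}^{(l)}$ is further parametrized by $\mathbf{a}^{(l)} \in \mathbb{R}^{2K}$ through the Gumbel trick Jang et al. (2017) with the following steps: 1) take i.i.d. samples $q_1, \dots, q_{2K}$ from a Gumbel distribution: $q_k = -\log(-\log u_k)$, where $u_k \sim U(0, 1)$ are samples from a uniform distribution; 2) use the log-softmax form of $\mathbf{a}^{(l)}$ to parametrize $\mathbf{c}^{(l)}$: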
[TABLE]
where $\tau$ is the tunable temperature. During training, we start with a high temperature to drive the system out of poor local minima, and gradually cool the system as the number of elapsed epochs grows.
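For readers unfamiliar with the Gumbel trick, the sketch below shows the standard Gumbel-softmax relaxation of Jang et al. (2017) over the $2K$ choices: Gumbel noise is added to unnormalized log-probabilities and a temperature-scaled softmax is taken. It is an assumption that this matches the paper's exact parametrization; the temperatures used here are arbitrary illustrative values and no cooling schedule is implied.

```python
import math
import random


def sample_gumbel(rng):
    """One Gumbel(0, 1) sample via -log(-log(u)) with u uniform on (0, 1)."""
    u = max(rng.random(), 1e-12)  # guard against log(0)
    return -math.log(-math.log(u))


def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + gumbel_noise) / tau)."""
    noisy = [(a + sample_gumbel(rng)) / tau for a in logits]
    top = max(noisy)  # subtract the maximum for numerical stability
    exps = [math.exp(v - top) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
K = 4
logits = [0.0] * (2 * K)  # one logit per element of the dihedral group
soft = gumbel_softmax(logits, tau=5.0, rng=rng)   # high temperature: smooth weights
sharp = gumbel_softmax(logits, tau=0.1, rng=rng)  # low temperature: nearly one-hot
```

The resulting weights can be used to mix the $2K$ group matrices into a relaxed relation component; as the temperature decreases, the mixture concentrates on a single group element.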
4.2 Reparametrization with Binary Variables
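Another parametrization technique for $\mathbb{D}_K$ with small $K$ is to parametrize each element of the matrix $\mathbf{R}^{(l)}$ directly. Specifically we have
$$\mathbf{R}^{(l)} = \begin{bmatrix} \cos\theta & -\alpha \sin\theta \\ \sin\theta & \alpha \cos\theta \end{bmatrix}$$
where $\theta = 2\pi m / K$ with $m \in \{0, 1, \dots, K-1\}$, and $\alpha \in \{1, -1\}$ is the reflection indicator ($\alpha = -1$ selects a reflection). Both $\cos\theta$ and $\sin\theta$ can be parametrized by the same set of binary variables: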
[TABLE]
In the forward pass, each binary variable is obtained by applying an element-wise sign function to a real number: $x_{\mathrm{binary}} = \mathrm{sign}(x_{\mathrm{real}})$ where $x_{\mathrm{real}} \in \mathbb{R}$.
In the backward pass, since the gradient of the sign function is zero almost everywhere, $x_{\mathrm{real}}$ would never be updated; the gradient of the loss with respect to the real variable is therefore estimated with the straight-through estimator (STE) Yin et al. (2019). The functional form of the STE is not unique and is worth further theoretical study. In our experiments, we used the identity STE Bengio et al. (2013):
$$\frac{\partial \mathcal{L}}{\partial x_{\mathrm{real}}} = \frac{\partial \mathcal{L}}{\partial x_{\mathrm{binary}}} \cdot \mathbf{1}$$
where $\mathbf{1}$ stands for element-wise identity.
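A toy example may help make the identity straight-through estimator concrete: the forward pass binarizes real-valued parameters with a sign function, and the backward pass copies the gradient with respect to the binary values onto the real values unchanged. The quadratic loss, learning rate, and function names below are assumptions chosen only to keep the sketch self-contained.

```python
def sign(x):
    """Sign used in the forward pass (0 is mapped to +1 here by convention)."""
    return 1.0 if x >= 0.0 else -1.0


def forward(real_params):
    """Binarize each real-valued parameter."""
    return [sign(x) for x in real_params]


def backward(grad_wrt_binary):
    """Identity STE: pass the gradient through the sign function unchanged."""
    return list(grad_wrt_binary)


def train(real_params, target, lr=0.1, steps=5):
    """Minimize sum((binary - target)^2) by updating the real parameters."""
    real = list(real_params)
    for _ in range(steps):
        binary = forward(real)
        grad_binary = [2.0 * (b - y) for b, y in zip(binary, target)]
        grad_real = backward(grad_binary)
        real = [x - lr * g for x, g in zip(real, grad_real)]
    return real


learned = train([0.3, -0.2], target=[-1.0, 1.0])
binary = forward(learned)
```

Without the estimator the gradient through `sign` would be zero and the real parameters would never move; with it, both parameters cross zero and the binarized values reach the target.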
For these two approaches, we name the model DK-Gumbel for the Gumbel-Softmax approach and DK-STE for the reparametrization with binary variables.
5 Experimental Results
This section presents our experiments and results. We first introduce the benchmark datasets used in our experiments, and then evaluate our approach on the link prediction task.
5.1 Datasets
Introduced in Bordes et al. (2013), WN18 and FB15K are popular benchmarks for link prediction tasks. WN18 is a subset of the WordNet database that describes relations between words. In WN18 the most frequent types of relations form reversible pairs (e.g., hypernym to hyponym, part_of to has_part). FB15K is a subsample of Freebase limited to 15k entities. It contains triples with different characteristics (e.g., from one-to-one relations such as capital_of to many-to-many relations such as actor_in_film). YAGO3-10 Dettmers et al. (2018) is a subset of YAGO3 Suchanek et al. (2007) in which each entity has at least 10 relations.
As noted in Toutanova et al. (2015); Dettmers et al. (2018), in the original WN18 and FB15K datasets a large number of test triples appear as the reciprocal form of training samples, due to the reversible relation pairs. Therefore, these authors eliminated the inverse relations and constructed corresponding subsets: WN18RR with 11 relations and FB15K-237 with 237 relations, both of which are free from test data leakage. All dataset statistics are shown in Table 2.
5.2 Evaluation Metric
We use the popular metrics filtered HITS@1, 3, 10 and mean reciprocal rank (MRR) as our evaluation metrics as in Bordes et al. (2013).
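The metrics can be computed from the rank of the correct entity for each test query. In the filtered setting of Bordes et al. (2013), candidates that form other known true triples are removed before ranking. The sketch below is illustrative; the function names and toy scores are assumptions.

```python
def filtered_rank(scores, target, known_true):
    """Rank of `target` among candidates, ignoring other known-true candidates.

    scores: dict mapping candidate entity -> model score (higher is better).
    known_true: set of other candidates that also form true triples.
    """
    target_score = scores[target]
    better = sum(1 for cand, s in scores.items()
                 if cand != target and cand not in known_true and s > target_score)
    return 1 + better


def mean_reciprocal_rank(ranks):
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at(ranks, n):
    """Fraction of queries whose correct entity is ranked within the top n."""
    return sum(1 for r in ranks if r <= n) / len(ranks)


# Toy query: "a" outscores the target "c" but is itself a known true answer,
# so only "b" counts against the target.
rank = filtered_rank({"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1},
                     target="c", known_true={"a"})

ranks = [1, 2, 4, 10]
mrr = mean_reciprocal_rank(ranks)
h1, h3, h10 = hits_at(ranks, 1), hits_at(ranks, 3), hits_at(ranks, 10)
```

The result tables report these quantities as percentages.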
5.3 Model Selection and Hyper-parameters
We implemented DihEdral in PyTorch Paszke et al. (2017). In all our experiments, we selected the hyperparameters of our model by grid search for the best MRR on the validation set. We trained DK-Gumbel and DK-STE for several values of $K$ with the AdaGrad optimizer Duchi et al. (2011), and did not notice a significant difference in the evaluation metrics when varying $K$. In the following we only report the results for $K = 4$.
For D4-Gumbel, we performed grid search over the regularization coefficient and the learning rate. For D4-STE, the hyperparameter ranges for the grid search were as follows: regularization coefficient [0.001, 0.01, 0.1, 0.2], learning rate [0.01, 0.02, 0.03, 0.05, 0.1]. For both settings we performed grid search with batch sizes [512, 1024, 2048] and negative sample ratios [1, 6, 10]. The embedding dimension was set separately for FB15K, for FB15K-237 and YAGO3-10, and for WN18 and WN18RR. We used the standard train/valid/test splits provided with these datasets.
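The grid search over the D4-STE ranges listed above can be enumerated with `itertools.product`. In this sketch, reading the first list as the regularization coefficient is an assumption (its label is missing in the source), and `validation_mrr` is a hypothetical callable that would train a model for a configuration and return its validation MRR.

```python
import itertools

GRID = {
    "reg_coef": [0.001, 0.01, 0.1, 0.2],  # assumed to be the regularization coefficient
    "learning_rate": [0.01, 0.02, 0.03, 0.05, 0.1],
    "batch_size": [512, 1024, 2048],
    "neg_ratio": [1, 6, 10],
}


def iter_configs(grid):
    """Yield one dict per point of the Cartesian product of all value lists."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def select_best(grid, validation_mrr):
    """Return the configuration with the highest validation MRR."""
    return max(iter_configs(grid), key=validation_mrr)


configs = list(iter_configs(GRID))
```

With these ranges the grid contains 4 x 5 x 3 x 3 = 180 configurations per dataset.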
The results of link prediction are shown in Tables 3 and 4, where the results for the baselines are taken directly from the original literature. DihEdral outperforms almost all models in bilinear form, and even ConvE on FB15K, WN18RR and YAGO3-10. The results demonstrate that even though DihEdral takes discretized values in its relation representations, properly modeling the underlying structure of relations using $\mathbb{D}_K$ is essential.
6 Case Studies
The representation learned by DihEdral not only reaches state-of-the-art performance in link prediction tasks, but also provides insights through its special properties. In this section, we present detailed case studies on these properties. In order to achieve better resolution, we increased the embedding dimension for the WN18 dataset.
6.1 Inversion
We show the product of some pairs of inverse relations on WN18 and FB15K in Figure 2; the result is close to an identity matrix. For the relation pair {_member_of_domain_usage, _synset_domain_usage_of}, the product deviates from the ideal identity matrix because the performance on these two relations is poorer than on the others. We also repeated the same case study for other bilinear embedding methods; however, their products are not the identity, but are close to diagonal matrices with different elements.
6.2 Symmetry and Skew-Symmetry
Since the KB datasets do not contain negative triples explicitly, there is no penalty for modeling skew-symmetric relations with symmetric matrices. This is perhaps the reason why DistMult performs well on the FB15K dataset, in which many relations are skew-symmetric.
To resolve this ambiguity, for each positive triple $(h, r, t)$ with a definite skew-symmetric relation $r$, a negative triple $(t, r, h)$ is sampled with probability 0.5. After adding this new negative sampling scheme to D4-Gumbel, the symmetric and skew-symmetric relations can be distinguished on the WN18 dataset without reducing performance on link prediction tasks. Figure 3 shows that both symmetric and skew-symmetric relations favor the corresponding components in $\mathbb{D}_4$ as expected. Again, due to the imperfect performance on _synset_domain_topic_of, its corresponding representation is imperfect as well. We also conducted the same experiment without this sampling scheme: the histograms for the symmetric relations are similar, but there is no strong preference for skew-symmetric relations.
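The extra sampling scheme amounts to emitting the reversed triple as a negative, with probability 0.5, whenever the relation is known to be skew-symmetric. A minimal sketch follows; the function name, the set of skew-symmetric relations, and the toy triples are illustrative assumptions.

```python
import random


def sample_reverse_negatives(positives, skew_relations, rng, p=0.5):
    """For each positive (h, r, t) with a skew-symmetric r, emit (t, r, h)
    as a negative triple with probability p."""
    negatives = []
    for h, r, t in positives:
        if r in skew_relations and rng.random() < p:
            negatives.append((t, r, h))
    return negatives


positives = [("dog", "_hypernym", "animal"),
             ("big", "_similar_to", "large")]
skew_relations = {"_hypernym"}  # assumed list of definite skew-symmetric relations

rng = random.Random(0)
negatives = sample_reverse_negatives(positives, skew_relations, rng)
```

Only triples whose relation is in `skew_relations` can produce a reversed negative; symmetric relations such as `_similar_to` are never reversed.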
6.3 Relation Composition
In the FB15K-237 dataset the majority of patterns are relation compositions. However, these compositions are Abelian only, because all the inverse relations were filtered out on purpose. To verify whether non-Abelian relation compositions can be discovered by DihEdral in an ideal situation, we generated a synthetic dataset called FAMILY. Specifically, we first generated two generations of people with equal numbers of males and females in each generation, and randomly assigned spouse edges within each generation and child and parent edges between the two generations, after which the sibling, parent_in_law and child_in_law edges were connected based on commonsense logic.
We trained D4-Gumbel on FAMILY with latent dimension . In addition to the loss in Eq. 2, we add the following regularization term to encourage the score of positive triple to be higher than that of negative triple for each component independently.
[TABLE]
where , and the corresponding negative triple .
For each composition , we compute the histogram of . The result for relation compositions in FB15K-237 and FAMILY is shown in Figure 4, from which we could see good composition as matrix multiplication. We also reveal the non-Abelian property in FAMILY by exchanging the order of and .
7 Related Works
In this section we discuss the related works and their connections to our approach.
TransE Bordes et al. (2013) treats relations as translating operators between head and tail entities. More complicated distance functions Wang et al. (2014); Lin et al. (2015b, a) have also been proposed as extensions of TransE. TorusE Ebisu and Ichise (2018) proposed a novel distance function defined over a torus by transforming the vector space through an Abelian group onto an $n$-dimensional torus. ProjE Shi and Weninger (2017) designs a neural network with a combination layer and a projection layer. R-GCN Schlichtkrull et al. (2018) employs convolution over multiple entities to capture the spectrum of the knowledge graph. ConvE Dettmers et al. (2018) performs 2D convolution on the concatenation of entity and relation embeddings, and thus by nature introduces non-linearity to enhance expressiveness.
In RESCAL Nickel et al. (2011) each relation is represented by a full-rank matrix. As a downside, RESCAL has a huge number of parameters, making the model prone to overfitting. The totally symmetric DistMult model Yang et al. (2015) simplifies RESCAL by representing each relation with a diagonal matrix. To parametrize skew-symmetric relations, ComplEx Trouillon et al. (2016) extends DistMult by using complex-valued instead of real-valued vectors for entities and relations. The representation matrix of ComplEx supports both symmetric and skew-symmetric relations while being closed under matrix multiplication. HolE Nickel et al. (2016) models skew-symmetry with circular correlation between entity embeddings, thus ensuring shifts in covariance between embeddings at different dimensions. It was recently shown that HolE is isomorphic to ComplEx Hayashi and Shimbo (2017). ANALOGY Liu et al. (2017) and SimplE Kazemi and Poole (2018) both reformulate the tensor decomposition approach in light of analogical and reversible relations.
Though embedding-based approaches achieve state-of-the-art performance on the link prediction task, symbolic relation composition is not explicitly modeled. In contrast, the latter goal is currently pursued by directly modeling the reasoning paths Neelakantan et al. (2015); Xiong et al. (2017); Das et al. (2018); Lin et al. (2018); Guo et al. (2019). As paths are consistent with the logical structure of reasoning, non-Abelian composition is supported by nature.
DihEdral is more expressive than other bilinear-form-based embedding methods such as DistMult, ComplEx and ANALOGY. As the relation matrix is restricted to be orthogonal, DihEdral can bridge translation-based and bilinear-form-based approaches, since the training objective w.r.t. the relation matrix is similar (cf. Lemma 2). Besides, DihEdral is the first embedding method to incorporate non-Abelian relation compositions in terms of matrix multiplications (cf. Theorem 1).
8 Conclusion
This paper proposed DihEdral for KG relation embedding. By leveraging the desired properties of the dihedral group, relation (skew-) symmetry, inversion, and (non-) Abelian compositions are all supported. Our experimental results on benchmark KGs showed that DihEdral outperforms existing bilinear form models and even deep learning methods. Finally, we demonstrated through extensive case studies that the above properties can be learned by DihEdral, yielding a substantial increase in interpretability over existing models.
Acknowledgments
The authors would like to thank Vivian Tian, Hua Yang, Steven Li and Xiaoyuan Wu for their support, and the anonymous reviewers for their helpful comments.
References
- Bengio et al. (2013) Yoshua Bengio, Nicholas Léonard, and Aaron C. Courville. 2013. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432.
- Berant et al. (2013) Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on Freebase from question-answer pairs. In Proceedings of EMNLP.
- Bordes et al. (2014) Antoine Bordes, Sumit Chopra, and Jason Weston. 2014. Question answering with subgraph embeddings. In Proceedings of EMNLP.
- Bordes et al. (2013) Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Proceedings of NeurIPS.
- Das et al. (2018) Rajarshi Das, Shehzaad Dhuliawala, Manzil Zaheer, Luke Vilnis, Ishan Durugkar, Akshay Krishnamurthy, Alex Smola, and Andrew McCallum. 2018. Go for a walk and arrive at the answer: Reasoning over paths in knowledge bases using reinforcement learning. In Proceedings of ICLR.
- Dettmers et al. (2018) Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, and Sebastian Riedel. 2018. Convolutional 2D knowledge graph embeddings. In Proceedings of AAAI.
- Duchi et al. (2011) John Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res., 12:2121–2159.
- Ebisu and Ichise (2018) Takuma Ebisu and Ryutaro Ichise. 2018. TorusE: Knowledge graph embedding on a Lie group. In Proceedings of AAAI.
