Leveraging Semantics for Incremental Learning in Multi-Relational Embeddings

Angel Daruna; Weiyu Liu; Zsolt Kira; Sonia Chernova

arXiv:1905.12181·cs.LG·July 10, 2019


TL;DR

This paper introduces Incremental Semantic Initialization (ISI), a novel method for incremental learning in multi-relational embeddings that improves query performance and reduces training epochs by leveraging semantic similarities.

Contribution

The paper presents ISI, a new incremental learning approach that initializes semantic concepts based on related previously learned embeddings, enhancing scalability and efficiency.

Findings

1. ISI improves immediate query performance by 41.4%.
2. ISI reduces epochs to convergence by 78.2%.
3. Effective on AI2Thor and MatterPort3D datasets.

Abstract

Service robots benefit from encoding information in semantically meaningful ways to enable more robust task execution. Prior work has shown multi-relational embeddings can encode semantic knowledge graphs to promote generalizability and scalability, but only within a batched learning paradigm. We present Incremental Semantic Initialization (ISI), an incremental learning approach that enables novel semantic concepts to be initialized in the embedding in relation to previously learned embeddings of semantically similar concepts. We evaluate ISI on mined AI2Thor and MatterPort3D datasets; our experiments show that on average ISI improves immediate query performance by 41.4%. Additionally, ISI methods on average reduced the number of epochs required to approach model convergence by 78.2%.

Tables

Table 1: AI2Thor Knowledge Graph Statistics

3 Relation Types, 106 Entities
Median Count per Environment
Room Type	Loc. Rel.	Mat. Rel.	Aff. Rel.	Num. Ent.	Num. Rooms
Bathroom	28	21	46	18	30
Bedroom	28.5	16	54.5	20	30
Kitchen	59.5	51	109	27	30
Living room	22.5	8	37	20	30
All	29.5	18.5	50	20	120

Table 2: Epochs-to-Convergence

Init. Method	Avg.	Std. Dev.
Xavier	112.6	48.1
IU	37.9	16.2
ES	16.6	9.9
RS	27.1	17.5
ERS	29.9	23.3

Table 3: Ranked Affordance Generalizations for Entities in MatterPort3D

fan entity		bottle entity		stove entity
ISI	Xavier	ISI	Xavier	ISI	Xavier
1. pick up (v)	1. pick up (v)	1. put (v)	1. vase (n)	1. open (v)	1. shelf (n)
2. put (v)	2. stone (n)	2. pick up (v)	2. shelf (n)	2. pick up (v)	2. vase (n)
3. turn on (v)	3. glass (n)	3. fill (v)	3. pick up (v)	3. turn off (v)	3. turn off (v)
4. turn off (v)	4. empty (v)	4. slice (v)	4. break (v)	4. close (v)	4. painting (n)

Equations

\textbf{v}_{\hat{e}}^{n}=\frac{1}{|\mathcal{I}|}\sum_{e\in\mathcal{I}}\textbf{v}_{e}^{n-1}

\mathcal{I}_{\textrm{ES}}=\operatorname*{arg\_top\_k}_{e\in\mathcal{E}^{n-1}}\big(\boldsymbol{\pi}_{e}\cdot\boldsymbol{\pi}_{\hat{e}}\big)

\displaystyle\boldsymbol{\alpha}_{r,j}=\bigg{\{}

\mathcal{I}_{\textrm{RS}}=\operatorname*{arg\_top\_k}_{e\in\mathcal{E}^{n-1}}\Bigg(\sum_{r\in\mathcal{R}^{n-1}}\textbf{v}_{e}\cdot\bar{\boldsymbol{\alpha}}_{r}\Bigg)

\mathcal{I}_{\textrm{ERS}}=\operatorname*{arg\_top\_k}_{e\in\,\mathcal{I}_{\textrm{ES}}}\Bigg(\sum_{r\in\mathcal{R}^{n-1}}\textbf{v}_{e}\cdot\bar{\boldsymbol{\alpha}}_{r}\Bigg)

\textrm{MRR}=\frac{1}{N}\sum_{n=1}^{N}\frac{1}{R_{P}^{n}}

\textrm{MRR}^{*}=\frac{1}{N}\sum_{n=1}^{N}\frac{1}{|R_{G}^{n}-R_{P}^{n}|+1}

\textbf{v}^{1}_{e}=\begin{cases}\textbf{v}^{0}_{e} & \forall\,e\in\mathcal{E}^{0}\\ \textbf{ookb\_init}(e) & \forall\,e\in\xi^{0}\end{cases}

\textbf{W}_{r}^{1}=\textbf{W}_{r}^{0}\quad\forall\,r\in\mathcal{R}^{1}

\textbf{v}^{1}_{e,j}=\begin{cases}\textbf{v}^{0}_{e,j} & \text{if } e\in\mathcal{E}^{0}\\ U(v^{j}_{min},v^{j}_{max}) & \text{if } e\in\xi^{0}\end{cases}


Taxonomy

Topics: Advanced Graph Neural Networks · Domain Adaptation and Few-Shot Learning · Topic Modeling

Full text

Leveraging Semantics for Incremental Learning

in Multi-Relational Embeddings

Angel Daruna, Weiyu Liu, Zsolt Kira, Sonia Chernova

Institute for Robotics and Intelligent Machines

Georgia Institute of Technology, United States

{adaruna3,wliu88,zkira,chernova}@gatech.edu


Keywords: relational learning, incremental learning, semantic reasoning

1 Introduction

Robots operating in human environments benefit from knowledge representations that encode information in semantically meaningful ways to facilitate generalization and adaptability, leading to more robust task execution [1, 2, 3]. A commonly used explicit model of environment semantics defines a set of entities $\mathcal{E}$ representing known concepts (e.g. apple, metal, open) and a set of possible relations $\mathcal{R}$ (e.g. atLocation, hasAffordance) between them [3, 4, 5, 2]. Combined, $\mathcal{E}$ and $\mathcal{R}$ form a knowledge graph $\mathcal{G}$, in which vertices represent entities and edges represent relations.

Multiple techniques have been proposed for effectively performing inference over semantic knowledge bases. Most recently, Daruna et al. [6] showed that representing knowledge graphs using multi-relational embeddings significantly outperforms prior approaches, such as directed graphs [5], Bayesian Logic Networks [4], and Description Logics [7], with respect to scalability, robustness to uncertainty, and generalizability. Multi-relational embeddings represent knowledge graphs in vector space, encoding vertices that represent entities $\mathcal{E}$ as vectors and edges that represent relations $\mathcal{R}$ as mappings. However, Daruna et al.’s work assumes all entities and relations to be known before learning the representation. This assumption is impractical for large-scale and long-term deployments of autonomous systems because each incremental discovery of a new concept would require batch retraining of the encoding.

In this work, we introduce a novel incremental learning approach for semantic data within multi-relational embeddings. We consider the “Incremental Class Learning” scenario [8], which applies to systems in which knowledge is acquired incrementally over time. Our experiments model the service robot scenario in which a home robot incrementally gains knowledge of new concepts, such as discovering new affordances, detecting new materials, or finding new objects. Our objective is to integrate each new concept into the robot’s existing semantic knowledge representation as quickly and accurately as possible, while mitigating corruption to previous concepts.

The core contribution of our work, Incremental Semantic Initialization (ISI), enables embeddings for a novel concept (e.g. $apple$) to be initialized in relation to previously learned embeddings of semantically similar concepts (e.g. $banana$) and away from dissimilar concepts (e.g. $lamp$). We present three variants of our approach: Entity Similarity (ES), Relational Similarity (RS), and Hybrid Similarity (ERS), which inform the initialization of new concepts using entities, relations between entities, and both, respectively. Code is available at [URL withheld for blind review].

We validated our approach on knowledge graphs mined from AI2Thor and MatterPort3D (datasets available at [URL withheld for blind review]). Our results show that ISI significantly outperforms the state-of-the-art in incremental multi-relational embedding initialization [9] due to ISI’s ability to initialize novel concepts in a semantically meaningful way without retraining. Additionally, we show that all ISI methods reduce the number of epochs required to reach within 8 MRR* percentage points of a model trained with batch learning over all available data by 78.2% on average when compared to prior work. As a result, our approach provides a significant efficiency improvement for the deployment of multi-relational embeddings onto robot systems in incremental learning scenarios.

2 Related Work

Semantic Reasoning for robotics applications commonly uses an explicit model of world semantics in which a knowledge graph $\mathcal{G}$ is composed of individual positive example facts, or triples, $(h,r,t)$ such that $h,t\in\mathcal{E}$ are identified as head and tail entities of the triple, respectively, for which the relation $r\in\mathcal{R}$ holds (e.g. (cup, hasAffordance, fill)) [3, 4, 5, 2]. Multiple computational frameworks have been proposed that enable robots to reason about semantic knowledge [3, 4, 2]. Our work builds on the recent work on multi-relational embeddings presented in [6], which was shown to outperform prior methods with respect to scalability and generalizability on batch learning tasks.

Multi-Relational Embeddings model a knowledge graph $\mathcal{G}$ in vector space, encoding entities $\mathcal{E}$ as vectors and relations $\mathcal{R}$ as mappings [10]. Generically, the embeddings for $\mathcal{E}$ and $\mathcal{R}$ in $\mathcal{G}$ are learned using a scoring function $f(h,r,t)$ that maps input triples to scores so that positive triples have high scores and negative triples have low scores [13]. (Note that $\mathcal{G}$ is considered incomplete because some set of triples may be missing; algorithms for triple classification [11] and prediction, i.e. query answering [12], seek to account for missing information.) As in [6], our work uses ANALOGY [14] to learn multi-relational embeddings. ANALOGY constrains relations to be normal linear mappings between entities by using a scoring function $f(h,r,t)=\langle\textbf{v}^{T}_{h}\textbf{W}_{r},\textbf{v}_{t}\rangle$, where $\textbf{v}_{h},\textbf{v}_{t}$ are head and tail entity vectors, respectively, and $\textbf{W}_{r}$ is a relation mapping. This constraint enables using far fewer parameters than the most flexible semantic matching models [11, 15] while allowing for more complex relations to be expressed than translational models [12, 16], balancing scalability and expressiveness to achieve state-of-the-art results [10]. However, multi-relational embeddings assume all entities and relations to be known before training, which is impractical for robots in incremental learning scenarios.

Continual Learning entails learning to perform well over a new dataset or task while not degrading performance over previous datasets or tasks [8, 17, 18]. In [8], continual learning is categorized by what changes across learning sessions: the distribution of input data, the distribution of target labels, or the label space itself (labels drawn from disjoint spaces). These are referred to as ‘Incremental Domain Learning’, ‘Incremental Class Learning’, and ‘Incremental Task Learning’, respectively. The categories of continual learning approaches outlined in [17, 18] include regularizing learning across datasets [19, 20], recalling previous dataset distributions using generative models or replay [21, 22], adapting the model architecture to accommodate new datasets [23, 24], and using complementary learning systems to train on new datasets [25].

Previous work most related to ours [9] reformulated the multi-relational learning objective to enable incremental learning, using normalized-initialization [26] to initialize embeddings of new concepts during learning phases. However, the normalized-initialization algorithm was developed to initialize all model weights before any training, as an improvement over previous heuristics for randomly initializing all weights of neural networks. Instead of normalized-initialization, we posit that the learned embedding space should be used to inform initialization of embeddings for new concepts during incremental learning phases. The work most closely related to this idea addresses out-of-knowledge-base queries.

Out-of-Knowledge-Base (OOKB) Queries are queries relating to concepts that are missing in a knowledge graph $\mathcal{G}$. In prior work, solutions to OOKB queries are obtained by reasoning about the current multi-relational embedding to initialize representations for OOKB concepts. In [27], the authors ‘align’ an external knowledge source with an embedding to answer queries about OOKB concepts. In other work, [28] trains a graph-neural-network (GNN) to predict embeddings of OOKB concepts, and [29] trains a deep convolutional neural network architecture to predict OOKB embeddings from text descriptions or names. We found our approach to be effective for the limited dataset size in our experiments because it requires no training to make initializations.

3 Problem Definition

The objective of the multi-relational embedding problem is to learn a continuous vector representation of a knowledge graph $\mathcal{G}$ from a dataset of triples $\mathcal{D}\!=\!\big{\{}(h,r,t)_{i},y_{i}|\,h_{i},t_{i}\!\in\!\mathcal{E},r_{i}\!\in\!\mathcal{R},y_{i}\!\in\!\{0,1\}\big{\}}$ , in which $i\!\in\!\{1...|\mathcal{D}|\}$ and $y_{i}$ designates whether a relation $r_{i}$ holds between entities $h_{i},t_{i}$ . Each entity $e\!\in\!\mathcal{E}$ is encoded as a vector $\textbf{v}_{e}\!\in\!\mathbb{R}^{d_{\mathcal{E}}}$ , and each relation $r\!\in\!\mathcal{R}$ as a mapping between vectors $\textbf{W}_{r}\!\in\!\mathbb{R}^{d_{\mathcal{R}}}$ , where $d_{\mathcal{E}}$ and $d_{\mathcal{R}}$ are the dimensions of vectors and mappings, respectively [10, 13]. Therefore, the learning objective is to find a set of embeddings $\Theta=\big{\{}\{\textbf{v}_{e}|\,e\in\mathcal{E}\},\{\textbf{W}_{r}|\,r\in\mathcal{R}\}\big{\}}$ that minimize the loss over all triples in the dataset $\mathcal{L}_{\mathcal{D}}$ ; for our implementation using ANALOGY, $\mathcal{L}_{\mathcal{D}}=\sum_{i}-\log\sigma(y_{i}\cdot\langle\textbf{v}^{T}_{h_{i}}\textbf{W}_{r_{i}},\textbf{v}_{t_{i}}\rangle)$ where $\sigma$ is a sigmoid.
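The scoring function and loss above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, $\textbf{W}_{r}$ is stored as a dense square matrix rather than the constrained normal mapping ANALOGY uses, and labels are assumed to be encoded as $+1$ / $-1$ so that the sign of $y_{i}$ flips the score for negative triples.

```python
import numpy as np

def score(v_h, W_r, v_t):
    """Bilinear score <v_h^T W_r, v_t> for a single triple (h, r, t)."""
    return float(v_h @ W_r @ v_t)

def logistic_loss(scores, labels):
    """Sum over triples of -log(sigmoid(y * s)).

    Assumption: labels are +1 for positive and -1 for negative triples.
    -log(sigmoid(z)) equals log(1 + exp(-z)), computed stably with logaddexp.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    return float(np.sum(np.logaddexp(0.0, -labels * scores)))

# Toy usage with made-up 2-dimensional embeddings.
v_cup = np.array([1.0, 0.0])
v_fill = np.array([1.0, 0.0])
W_has_affordance = np.eye(2)
s = score(v_cup, W_has_affordance, v_fill)
loss = logistic_loss([s], [1])
```

A positive triple with a large positive score contributes a loss near zero, while a score of zero contributes $\log 2$ regardless of the label.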

The multi-relational embedding problem can be adapted for continual learning by including a new time step index $n$ that increases with each new learning session [9]. At each new learning session, the entity and relation sets grow because one or more OOKB entities $\xi^{n-1}$, where $\xi^{n-1}\cap\,\mathcal{E}^{n-1}\!=\!\emptyset$, and relations $\Gamma^{n-1}$, where $\Gamma^{n-1}\cap\,\mathcal{R}^{n-1}\!=\!\emptyset$, are introduced (i.e. $\mathcal{E}^{n}\!=\!\mathcal{E}^{n-1}\!\cup\xi^{n-1}$ and $\mathcal{R}^{n}\!=\!\mathcal{R}^{n-1}\!\cup\!\Gamma^{n-1}$). Therefore, after initializing all embeddings for OOKB entities at time step $n$, vectors for previous entities remain $\textbf{v}^{n}_{e}=\textbf{v}^{n-1}_{e}|\,e\in\mathcal{E}^{n-1}$, and vectors for OOKB entities $\textbf{v}^{n}_{e}=\textbf{v}^{n}_{\hat{e}}|\,e,\hat{e}\in\xi^{n-1}$ are added, where $\textbf{v}^{n}_{\hat{e}}$ is generated by an OOKB entity initialization method. Embeddings for the current time step $n$ are then $\Theta^{n}=\big{\{}\{\textbf{v}^{n}_{e}|\,e\in\mathcal{E}^{n}\},\{\textbf{W}^{n}_{r}|\,r\in\mathcal{R}^{n}\}\big{\}}$.

As a result of incremental learning, the multi-relational embedding learning objective becomes finding a set of embeddings $\Theta^{n}=\big{\{}\{\textbf{v}^{n}_{e}|\,e\in\mathcal{E}^{n}\},\{\textbf{W}^{n}_{r}|\,r\in\mathcal{R}^{n}\}\big{\}}$ that minimize the loss over the dataset for that time step $\mathcal{L}_{\mathcal{D}^{n}}$ given the previous embeddings $\Theta^{n-1}$ and OOKB entities $\xi^{n-1}$ .

4 Approach

After learning a multi-relational embedding from a dataset, different regions of the entity embedding space carry distinct semantic meaning [?]. Using normalized-initialization for new entity embeddings, as in [9], can severely corrupt the embedding space because the initializations are void of semantic meaning: normalized-initialization was developed only to maintain activation and back-propagated gradient variances across a neural network. We present several incremental semantic initialization (ISI) methods for multi-relational embeddings, which reason about the learned entity embedding space to inform new entity initializations. Each method selects the most informative current entities to inform initialization of a new entity based on a different aspect of the embedding structure.

To initialize an embedding for an OOKB entity $\hat{e}\in\xi^{n-1}$ , our algorithms rely on identifying a set of indicator entities $\mathcal{I}$ from known entities $e\in\mathcal{E}^{n-1}$ . The entities in $\mathcal{I}$ indicate a reasonable region of the embedding space to initialize the OOKB entity’s vector $\textbf{v}_{\hat{e}}^{n}$ . In all proposed initialization algorithms, the OOKB entity vector is the centroid of the indicator entity vectors as shown below.

$$\textbf{v}_{\hat{e}}^{n}=\frac{1}{|\mathcal{I}|}\sum_{e\in\mathcal{I}}\textbf{v}_{e}^{n-1}\tag{1}$$

For simplicity, the algorithm descriptions below are for the case of inserting a single OOKB entity, but multiple entities can be initialized in the same time step through the same procedure.

Below we describe the three ISI methods. Each method leverages different semantics within a multi-relational embedding to initialize OOKB entities. Entity Similarity (ES) selects indicator entities by directly comparing a new entity to current entities using word embedding similarity (e.g. word2vec [30] cosine similarity). Relational Similarity (RS) selects indicator entities as those most likely to satisfy triples connecting current entities to the new entity through relations. Hybrid Similarity (ERS) combines both algorithms by first selecting an initial indicator set using ES, then filtering to the final indicator entities using RS. These methods directly generalize to other multi-relational embedding types (e.g. TransE [12], Complex [31]) because they rely on identifying an indicator set of entities without making assumptions about the multi-relational embedding type.

Entity Similarity (ES) Initialization leverages word2vec [30] to select the indicator entities because word2vec captures distributed semantics of words, which helps identify contextually similar entities. The indicator set $\mathcal{I}$ comprises the known entities $e\!\in\!\mathcal{E}^{n-1}\,$ that have the highest cosine similarity between their word2vec vector $\boldsymbol{\pi}_{e}$ and the OOKB entity’s word2vec vector $\boldsymbol{\pi}_{\hat{e}}$, as in Equation 2.

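$$\mathcal{I}_{\textrm{ES}}=\operatorname*{arg\_top\_k}_{e\in\mathcal{E}^{n-1}}\big(\boldsymbol{\pi}_{e}\cdot\boldsymbol{\pi}_{\hat{e}}\big)\tag{2}$$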

Here, $\operatorname*{arg\_top\_k}$ selects the top $k=|\mathcal{I}_{\textrm{ES}}|$ entities with the highest scores. The OOKB entity vector $\textbf{v}_{\hat{e}}^{n}$ is then initialized as the centroid of the vectors in $\mathcal{I}_{\textrm{ES}}$ (Equation 1). Figure 1 shows a diagram for ES initialization where the triangles are indicator entities.
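As an illustration of ES initialization, the sketch below selects the $k$ known entities whose word vectors are most cosine-similar to the new entity's word vector, then averages their learned embeddings (Equation 1). It is a minimal sketch, not the authors' code: the function names are hypothetical, and word vectors are passed in as plain arrays rather than loaded from a word2vec model. Rows of `word_vecs` and `entity_vecs` are assumed to be aligned by entity.

```python
import numpy as np

def top_k_by_cosine(query, candidates, k):
    """Indices of the k rows of `candidates` most cosine-similar to `query`."""
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    sims = c @ q
    return np.argsort(-sims)[:k]

def es_init(word_vecs, new_word_vec, entity_vecs, k):
    """Return (initial vector, indicator indices) for one OOKB entity.

    word_vecs[i] is the word vector of known entity i (pi_e), and
    entity_vecs[i] is its learned embedding (v_e) from the previous session.
    """
    indicators = top_k_by_cosine(new_word_vec, word_vecs, k)
    return entity_vecs[indicators].mean(axis=0), indicators

# Toy usage: entities 0 and 1 are close to the new word vector, entity 2 is not.
word_vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
entity_vecs = np.array([[1.0, 1.0], [3.0, 3.0], [10.0, 10.0]])
v_new, indicators = es_init(word_vecs, np.array([1.0, 0.05]), entity_vecs, k=2)
```

In this toy case the new vector lands midway between the two similar entities and far from the dissimilar one, which is the behavior the centroid rule is meant to produce.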

Relational Similarity (RS) Initialization selects the indicator entities $\mathcal{I}$ using a set of insert triples $\{(h,r,t)_{i}\}$ that connect the new entity to current entities in the embedding via relations (i.e. the triples must satisfy $h_{i}\in\mathcal{E}^{n-1}$ if $t_{i}=\hat{e}$ or $t_{i}\in\mathcal{E}^{n-1}$ if $h_{i}=\hat{e}$ so that it can inform initialization). The insert triples would be observed by the robot when an OOKB entity is encountered.

For each relation type $r$ , resultant vectors are computed from the subset of insert triples with that relation type $\{(h_{j},r,t_{j})\}\subset\{(h,r,t)_{i}\}$ . These resultant vectors infer the possible locations of the OOKB entity based on a single triple. Specifically, for each triple in the subset, $\boldsymbol{\alpha}_{r,j}$ is computed from known parameters of $r$ and $h_{j}$ when $t_{j}=\hat{e}$ or $r$ and $t_{j}$ when $h_{j}=\hat{e}$ . Equation 3 shows this procedure for ANALOGY.

[TABLE]

All the resultant vectors for each relation type $r$ are combined by averaging to get resultant vector centroids $\bar{\boldsymbol{\alpha}}_{r}$ . The set of entities that have the highest accumulated cosine similarities to each resultant vector centroid (i.e. across each relation type) are selected as the indicator entities, as shown below.

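$$\mathcal{I}_{\textrm{RS}}=\operatorname*{arg\_top\_k}_{e\in\mathcal{E}^{n-1}}\Bigg(\sum_{r\in\mathcal{R}^{n-1}}\textbf{v}_{e}\cdot\bar{\boldsymbol{\alpha}}_{r}\Bigg)\tag{4}$$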

The initial value of $\textbf{v}_{\hat{e}}^{n}$ is then the centroid of the selected indicator set vectors $\mathcal{I}_{\textrm{RS}}$ as in Equation 1. Figure 1 shows the conceptual diagram for RS initialization where the triangles are indicator entities.

Hybrid Similarity (ERS) Initialization is informed by entity similarities as well as relations between entities by combining the two previous algorithms. First, a preliminary indicator set $\mathcal{I}_{\textrm{ES}}$ of the most similar entities is selected using the ES algorithm, as in Equation 2. The entities in $\mathcal{I}_{\textrm{ES}}$ are then used as inputs to the RS algorithm by requiring entities in $\operatorname*{arg\_top\_k}$ of Equation 4 to be in $\mathcal{I}_{\textrm{ES}}$ . The RS algorithm further filters the preliminary indicator entities $\mathcal{I}_{\textrm{ES}}$ to select the subset of entities that most likely satisfy the set of insert triples. The set of entities output by RS are the final set of entities that become the indicator set:

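$$\mathcal{I}_{\textrm{ERS}}=\operatorname*{arg\_top\_k}_{e\in\,\mathcal{I}_{\textrm{ES}}}\Bigg(\sum_{r\in\mathcal{R}^{n-1}}\textbf{v}_{e}\cdot\bar{\boldsymbol{\alpha}}_{r}\Bigg)\tag{5}$$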
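The RS selection and the ERS filter can be sketched together, since ERS only restricts the candidate pool of RS to a preliminary ES set. This is a hedged sketch rather than the authors' implementation. All names are hypothetical, and relation mappings are stored as dense matrices. The form of the resultant vector is an assumption derived from the bilinear score $\langle\textbf{v}^{T}_{h}\textbf{W}_{r},\textbf{v}_{t}\rangle$: a new tail is compared against $\textbf{W}_{r}^{T}\textbf{v}_{h}$, and a new head against $\textbf{W}_{r}\textbf{v}_{t}$. Similarities are cosine-normalized, following the prose description.

```python
import numpy as np

def resultant(W_r, v_known, new_is_tail):
    """Vector the new entity should align with for one insert triple.

    Assumed form, derived from the score <v_h^T W_r, v_t>: a new tail is
    scored against W_r^T v_h, a new head against W_r v_t.
    """
    return W_r.T @ v_known if new_is_tail else W_r @ v_known

def rs_indicators(entity_vecs, relation_mats, insert_triples, k, candidates=None):
    """Pick k indicator entities by accumulated cosine similarity.

    insert_triples: iterable of (relation_id, known_entity_index, new_is_tail).
    candidates: optional indices to choose from; passing the ES indicator
    set here gives the ERS filter.
    """
    per_relation = {}
    for r, e, new_is_tail in insert_triples:
        alpha = resultant(relation_mats[r], entity_vecs[e], new_is_tail)
        per_relation.setdefault(r, []).append(alpha)

    unit = entity_vecs / np.linalg.norm(entity_vecs, axis=1, keepdims=True)
    total = np.zeros(len(entity_vecs))
    for alphas in per_relation.values():
        centroid = np.mean(alphas, axis=0)  # resultant vector centroid for r
        total += unit @ (centroid / np.linalg.norm(centroid))

    pool = np.arange(len(entity_vecs)) if candidates is None else np.asarray(candidates)
    return pool[np.argsort(-total[pool])[:k]]

def centroid_init(entity_vecs, indicators):
    """Equation 1: centroid of the indicator entity vectors."""
    return entity_vecs[indicators].mean(axis=0)

# Toy usage: one insert triple (entity 0, relation 0, new entity as tail).
E = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
mats = {0: np.eye(2)}
triples = [(0, 0, True)]
rs_set = rs_indicators(E, mats, triples, k=2)
ers_set = rs_indicators(E, mats, triples, k=1, candidates=[1, 2])
v_new = centroid_init(E, rs_set)
```

Reusing one function for both variants mirrors the description above, in which ERS changes only the set that $\operatorname*{arg\_top\_k}$ ranges over.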

5 Experimental Settings

Our evaluation is inspired by a learning scenario in which a service robot incrementally acquires novel semantic knowledge about objects in its environment. We obtain our knowledge graph by mining AI2Thor [32], a highly realistic simulator of household environments, which enables us to capture the diverse nature of real-world environments. Below, we describe our data, performance metrics, experimental procedure, and parameters.

5.1 Knowledge Graph & Metrics

The knowledge graph used in this work was mined from AI2Thor, a realistic home simulator (see Table 1). We utilize this data because benchmark datasets commonly used across multi-relational embedding works [10, 12, 14] do not capture the statistical nature of the real-world environments encountered by service robots. In particular, both Freebase [33] and WordNet [34] only contain unique triples of factual information (e.g. (cup, hypernym, container), (StevenSpeilberg, directorOf, Jaws)); however, using distributions of non-unique triples more closely models the real world due to variance between environments.

We manually extended the set of AI2Thor entities, comprising 82 household concepts (e.g. microwave, toilet, kitchen) and 17 affordances (e.g. pick up, open, turn on), to include 7 material properties (e.g. wood, fabric, glass), which were assigned probabilistically based on materials encountered in the SUNCG dataset [35] for a total of 106 entities. In total, our dataset contains over 15K triples, of which 352 are unique. Many triples are repeated according to distributions of the default AI2Thor environments (e.g., (bowl, atLocation, cabinet) occurs 22 times).

Responses to queries about the AI2Thor knowledge graph are best quantified on a scale because of the uncertain nature of realistic environments (e.g. multiple potential locations are likely for a given object with varying likelihoods). As a result, ground truth responses are ranked lists of candidates ordered according to observations of a unique triple in all default environments of AI2Thor (i.e. more observations give higher ranks). Instead of mean-reciprocal-rank (MRR) over a set of $N$ queries in Equation 6 that assumes a ground truth rank of 1 [10, 12, 14], we report MRR* in Equation 7 that supports variable ground truth ranks by including a ground truth rank variable $R_{G}^{n}$ in addition to the predicted rank $R_{P}^{n}$ .

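$$\textrm{MRR}=\frac{1}{N}\sum_{n=1}^{N}\frac{1}{R_{P}^{n}}\tag{6}$$

$$\textrm{MRR}^{*}=\frac{1}{N}\sum_{n=1}^{N}\frac{1}{|R_{G}^{n}-R_{P}^{n}|+1}\tag{7}$$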
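The difference between the two metrics is easy to see in code. The sketch below, with hypothetical function names, computes MRR from predicted ranks alone and MRR* from pairs of ground truth and predicted ranks. Ranks are assumed to be 1-indexed integers.

```python
def mrr(pred_ranks):
    """Mean reciprocal rank; assumes every ground truth rank is 1."""
    return sum(1.0 / p for p in pred_ranks) / len(pred_ranks)

def mrr_star(gt_ranks, pred_ranks):
    """MRR*: reciprocal of the rank error plus one, averaged over queries."""
    if len(gt_ranks) != len(pred_ranks):
        raise ValueError("gt_ranks and pred_ranks must have equal length")
    total = sum(1.0 / (abs(g - p) + 1) for g, p in zip(gt_ranks, pred_ranks))
    return total / len(pred_ranks)

# A candidate whose ground truth rank is 3 and that is predicted at rank 3
# scores 1.0 under MRR* but only 1/3 under MRR.
example_star = mrr_star([3], [3])
example_plain = mrr([3])
```

When every ground truth rank is 1, the rank error is $R_{P}^{n}-1$ and MRR* reduces to the standard MRR.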

5.2 Experimental Procedure

To model an incremental learning scenario across all experiments, we first learn an initial embedding $\Theta^{0}$ from an initial dataset $\mathcal{D}^{0}$ . Then $\Theta^{1}$ , which is trained on a second dataset $\mathcal{D}^{1}$ , is initialized by reusing embeddings from $\Theta^{0}$ and inserting OOKB entities $\xi^{0}$ using an initialization method. Only two learning sessions were used in each experimental case because each initialization method can train to convergence with enough epochs, making a third learning session equivalent to restarting at the first learning session.

$\mathcal{D}^{0}$ consists of train, validation, and test sets of distinct unique triples (i.e. $\mathcal{D}^{0}_{Tr}\cap(\mathcal{D}^{0}_{Va}\cup\mathcal{D}^{0}_{Te})=\mathcal{D}^{0}_{Va}\cap\mathcal{D}^{0}_{Te}=\emptyset$). $\mathcal{D}^{0}$ is limited to only triples related to known entities $\mathcal{E}^{0}$ while all triples related to OOKB entities $\xi^{0}$ are withheld. The second dataset $\mathcal{D}^{1}$ contains triples related to all entities including $\xi^{0}$, so that $\mathcal{E}^{1}=\mathcal{E}^{0}\cup\xi^{0}$. Therefore, datasets generated for the later session of incremental learning subsume previous datasets. Before beginning the second training session, embeddings $\Theta^{1}=\big{\{}\{\textbf{v}^{1}_{e}|\,e\in\mathcal{E}^{1}\},\{\textbf{W}^{1}_{r}|\,r\in\mathcal{R}^{1}\}\big{\}}$ are initialized according to Equations 8 and 9 below, where ookb_init is one of the proposed (Section 4) or baseline (Section 5.3) initialization algorithms.

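$$\textbf{v}^{1}_{e}=\begin{cases}\textbf{v}^{0}_{e} & \forall\,e\in\mathcal{E}^{0}\\ \textbf{ookb\_init}(e) & \forall\,e\in\xi^{0}\end{cases}\tag{8}$$

$$\textbf{W}_{r}^{1}=\textbf{W}_{r}^{0}\quad\forall\,r\in\mathcal{R}^{1}\tag{9}$$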
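The initialization of $\Theta^{1}$ amounts to copying the session-0 parameters and calling the chosen initializer once per OOKB entity. A minimal sketch follows, assuming embeddings are kept in dictionaries keyed by entity or relation name. The function name and the dictionary layout are illustrative choices, not taken from the paper.

```python
import numpy as np

def extend_embeddings(entity_vecs, relation_mats, ookb_entities, ookb_init):
    """Build the starting point for the second session.

    Known entities keep their session-0 vectors, each OOKB entity gets
    ookb_init(e), and relation mappings are reused unchanged.
    """
    new_vecs = {e: v.copy() for e, v in entity_vecs.items()}
    for e in ookb_entities:
        if e in new_vecs:
            raise ValueError(f"{e!r} is already a known entity")
        new_vecs[e] = ookb_init(e)
    new_mats = {r: W.copy() for r, W in relation_mats.items()}
    return new_vecs, new_mats

# Toy usage with a placeholder initializer that returns zeros.
theta0_vecs = {"cup": np.array([1.0, 2.0])}
theta0_mats = {"atLocation": np.eye(2)}
theta1_vecs, theta1_mats = extend_embeddings(
    theta0_vecs, theta0_mats, ["apple"], lambda e: np.zeros(2)
)
```

Passing the initializer as a callable keeps the comparison between ISI variants and baselines to a one-argument change, with everything else held fixed.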

Fine-tuning was used to train the second model’s parameters $\Theta^{1}$ over $\mathcal{D}^{1}$ because it has a simple implementation and our contribution is not focused on the catastrophic-forgetting problem. Additionally, results in [9] using better approaches like EWC [20] were only marginally better than fine-tuning (2.08%). In fine-tuning, learning rates are lowered during incremental learning sessions but no new training regularization is included.

5.3 Parameter Details & Baselines

Throughout our experiments we measure and log the MRR* at each epoch when learning over dataset $\mathcal{D}^{1}$ until convergence, explicitly controlling all other variables to allow direct comparisons between different initialization methods. Convergence was determined using a joint-learning (Joint) model as in [9], which is essentially a batch-learned multi-relational embedding trained only on $\mathcal{D}^{1}$, serving as an upper bound. The convergence condition is met when the MRR* is within 8 MRR* percentage points of the joint-learning model performance.

The two baselines used in our experiments initialize new entity embeddings uniformly distributed over ranges determined by different criteria. Normalized-initialization (Xavier), used in [9], is uniformly distributed based on the dimensionality of the vector $d_{\mathcal{E}}$ so that the minimum value for each dimension $j$ is $v^{j}_{min}=-\frac{6}{\sqrt{d_{\mathcal{E}}}}$ and the maximum value is $v^{j}_{max}=\frac{6}{\sqrt{d_{\mathcal{E}}}}$. The other baseline, which we term informed-uniform (IU), is uniformly distributed based on the range of all current entity embeddings $\textbf{v}^{0}_{e}\,\forall\,e\in\mathcal{E}^{0}$ so that the minimum value for each dimension $j$ is $v^{j}_{min}=min(\textbf{v}^{0}_{e})$ and the maximum value is $v^{j}_{max}=max(\textbf{v}^{0}_{e})$. Equation 10 shows how both baselines initialize entity embeddings.

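$$\textbf{v}^{1}_{e,j}=\begin{cases}\textbf{v}^{0}_{e,j} & \text{if } e\in\mathcal{E}^{0}\\ U(v^{j}_{min},v^{j}_{max}) & \text{if } e\in\xi^{0}\end{cases}\tag{10}$$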
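Both baselines draw each coordinate of a new entity vector from a uniform distribution and differ only in how the bounds are chosen. The sketch below is illustrative, with hypothetical function names. It uses the bound $6/\sqrt{d_{\mathcal{E}}}$ exactly as stated in the text, and it reads the IU range as a per-dimension minimum and maximum over the known entity vectors, which is an interpretation of the $j$ index rather than a confirmed detail.

```python
import numpy as np

def xavier_baseline_init(d, rng):
    """Uniform draw in [-6/sqrt(d), 6/sqrt(d)] for each of d dimensions."""
    bound = 6.0 / np.sqrt(d)  # bound as given in the text
    return rng.uniform(-bound, bound, size=d)

def informed_uniform_init(entity_vecs, rng):
    """Uniform draw between the per-dimension min and max of known vectors.

    entity_vecs has one row per known entity. Taking the range separately
    for each dimension is an assumed reading of the definition.
    """
    lo = entity_vecs.min(axis=0)
    hi = entity_vecs.max(axis=0)
    return rng.uniform(lo, hi)

rng = np.random.default_rng(0)
v_xavier = xavier_baseline_init(100, rng)
known = np.array([[0.0, 1.0], [2.0, -1.0]])
v_iu = informed_uniform_init(known, rng)
```

Neither draw looks at which known entities the new concept resembles, which is the gap the ISI methods are designed to close.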

We determined that the best dimensionality for vectors and mappings was 100, the ratio of negative to positive samples was 9, and the learning rate and weight decay to train $\Theta^{0}$ were $1\mathrm{e}{-1}$ and $1\mathrm{e}{-3}$, respectively, for all experiments using cross-validation when training the joint-learning model. When training $\Theta^{1}$ (i.e. fine-tuning), the maximum number of epochs allowed was 150, and the learning rate was decreased to $2\mathrm{e}{-3}$. All results are reported in a ‘filtered’ setting [12], where triples already within the training and validation sets are removed before ranking. The set of OOKB entities $\xi^{0}$ in each experimental case is selected uniformly at random as in [9]. This is repeated 30 times for each size of OOKB entity set $|\xi^{0}|\!\in\!\{1,...,10\}$, recording the sets of $\xi^{0}$ so they match across initialization methods.

To determine the best indicator set size for each initialization algorithm, we ran a hyper-parameter sensitivity analysis considering MRR* and convergence seen in Figures 2(a) and 2(b), respectively. After size 4, as indicator set size is increased, MRR* performance degrades while convergence improves. Noticing this trade-off, for each algorithm we increased the indicator set size while the average number of epochs for three neighboring sizes decreased by 1 epoch or the MRR* went 1% point below the best performance, leading to indicator entity set sizes of 8, 18, and 9 for ES, RS, and ERS algorithms, respectively. Additionally, ERS used an initial indicator entity set size of 30.

6 Experimental Results

To better understand the different initialization methods, our experiments probe how each affects the immediate inference performance (Section 6.1), the time-to-convergence measured in epochs (Section 6.2), and the quality of knowledge association (Section 6.3). Here, the quality of knowledge association refers to how well new entities initialized with each method integrate with inferences about previous entities.

6.1 Improved Immediate Inferences

Concepts added to robot knowledge representations should be initialized to semantically meaningful values, enabling more accurate immediate inferences because deployed robots often must reason about new concepts without enough time to optimize their learning models for the newly encountered concepts. To evaluate each initialization method regarding this criterion, we learned an initial multi-relational embedding over a subset of the entities in the AI2Thor dataset, then initialized new OOKB entities in the embedding using each initialization method, and measured the inference performance before additional training.

Figure 3(a) reports the average MRR* performance using each entity initialization method before performing additional training to optimize the second model (i.e. $\Theta^{1}$ ). Each point is the weighted average MRR* across all queries in $\mathcal{D}^{1}$ measured for an entity initialization method and OOKB entity set size ranging from roughly 1% to 10% of $|\mathcal{E}|$ , while keeping initial embeddings and other variables the same across initialization methods.

In Figure 3(a), we see that across various sizes of new entities being inserted (i.e. $|\xi^{0}|$), ISI methods give better inference results for queries. Across all sizes of $\xi^{0}$, ES, RS, and ERS initialization outperform Xavier initialization by an average of 41.4%, 37.6%, and 42.1%, respectively. Therefore, on average, vectors for new concepts initialized with ISI give better inference results than baselines and hence are more semantically meaningful initial embedding vectors.

6.2 Decreased Epochs-to-Convergence

Concepts added to robot knowledge representations should be integrated efficiently to save computation on deployed robots, which are often constrained in compute resources and time. To evaluate each initialization method regarding this criterion, we learned an initial multi-relational embedding over a subset of the entities in the AI2Thor dataset, then initialized new OOKB entities in the embedding using each initialization method, and measured how many epochs were required to converge within 8% of the joint-learning model performance during additional training.
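The convergence criterion can be sketched as a search over a logged learning curve. The sketch below assumes that "within 8%" is a relative tolerance on the joint-learning MRR* and that index 0 of the curve is the score before any additional training; both are interpretations rather than details given in the text. The function name and the example values are hypothetical.

```python
from typing import Optional, Sequence


def epochs_to_convergence(
    curve: Sequence[float], joint_mrr: float, tolerance: float = 0.08
) -> Optional[int]:
    """Return the first epoch whose score is within `tolerance` of `joint_mrr`.

    `curve[i]` is the score logged after i epochs of additional training.
    Returns None if the curve never reaches the threshold.
    """
    threshold = (1.0 - tolerance) * joint_mrr  # relative tolerance (assumed)
    for epoch, score in enumerate(curve):
        if score >= threshold:
            return epoch
    return None


# Hypothetical learning curve and joint-learning reference score.
curve = [0.40, 0.55, 0.70, 0.75]
print(epochs_to_convergence(curve, joint_mrr=0.80))
```

With a joint-learning score of 0.80 the threshold is 0.736, so this hypothetical curve first satisfies the criterion at epoch 3.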

Figure 3(b) shows the average MRR* performance across all queries in $\mathcal{D}^{1}$ during the second learning session where $|\xi^{0}|=5$. At each epoch, the current weighted average MRR* for all queries in $\mathcal{D}^{1}$ is logged to generate the learning curve for each initialization method. This is repeated for each $\xi^{0}$ size from the experiment in Section 6.1, resulting in the averages and standard deviations of Table 2.

Table 2 shows that across various sizes of new OOKB entities being inserted (i.e. $|\xi^{0}|$), ISI methods converge faster than Xavier on average. Across all sizes of $\xi^{0}$, ES, RS, and ERS initialization require on average 85.3%, 75.9%, and 73.4% fewer epochs, respectively, to converge than Xavier initialization. Therefore, ISI helps to reduce the number of computations required to optimize a multi-relational embedding with newly initialized concepts.
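As a quick arithmetic check, the three per-method reductions reported above average to the 78.2% figure quoted in the abstract. The snippet also shows one way a percentage reduction could be computed from raw epoch counts; the helper name `percent_fewer` is hypothetical and no raw epoch counts from the paper are used.

```python
def percent_fewer(baseline_epochs: float, method_epochs: float) -> float:
    """Percentage reduction in epochs relative to the baseline."""
    return 100.0 * (baseline_epochs - method_epochs) / baseline_epochs


# Reductions relative to Xavier initialization, as reported in the text.
reductions = {"ES": 85.3, "RS": 75.9, "ERS": 73.4}
mean_reduction = sum(reductions.values()) / len(reductions)
print(f"mean reduction across ISI methods: {mean_reduction:.1f}%")  # 78.2%
```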

6.3 Mitigated Knowledge Corruption

In addition to accuracy of semantic meaning and efficiency of integration, concepts added to robot knowledge representations should also associate well with current knowledge, mitigating corruption to previously learned concepts. To test this property with each initialization method, we considered a common situation where a robot first learns an embedding in simulation (i.e. AI2Thor), then gets deployed to a realistic environment encountering new concepts (i.e. MatterPort3D (MP3D) [36]), and requires the set of known entities to be extended.

To model this sim-to-real scenario, an initial multi-relational embedding of all entities in AI2Thor was learned, and subsequently 10 new entities from a subset of MP3D were initialized in the embedding using each initialization method. Finally, the MRR* performance with respect to only entities in AI2Thor was logged during additional training. The procedure was repeated 30 times to generate all results with the subset of MP3D (i.e. 50 entities, randomly selected and filtered for a minimum of 6 non-unique triples).

Following testing procedures from Sections 6.1 and 6.2, we first probed each initialization method for accuracy and efficiency when initializing entities from MP3D and found ISI to outperform Xavier initialization. When learning new concepts across datasets, ISI improves immediate inference performance (Xavier, IU, ES, RS, ERS performed with 50.6, 77.3, 82.1, 80.9, 82.5 MRR*, respectively) and speeds up time-to-convergence (only Xavier required on average 90 epochs to converge when inserting new concepts).

In addition, the experiments showed that ISI mitigated corruption to AI2Thor embeddings when initializing MP3D entities. In Figure 4 the MRR* performance with respect to only entities in AI2Thor was logged for each initialization method during the additional training that included MP3D data. Xavier was the only initialization method to have significant effects on inference performance, dropping immediate MRR* over triples related only to AI2Thor by 37.0%. Similar results were experienced within the AI2Thor experiments, but the observation was only highlighted here because the distinct datasets make the explanation clear.

Initializing new entity embeddings with Xavier likely reduces MRR* over previous concepts because new entities from MP3D are forced into parts of the embedding space disparate from their semantic meaning and then change drastically during additional training. ISI mitigates this misplacement by adding MP3D entities to semantically similar regions of the AI2Thor entity embedding space.⁵

⁵ Note that while IU also did not cause drastic corruption of AI2Thor embeddings, it performed more poorly on queries regarding only newly inserted concepts from MP3D (Xavier, IU, ES, RS, ERS performed with 50.2, 67.8, 85.7, 83.6, 88.4 MRR*, respectively).
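The intuition of placing a new entity in a semantically similar region can be illustrated with a minimal sketch that initializes the new vector as the mean of the embeddings of a few related, already-learned entities. This is an assumption for illustration and is not the paper's ES, RS, or ERS procedure; the entity names, the two-dimensional vectors, and the function `mean_of_indicators` are all hypothetical.

```python
from typing import Dict, List, Sequence


def mean_of_indicators(
    embeddings: Dict[str, Sequence[float]], indicators: Sequence[str]
) -> List[float]:
    """Initialize a new entity as the element-wise mean of indicator embeddings."""
    vectors = [embeddings[name] for name in indicators]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Hypothetical learned embeddings (2-D for readability).
learned = {
    "mug": [1.0, 0.0],
    "cup": [0.8, 0.2],
    "sofa": [-1.0, 0.5],
}

# A new entity judged similar to "mug" and "cup" starts between them,
# rather than at a random point that may land near unrelated entities.
new_vector = mean_of_indicators(learned, ["mug", "cup"])
print(new_vector)
```

Starting near related entities means early gradient updates move the new vector only a short distance, which is consistent with the reduced disturbance to existing embeddings described above.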

Mitigating prior knowledge corruption by using ISI also enables multi-relational embeddings to make better generalizations about new concepts using semantic similarities to previous concepts. Therefore, MP3D entities inserted into a multi-relational embedding originally learned from AI2Thor can immediately receive more reasonable rankings of affordances, despite having only “atLocation” relations for those entities. Affordance rankings for several of the MP3D entities added to the AI2Thor multi-relational embedding are shown in Table 3 where generalizations in red are semantically incorrect and others in yellow are highly unlikely.

7 Conclusion

We presented Incremental Semantic Initialization as a means of adding OOKB entities to a previously learned embedding, as a result enabling the practical use of multi-relational embeddings in incremental robot learning scenarios. The ISI techniques, which reason about the current embedding space to initialize new embeddings, more efficiently and accurately initialize new concepts than previous methods used in [9], while mitigating corruption of previous concepts. The accompanying video demonstrates the application of this work to a physical robot learning scenario.

Bibliography

[1] A. Pronobis and P. Jensfelt. Large-scale semantic mapping and reasoning with heterogeneous modalities. In Robotics and Automation (ICRA), 2012 IEEE International Conference on, pages 3515–3522. IEEE, 2012.
[2] Y. Zhu, A. Fathi, and L. Fei-Fei. Reasoning about object affordances in a knowledge base representation. In European Conference on Computer Vision, pages 408–424. Springer, 2014.
[3] M. Beetz, D. Beßler, A. Haidu, M. Pomarlan, A. K. Bozcuoğlu, and G. Bartels. KnowRob 2.0: a 2nd generation knowledge processing framework for cognition-enabled robotic agents. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 512–519. IEEE, 2018.
[4] S. Chernova, V. Chu, A. Daruna, H. Garrison, M. Hahn, P. Khante, W. Liu, and A. Thomaz. Situated bayesian reasoning framework for robots operating in diverse everyday environments. International Foundation of Robotics Research.
[5] A. Saxena, A. Jain, O. Sener, A. Jami, D. K. Misra, and H. S. Koppula. Robobrain: Large-scale knowledge engine for robots. arXiv preprint arXiv:1412.0691, 2014.
[6] A. Daruna, W. Liu, Z. Kira, and S. Chernova. Robocse: Robot common sense embedding. arXiv preprint arXiv:1903.00412, 2019.
[7] M. Tenorth, L. Kunze, D. Jain, and M. Beetz. Knowrob-map-knowledge-linked semantic object maps. In Humanoid Robots (Humanoids), 2010 10th IEEE-RAS International Conference on, pages 430–435. IEEE, 2010.
[8] Y.-C. Hsu, Y.-C. Liu, and Z. Kira. Re-evaluating continual learning scenarios: A categorization and case for strong baselines. arXiv preprint arXiv:1810.12488, 2018.