Croppable Knowledge Graph Embedding
Yushan Zhu, Wen Zhang, Zhiqiang Liu, Mingyang Chen, Lei Liang, Huajun Chen

TL;DR
This paper introduces MED, a training framework for knowledge graph embeddings that yields multiple sub-models of different dimensions from a single training run, improving efficiency and flexibility in AI applications.
Contribution
MED allows for a single training process to produce croppable sub-models of various dimensions, reducing costs and increasing adaptability in knowledge graph embedding tasks.
Findings
MED improves low-dimensional sub-model performance.
High-dimensional sub-models retain low-dimensional capacity.
Framework demonstrates effectiveness across multiple datasets and models.
Abstract
Knowledge Graph Embedding (KGE) is a common approach for Knowledge Graphs (KGs) in AI tasks. Embedding dimensions depend on application scenarios. Requiring a new dimension means training a new KGE model from scratch, increasing cost and limiting efficiency and flexibility. In this work, we propose a novel KGE training framework, MED. It allows a single training run to produce a croppable KGE model for multiple scenarios with different dimensional needs. Sub-models of the required dimensions can be directly cropped and used without extra training. In MED, we propose a mutual learning mechanism to improve the low-dimensional sub-models and make the high-dimensional sub-models retain the low-dimensional sub-models' capacity, an evolutionary improvement mechanism to help the high-dimensional sub-models master the triples that the low-dimensional sub-models cannot, and a dynamic loss weight to…
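The cropping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a sub-model of dimension d is the first d coordinates of each embedding row (the abstract does not say which coordinates are kept), and it uses a TransE-style L1 score only as a stand-in for whichever KGE scoring function is actually trained. The names `crop` and `transe_score` and the random embedding tables are hypothetical.

```python
import numpy as np


def crop(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Return a sub-model: the first `dim` coordinates of every row.

    Assumption: cropping keeps a prefix of each embedding vector.
    """
    if not 0 < dim <= embeddings.shape[1]:
        raise ValueError(f"dim must be in 1..{embeddings.shape[1]}, got {dim}")
    return embeddings[:, :dim]


def transe_score(h: np.ndarray, r: np.ndarray, t: np.ndarray) -> float:
    """TransE-style plausibility score: negative L1 distance of h + r from t."""
    return -float(np.linalg.norm(h + r - t, ord=1))


# Hypothetical "trained" tables: 5 entities and 2 relations, full dimension 8.
rng = np.random.default_rng(0)
entity_emb = rng.normal(size=(5, 8))
relation_emb = rng.normal(size=(2, 8))

# Score the same triple (entity 0, relation 1, entity 3) at several dimensions.
# No retraining happens between iterations; only slicing.
for dim in (2, 4, 8):
    ent = crop(entity_emb, dim)
    rel = crop(relation_emb, dim)
    print(dim, transe_score(ent[0], rel[1], ent[3]))
```

Because the slice is a view on the full table, each cropped sub-model costs no additional storage in this sketch. Whether a given cropped model performs well depends on the training procedure, which the sketch does not cover.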
Peer Reviews
Decision·Submitted to ICLR 2025
– The core idea and motivation of the paper are sound, and such approaches are essential to support real-world applications.
– The experiments cover a wide range of datasets, from smaller ones to a large one (SKG). In addition, the authors show that their approach is general and can be extended to other machine learning models such as BERT. Thus the method may have a high impact beyond KGE models.
– In very low dimensions, e.g., 10d, the method shows superior performance compared to othe
– In high dimensions, the results are not better than those of other models in most cases.
– The technical contribution of the paper is not very significant. The main loss is a combination of two existing losses. Moreover, Equation 4 is the same as Equation 5 in the RotatE paper (the only difference is that Equation 4 is applied to two models; please add a citation).
1. The proposed method aims to train once to get a croppable KGE model applicable to multiple scenarios with different dimensional requirements, which is an interesting topic.
2. The authors improve the low-dimensional sub-models' performance and make the high-dimensional sub-models retain the capacity that the low-dimensional sub-models have, which seems reasonable.
1. The paper is not clearly organized, which makes it hard to follow. For example, it lacks preliminary details on how previous knowledge distillation methods work.
2. The novelty of this paper seems limited, since knowledge distillation has already been used in previous work [1].
[1] Lifelong embedding learning and transfer for growing knowledge graphs
3. The paper lacks an analysis of time complexity and space complexity, which is necessary to study the effic
Strengths:
1. This paper presents an interesting problem: how to train a croppable KGE so that sub-models of different dimensions can be cropped from the embeddings. The whole idea is clear and easy to follow.
2. A framework, MED, is proposed to produce croppable embeddings. This framework consists of multiple sub-models. The low-dimensional models are similar to the original KGE. The authors want to improve their performance as much as possible. The high dimensional m
Weaknesses:
1. Ablation studies are needed for this paper. MED includes three modules: a mutual learning mechanism, an evolutionary improvement mechanism, and a dynamic loss weight. It is very important to evaluate the effectiveness of each module and to discuss whether there are alternative solutions. For instance, can we just duplicate a model with a small dimension d n times and use the evolutionary improvement mechanism to tune the copies to satisfy the target that high-dimensional mode
Taxonomy
Topics: Advanced Graph Neural Networks
Methods: Attention Is All You Need · Linear Layer · Weight Decay · Residual Connection · Multi-Head Attention · WordPiece · Softmax · Layer Normalization · Attention Dropout
