MorphTE: Injecting Morphology in Tensorized Embeddings

Guobing Gan; Peng Zhang; Sunzhu Li; Xiuqing Lu; Benyou Wang

arXiv:2210.15379 · cs.CL · October 28, 2022

TL;DR

MorphTE introduces a method for compressing word embeddings that leverages morphological structure and tensor products. It reduces embedding storage by roughly 20x while maintaining performance on machine translation and question answering tasks.

Contribution

The paper presents MorphTE, an embedding compression technique that builds each word embedding from its morpheme vectors via tensor products. This injects morphological knowledge into the embeddings and improves parameter efficiency without sacrificing accuracy.

Findings

01

Achieves approximately 20x compression of word embeddings.

02

Maintains performance comparable to uncompressed embeddings on machine translation and question answering tasks.

03

Outperforms existing embedding compression methods.
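To make the compression claim concrete, the following back-of-envelope arithmetic compares a dense embedding table with a morpheme-based tensorized one. It is a minimal sketch under loudly stated assumptions:

- All sizes (vocabulary, morpheme count, rank, morphemes per word) are hypothetical and are not the paper's experimental configuration.
- The parameter layout is assumed: one morpheme table per rank term.
- The function names are illustrative.

The resulting ratio is not meant to reproduce the reported ~20x figure.

```python
# Back-of-envelope parameter count: dense embedding table vs. a
# morpheme-based tensorized table. All sizes below are hypothetical.


def dense_params(vocab_size: int, dim: int) -> int:
    """Trainable parameters of an ordinary |V| x d embedding matrix."""
    return vocab_size * dim


def tensorized_params(num_morphemes: int, morph_dim: int, rank: int) -> int:
    """Trainable parameters if each of `rank` terms has its own |M| x q
    morpheme table (an assumed layout, not the paper's specification)."""
    return rank * num_morphemes * morph_dim


VOCAB, DIM = 32_000, 512               # hypothetical word vocabulary and embedding size
MORPHEMES, ORDER, RANK = 5_000, 3, 8   # hypothetical morpheme vocab, morphemes per word, rank
MORPH_DIM = round(DIM ** (1 / ORDER))  # q such that q ** ORDER == DIM
assert MORPH_DIM ** ORDER == DIM

dense = dense_params(VOCAB, DIM)                            # 16,384,000
tensorized = tensorized_params(MORPHEMES, MORPH_DIM, RANK)  # 320,000

# The word -> morpheme index table (VOCAB x ORDER integers) is extra,
# non-trainable storage that a full accounting would also include.
index_entries = VOCAB * ORDER                               # 96,000

print(f"dense={dense:,} tensorized={tensorized:,} ratio={dense / tensorized:.1f}x")
```

The key point is how each side scales. The dense table grows with the number of words times the full dimension d. The tensorized table grows with the much smaller number of morphemes times q, where q is roughly the n-th root of d. The actual ratio depends heavily on the chosen sizes and on whether the index table is counted.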

Abstract

In the era of deep learning, word embeddings are essential when dealing with text tasks. However, storing and accessing these embeddings requires a large amount of space. This is not conducive to the deployment of these models on resource-limited devices. Combining the powerful compression capability of tensor products, we propose a word embedding compression method with morphological augmentation, Morphologically-enhanced Tensorized Embeddings (MorphTE). A word consists of one or more morphemes, the smallest units that bear meaning or have a grammatical function. MorphTE represents a word embedding as an entangled form of its morpheme vectors via the tensor product, which injects prior semantic and grammatical knowledge into the learning of embeddings. Furthermore, the dimensionality of the morpheme vector and the number of morphemes are much smaller than those of words, which greatly…
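The abstract describes each word embedding as an entangled form of its morpheme vectors via the tensor product. The sketch below shows one way to realize that idea with NumPy. It rests on three assumptions:

- Each word is segmented into a fixed number n of morphemes, with shorter words padded.
- The word vector is a rank-r sum of Kronecker products.
- Each rank term has its own morpheme table.

These details, along with the function and variable names, are illustrative assumptions. They are not the paper's exact formulation or the code in the Fairseq_MorphTE repository.

```python
from functools import reduce

import numpy as np


def tensorized_word_embedding(morpheme_ids, tables):
    """Build one word vector from its morphemes via Kronecker products.

    morpheme_ids: n morpheme indices for the word (hypothetical segmentation;
        shorter words are assumed to be padded to n with a pad morpheme).
    tables: array of shape (rank, num_morphemes, q) holding rank-specific
        morpheme vectors (an assumed parameter layout).
    Returns a vector of length q ** n: the sum over ranks of the
    Kronecker product of the word's morpheme vectors.
    """
    terms = [
        reduce(np.kron, [tables[j, m] for m in morpheme_ids])
        for j in range(tables.shape[0])
    ]
    return np.sum(terms, axis=0)


rng = np.random.default_rng(0)
tables = rng.standard_normal((2, 10, 4))  # rank 2, 10 morphemes, q = 4
word = [3, 7, 1]                          # made-up ids, e.g. prefix, root, suffix
emb = tensorized_word_embedding(word, tables)
print(emb.shape)  # (64,) since 4 ** 3 == 64
```

Words that share a morpheme reuse the same morpheme vectors. This sharing is how the sketch injects prior knowledge. It also means the trainable parameters scale with the morpheme vocabulary and q rather than with the word vocabulary and d.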

Code & Models

Repositories

bigganbing/Fairseq_MorphTE · PyTorch · Official

Videos

MorphTE: Injecting Morphology in Tensorized Embeddings · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis