Using Context-to-Vector with Graph Retrofitting to Improve Word Embeddings

Jiangbin Zheng; Yile Wang; Ge Wang; Jun Xia; Yufei Huang; Guojiang Zhao; Yue Zhang; Stan Z. Li

arXiv:2210.16848·cs.CL·March 24, 2023

TL;DR

This paper introduces a method to enhance static word embeddings by integrating contextual information from pre-trained models and applying a retrofitting technique using synonym knowledge, resulting in improved performance.

Contribution

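The paper proposes Context-to-Vec, which incorporates contextual information from existing pre-trained models into the Skip-gram framework, and a post-processing retrofitting method that refines static embeddings with prior synonym knowledge and is independent of training.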

Findings

01. Outperforms baseline embeddings on both intrinsic and extrinsic tasks

02. Effective integration of contextual information improves embedding quality

03. Retrofitting enhances static embeddings using synonym knowledge

Abstract

Although contextualized embeddings generated from large-scale pre-trained models perform well in many tasks, traditional static embeddings (e.g., Skip-gram, Word2Vec) still play an important role in low-resource and lightweight settings due to their low computational cost, ease of deployment, and stability. In this paper, we aim to improve word embeddings by 1) incorporating more contextual information from existing pre-trained models into the Skip-gram framework, which we call Context-to-Vec; and 2) proposing a post-processing retrofitting method for static embeddings that is independent of training and employs prior synonym knowledge and a weighted vector distribution. On both extrinsic and intrinsic tasks, our methods are shown to outperform the baselines by a large margin.
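To make the idea of post-processing retrofitting concrete, here is a minimal sketch of a generic iterative neighbor-averaging update over a synonym graph. It is an illustration only and not the weighted scheme the paper proposes: the function name `retrofit`, the `alpha` parameter, the uniform neighbor weights, and the toy vectors and synonym lexicon are all assumptions made for this example.

```python
def retrofit(embeddings, synonyms, alpha=1.0, iterations=10):
    """Pull each word vector toward the average of its synonyms' vectors.

    embeddings: dict mapping word -> list of floats (original static vectors)
    synonyms:   dict mapping word -> list of synonym words (hypothetical lexicon)
    alpha:      weight that keeps a word close to its original vector (assumed)
    """
    # Work on a copy so the original embeddings are left untouched.
    new = {word: list(vec) for word, vec in embeddings.items()}
    for _ in range(iterations):
        for word, neighbors in synonyms.items():
            # Skip words or neighbors that have no embedding.
            neighbors = [n for n in neighbors if n in new]
            if word not in new or not neighbors:
                continue
            beta = 1.0 / len(neighbors)  # uniform neighbor weights, summing to 1
            updated = []
            for d in range(len(new[word])):
                neighbor_avg = sum(beta * new[n][d] for n in neighbors)
                # Weighted mix of the original vector and the neighbor average.
                updated.append((alpha * embeddings[word][d] + neighbor_avg) / (alpha + 1.0))
            new[word] = updated
    return new


# Toy data (made up for illustration): two synonyms and one unrelated word.
embeddings = {"happy": [1.0, 0.0], "glad": [0.0, 1.0], "car": [5.0, 5.0]}
synonyms = {"happy": ["glad"], "glad": ["happy"]}
retrofitted = retrofit(embeddings, synonyms)
```

In this sketch the vectors for "happy" and "glad" move toward each other while "car", which has no synonym entry, stays where it was. Because the update only reads the finished vectors and a lexicon, it can run after training with any static embedding model, which is the sense in which retrofitting is independent of training.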

Code & Models

Repositories

binbinjiang/context2vector (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification