Exploration on Grounded Word Embedding: Matching Words and Images with Image-Enhanced Skip-Gram Model
Ruixuan Luo

TL;DR
This paper introduces an Image-Enhanced Skip-Gram Model that learns grounded word embeddings by aligning them with image vectors, providing more interpretable and visually explainable word representations.
Contribution
The paper proposes a novel model that integrates image vectors with word embeddings, enhancing interpretability and grounding words in visual context.
Findings
High correlation between image vectors and word embeddings
Embeddings provide vivid image-based explanations
Model improves interpretability of word representations
Abstract
Word embeddings are designed to represent the semantic meaning of a word with a low-dimensional vector. The state-of-the-art methods for learning word embeddings (word2vec and GloVe) use only word co-occurrence information. The learned embeddings are real-valued vectors, which are opaque to humans. In this paper, we propose an Image-Enhanced Skip-Gram Model that learns grounded word embeddings by representing the word vectors in the same hyperplane as image vectors. Experiments show that the image vectors and word embeddings learned by our model are highly correlated, which indicates that our model can provide a vivid image-based explanation of the word embeddings.
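The abstract does not spell out the training objective, so the following is only an illustrative sketch of how a skip-gram loss might be combined with an image-grounding term. It assumes negative sampling for the skip-gram part and a cosine-distance penalty between a word vector and its image vector; the names `image_alignment_loss`, `combined_loss`, and the weight `lam` are hypothetical and are not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_loss(center, context, negatives):
    """Standard skip-gram negative-sampling loss for one (center, context) pair."""
    pos = -np.log(sigmoid(center @ context))
    neg = -np.sum(np.log(sigmoid(-(negatives @ center))))
    return pos + neg

def image_alignment_loss(word_vec, image_vec):
    """Hypothetical grounding term (an assumption): 1 - cosine similarity."""
    cos = (word_vec @ image_vec) / (
        np.linalg.norm(word_vec) * np.linalg.norm(image_vec)
    )
    return 1.0 - cos

def combined_loss(center, context, negatives, image_vec, lam=0.5):
    """Skip-gram loss plus a weighted image term; lam is an assumed hyperparameter."""
    return sgns_loss(center, context, negatives) + lam * image_alignment_loss(
        center, image_vec
    )

# Toy data: random vectors stand in for real word and image features.
rng = np.random.default_rng(0)
dim = 8
center = rng.normal(size=dim)
context = rng.normal(size=dim)
negatives = rng.normal(size=(5, dim))
image_vec = rng.normal(size=dim)

loss = combined_loss(center, context, negatives, image_vec)
```

Under these assumptions, minimizing the combined loss pulls a word vector toward both its textual contexts and its image vector, which is one plausible way the two kinds of vectors could end up correlated in a shared space.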
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
