TL;DR
This paper presents HM-SGE, a hierarchical graph embedding model that learns grounded word meanings across modalities, captures similarity relations among words, and outperforms existing methods in simulating human similarity judgments.
Contribution
The novel HM-SGE model integrates modality-specific and joint graph embeddings to improve grounded word meaning representations.
Findings
Outperforms state-of-the-art methods in simulating human similarity judgments
Effectively models concept categorization
Validates the hierarchical multi-modal graph approach
Abstract
This paper introduces a novel approach to learn visually grounded meaning representations of words as low-dimensional node embeddings on an underlying graph hierarchy. The lower level of the hierarchy models modality-specific word representations through dedicated but communicating graphs, while the higher level puts these representations together on a single graph to learn a representation jointly from both modalities. The topology of each graph models similarity relations among words, and is estimated jointly with the graph embedding. The assumption underlying this model is that words sharing similar meaning correspond to communities in an underlying similarity graph in a low-dimensional space. We named this model Hierarchical Multi-Modal Similarity Graph Embedding (HM-SGE). Experimental results validate the ability of HM-SGE to simulate human similarity judgements and concept…
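The abstract does not spell out the training objective, so the following is only a rough, non-learned stand-in for the two-level idea. It assumes cosine k-nearest-neighbour similarity graphs and Laplacian-eigenmap node embeddings; both are our illustrative choices, not necessarily the authors'. Unlike HM-SGE, it builds each graph once from fixed features instead of estimating the topology jointly with the embedding, and the two modality graphs do not communicate. The function names, the parameters `k` and `dim`, and the random input features are all hypothetical.

```python
import numpy as np


def knn_similarity_graph(features: np.ndarray, k: int) -> np.ndarray:
    """Symmetric k-nearest-neighbour graph weighted by cosine similarity (illustrative choice)."""
    n = features.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of words")
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a word is never its own neighbour
    adj = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(sim[i])[-k:]  # indices of the k most similar words
        adj[i, nbrs] = np.maximum(sim[i, nbrs], 0.0)  # keep edge weights non-negative
    return np.maximum(adj, adj.T)  # symmetrise: keep an edge if either endpoint chose it


def spectral_embedding(adj: np.ndarray, dim: int) -> np.ndarray:
    """Laplacian-eigenmap node embedding from the normalised graph Laplacian."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.clip(deg, 1e-12, None))
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(lap)  # eigenvalues come back in ascending order
    # Drop the trivial (smallest) eigenvector; the next `dim` give node coordinates.
    return eigvecs[:, 1:dim + 1]


def hierarchical_embedding(text_feats, visual_feats, k=5, dim=4):
    """Hypothetical two-level pipeline: per-modality graphs first, then one joint graph."""
    # Lower level: a separate similarity graph and embedding for each modality.
    z_text = spectral_embedding(knn_similarity_graph(text_feats, k), dim)
    z_vis = spectral_embedding(knn_similarity_graph(visual_feats, k), dim)
    # Higher level: a single graph over the concatenated modality-specific embeddings.
    joint = np.hstack([z_text, z_vis])
    return spectral_embedding(knn_similarity_graph(joint, k), dim)


# Usage with random stand-in features for 20 words (hypothetical data).
rng = np.random.default_rng(0)
text_feats = rng.normal(size=(20, 8))
visual_feats = rng.normal(size=(20, 6))
z = hierarchical_embedding(text_feats, visual_feats, k=5, dim=4)
print(z.shape)  # (20, 4)
```

Laplacian eigenmaps place the nodes of a densely connected community close together, which loosely mirrors the abstract's assumption that words with similar meaning form communities in a similarity graph. A faithful reimplementation would need the paper's actual joint objective for graph topology and embeddings.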
