Using Sparse Semantic Embeddings Learned from Multimodal Text and Image Data to Model Human Conceptual Knowledge
Steven Derby, Paul Miller, Brian Murphy, Barry Devereux

TL;DR
This paper introduces a method to create sparse, interpretable semantic embeddings from combined text and image data, aligning computational models more closely with human conceptual understanding.
Contribution
It proposes a novel joint sparse embedding technique that integrates multimodal information to better model human-like semantics.
Findings
Sparse embeddings predict human semantic judgments.
Models align with neuroimaging data.
Enhanced interpretability of semantic representations.
Abstract
Distributional models provide a convenient way to model semantics using dense embedding spaces derived from unsupervised learning algorithms. However, the dimensions of dense embedding spaces are not designed to resemble human semantic knowledge. Moreover, embeddings are often built from a single source of information (typically text data), even though neurocognitive research suggests that semantics is deeply linked to both language and perception. In this paper, we combine multimodal information from both text and image-based representations derived from state-of-the-art distributional models to produce sparse, interpretable vectors using Joint Non-Negative Sparse Embedding. Through in-depth analyses comparing these sparse models to human-derived behavioural and neuroimaging data, we demonstrate their ability to predict interpretable linguistic descriptions of human ground-truth…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
