Preserved Structure Across Vector Space Representations
Andrei Amatuni, Estelle He, Elika Bergelson

TL;DR
This study shows that the similarity structure among common concepts is largely preserved across image- and word-based vector spaces. Because this invariance emerges without any human similarity ratings, it may aid language learning by marking stable category boundaries.
Contribution
It shows that pairwise similarities computed in image space and in word space are correlated, and that each item's nearest neighbors overlap significantly across the two spaces. This suggests a shared conceptual organization across very different representational formats.
Findings
Image- and word-space similarities are correlated.
Each item's nearest neighbors overlap across the two spaces.
Items with the most overlapping neighbors are learned later by infants.
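The first finding compares similarity structure across two spaces. Below is a minimal sketch of one way to do that, assuming each item has one vector in image space and one in word space, that similarity is cosine similarity, and that the two sets of pairwise similarities are compared with a Pearson correlation. These choices, the function names, and the toy vectors are illustrative assumptions, not the authors' pipeline.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_similarities(space, items):
    """Similarity of every unordered item pair, in a fixed pair order."""
    return [cosine(space[a], space[b]) for a, b in combinations(items, 2)]


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def cross_space_correlation(image_space, word_space):
    """Correlate item-pair similarities computed independently in two spaces."""
    # Sorting fixes the pair order so both lists line up pair by pair.
    items = sorted(set(image_space) & set(word_space))
    return pearson(pairwise_similarities(image_space, items),
                   pairwise_similarities(word_space, items))


if __name__ == "__main__":
    # Hypothetical 3-d toy vectors for illustration only; not the paper's embeddings.
    image = {"dog": [0.9, 0.1, 0.0], "cat": [0.8, 0.2, 0.1], "spoon": [0.0, 0.1, 0.9]}
    word = {"dog": [1.0, 0.0, 0.1], "cat": [0.9, 0.1, 0.0], "spoon": [0.1, 0.0, 1.0]}
    print(round(cross_space_correlation(image, word), 3))
```

For 27 items, each space contributes 27 × 26 / 2 = 351 pairwise similarities to the correlation. A rank correlation such as Spearman's would be an equally reasonable choice if only the ordering of similarities is of interest.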
Abstract
Certain concepts, words, and images are intuitively more similar than others (dog vs. cat, dog vs. spoon), though quantifying such similarity is notoriously difficult. Indeed, this kind of computation is likely a critical part of learning the category boundaries for words within a given language. Here, we take a set of 27 items (e.g. 'dog') that are highly common in infants' input and use both image- and word-based algorithms to compute similarity among them independently. We find three key results. First, the pairwise item similarities derived within image-space and word-space are correlated, suggesting preserved structure among these extremely different representational formats. Second, the closest 'neighbors' for each item, within each space, showed significant overlap (e.g. both found 'egg' as a neighbor of 'apple'). Third, items with the most overlapping neighbors are later-learned…
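The abstract's second result counts shared nearest neighbors across spaces (e.g. 'egg' near 'apple' in both). The sketch below shows one way to compute a per-item overlap score and test it against chance. The neighbor count `k`, cosine similarity, and the label-shuffling permutation test are all assumptions: the abstract does not say how many neighbors were compared or which significance test was used. All names and toy vectors are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest_neighbors(space, item, k):
    """The k items most similar to `item` (excluding itself) by cosine similarity."""
    others = [o for o in space if o != item]
    others.sort(key=lambda o: cosine(space[item], space[o]), reverse=True)
    return set(others[:k])


def neighbor_overlap(image_space, word_space, k=3):
    """Per item: how many of its k nearest neighbors are shared by both spaces."""
    items = sorted(set(image_space) & set(word_space))
    img = {i: image_space[i] for i in items}
    wrd = {i: word_space[i] for i in items}
    return {i: len(nearest_neighbors(img, i, k) & nearest_neighbors(wrd, i, k))
            for i in items}


def permutation_p_value(image_space, word_space, k=3, n_perm=1000, seed=0):
    """Share of label shuffles whose mean overlap is >= the observed mean overlap."""
    items = sorted(set(image_space) & set(word_space))
    observed = sum(neighbor_overlap(image_space, word_space, k).values()) / len(items)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = items[:]
        rng.shuffle(shuffled)
        # Reassign word vectors to random items, breaking any cross-space alignment.
        fake = {a: word_space[b] for a, b in zip(items, shuffled)}
        null = sum(neighbor_overlap(image_space, fake, k).values()) / len(items)
        hits += null >= observed
    # The +1 terms count the observed labeling as one member of the null set.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical 2-d toy vectors: two clusters (animals, utensils) in each space.
    image = {"dog": [1.0, 0.1], "cat": [0.9, 0.2], "cow": [1.0, 0.3],
             "spoon": [0.1, 1.0], "fork": [0.2, 0.9], "cup": [0.3, 1.0]}
    word = {"dog": [0.95, 0.05], "cat": [1.0, 0.2], "cow": [0.9, 0.25],
            "spoon": [0.05, 1.0], "fork": [0.2, 1.0], "cup": [0.25, 0.9]}
    print(neighbor_overlap(image, word, k=2))
    print(permutation_p_value(image, word, k=2))
```

In this toy set every item shares both of its neighbors across spaces, yet the test has little resolution: only 2 of the C(6, 3) = 20 ways of assigning the animal word vectors to three items reproduce the clusters, so the p-value should land near 0.1. With 27 items the null distribution is far richer.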
Taxonomy
Topics: Language and cultural evolution · Machine Learning in Bioinformatics · Child and Animal Learning Development
