Context Matters: Recovering Human Semantic Structure from Machine Learning Analysis of Large-Scale Text Corpora
Marius Cătălin Iordan, Tyler Giallanza, Cameron T. Ellis, Nicole M. Beckage, Jonathan D. Cohen

TL;DR
This paper introduces a context-aware embedding approach that significantly enhances the alignment between machine-generated semantic representations and human judgments, advancing understanding of human semantic structure from large-scale text data.
Contribution
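The authors propose a novel method for generating embeddings by training on contextually-constrained text corpora. The resulting embeddings predict human similarity judgments and feature ratings more accurately, narrowing the previously reported discrepancy between algorithm predictions and empirical judgments.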
Findings
Improved prediction of similarity judgments
Enhanced feature rating accuracy
Better alignment with empirical human data
Abstract
Applying machine learning algorithms to large-scale, text-based corpora (embeddings) presents a unique opportunity to investigate at scale how human semantic knowledge is organized and how people use it to judge fundamental relationships, such as similarity between concepts. However, efforts to date have shown a substantial discrepancy between algorithm predictions and empirical judgments. Here, we introduce a novel approach of generating embeddings motivated by the psychological theory that semantic context plays a critical role in human judgments. Specifically, we train state-of-the-art machine learning algorithms using contextually-constrained text corpora and show that this greatly improves predictions of similarity judgments and feature ratings. By improving the correspondence between representations derived using embeddings generated by machine learning methods and empirical…
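As a rough illustration of what "predicting similarity judgments" from embeddings can mean, the sketch below scores concept pairs by cosine similarity and correlates those scores with human ratings. It is not the authors' pipeline: the vectors, the ratings, and every name (`toy_embeddings`, `toy_human_ratings`, `pairwise_similarities`, `pearson_r`) are invented for illustration, and the choice of cosine similarity and Pearson correlation is an assumption rather than something stated in the abstract.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def pairwise_similarities(embeddings, pairs):
    """Model-predicted similarity for each (concept_a, concept_b) pair."""
    return np.array(
        [cosine_similarity(embeddings[a], embeddings[b]) for a, b in pairs]
    )


def pearson_r(x, y):
    """Pearson correlation between model predictions and human ratings."""
    return float(np.corrcoef(x, y)[0, 1])


# Hypothetical 3-dimensional vectors; real embeddings would be learned
# from a text corpus (in the paper's case, a contextually-constrained one).
toy_embeddings = {
    "dog": np.array([0.9, 0.1, 0.0]),
    "cat": np.array([0.8, 0.2, 0.1]),
    "car": np.array([0.0, 0.1, 0.9]),
}

pairs = [("dog", "cat"), ("dog", "car"), ("cat", "car")]

# Invented human similarity ratings on a 0-1 scale, one per pair above.
toy_human_ratings = np.array([0.9, 0.1, 0.15])

model_sims = pairwise_similarities(toy_embeddings, pairs)
alignment = pearson_r(model_sims, toy_human_ratings)
print("model similarities:", np.round(model_sims, 3))
print("correlation with human ratings:", round(alignment, 3))
```

Under this setup, comparing embeddings trained on different corpora would amount to swapping in a different `toy_embeddings` dictionary and checking whether the correlation with the same human ratings goes up or down.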
