Context Matters: Recovering Human Semantic Structure from Machine Learning Analysis of Large-Scale Text Corpora

Marius Cătălin Iordan; Tyler Giallanza; Cameron T. Ellis; Nicole M. Beckage; Jonathan D. Cohen

arXiv:1910.06954 · cs.CL · July 17, 2020

TL;DR

This paper introduces a context-aware embedding approach that significantly enhances the alignment between machine-generated semantic representations and human judgments, advancing understanding of human semantic structure from large-scale text data.

Contribution

The authors propose a novel contextually-constrained training method for embeddings that better captures human semantic judgments, addressing previous discrepancies.

Findings

01 Improved prediction of similarity judgments

02 Enhanced feature rating accuracy

03 Better alignment with empirical human data

Abstract

Applying machine learning algorithms to large-scale, text-based corpora (embeddings) presents a unique opportunity to investigate at scale how human semantic knowledge is organized and how people use it to judge fundamental relationships, such as similarity between concepts. However, efforts to date have shown a substantial discrepancy between algorithm predictions and empirical judgments. Here, we introduce a novel approach of generating embeddings motivated by the psychological theory that semantic context plays a critical role in human judgments. Specifically, we train state-of-the-art machine learning algorithms using contextually-constrained text corpora and show that this greatly improves predictions of similarity judgments and feature ratings. By improving the correspondence between representations derived using embeddings generated by machine learning methods and empirical…
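
The evaluation the abstract describes, comparing embedding-based predictions with empirical similarity judgments, can be illustrated with a minimal sketch. It assumes cosine similarity between word vectors as the predicted similarity and Pearson correlation as the measure of agreement with human ratings; the abstract names neither, so both are illustrative choices rather than the authors' confirmed procedure. The function names and all vectors and ratings are hypothetical toy values, not data from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def agreement_with_humans(embeddings, human_ratings):
    """Correlate embedding-predicted similarity with human ratings.

    embeddings: dict mapping a word to its vector (hypothetical input).
    human_ratings: dict mapping a (word_a, word_b) pair to a rating.
    """
    pairs = sorted(human_ratings)
    predicted = [cosine_similarity(embeddings[a], embeddings[b]) for a, b in pairs]
    observed = [human_ratings[pair] for pair in pairs]
    return pearson_r(predicted, observed)


# Toy, made-up values for illustration only (not from the paper).
toy_embeddings = {
    "dog": [1.0, 0.1, 0.0],
    "wolf": [0.9, 0.2, 0.1],
    "car": [0.0, 0.1, 1.0],
}
toy_ratings = {
    ("car", "dog"): 1.0,
    ("car", "wolf"): 1.5,
    ("dog", "wolf"): 6.0,
}

if __name__ == "__main__":
    print(agreement_with_humans(toy_embeddings, toy_ratings))
```

Under these assumptions, the same function could be run once on embeddings trained on an unconstrained corpus and once on embeddings trained on a contextually-constrained corpus, with the two correlations then compared.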
