The Effects of Data Size and Frequency Range on Distributional Semantic Models

Magnus Sahlgren; Alessandro Lenci

arXiv:1609.08293 · cs.CL · September 28, 2016

TL;DR

This study examines how data size and frequency range influence distributional semantic models, revealing that neural models struggle with small data and that the inverted factorized model is most reliable across conditions.

Contribution

It provides a comparative analysis of several distributional semantic models under varying data sizes and test-item frequency ranges, highlighting the robustness of the inverted factorized model.

Findings

1. Neural network models underperform with small datasets.
2. The inverted factorized model is the most reliable across different data sizes.
3. Model performance varies significantly with data size and frequency range.
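
The findings above depend on two experimental factors: the amount of training data and the frequency of the test items. A minimal sketch of how such an evaluation grid might be set up follows, assuming that data size is varied by taking token prefixes of one corpus and that a word-pair test item is assigned to a band by the lower of its two corpus counts. The helper names `corpus_prefixes` and `frequency_bands`, the band names, and the default thresholds (100 and 1000) are hypothetical illustrations, not the paper's actual settings.

```python
from collections import Counter

# Hypothetical helpers for an evaluation grid over corpus sizes and
# test-item frequency bands; names and thresholds are illustrative only.

def corpus_prefixes(tokens, sizes):
    """Yield (size, prefix) pairs, assuming data size is varied by truncation."""
    for n in sizes:
        yield n, tokens[:n]

def frequency_bands(tokens, test_pairs, thresholds=(100, 1000)):
    """Split word-pair test items into low/mid/high frequency bands.

    Assumption: a pair's frequency is the lower of its two words' counts.
    """
    counts = Counter(tokens)
    low_max, high_min = thresholds
    bands = {"low": [], "mid": [], "high": []}
    for w1, w2 in test_pairs:
        f = min(counts[w1], counts[w2])
        if f < low_max:
            bands["low"].append((w1, w2))
        elif f < high_min:
            bands["mid"].append((w1, w2))
        else:
            bands["high"].append((w1, w2))
    return bands

if __name__ == "__main__":
    tokens = "a a a b b c".split()
    pairs = [("a", "b"), ("a", "c"), ("a", "a")]
    # Recount frequencies on each prefix, so items can change band as data grows.
    for size, prefix in corpus_prefixes(tokens, [3, 6]):
        print(size, frequency_bands(prefix, pairs, thresholds=(2, 3)))
```

Recounting frequencies on each prefix, rather than once on the full corpus, lets the same test item fall into a different band at different data sizes, which is one simple way to look at how the two factors interact.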

Abstract

This paper investigates the effects of data size and frequency range on distributional semantic models. We compare the performance of a number of representative models for several test settings over data of varying sizes, and over test items of various frequency. Our results show that neural network-based models underperform when the data is small, and that the most reliable model over data of varying sizes and frequency ranges is the inverted factorized model.
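
The abstract contrasts neural network-based models with factorized count models but does not spell out how a factorized model is built. As a hedged illustration only, the sketch below shows one common count-based pipeline: a symmetric-window co-occurrence matrix, positive PMI weighting, and truncated SVD with an optional exponent `p` on the singular values (a weighting knob used in some work on SVD-based word vectors). This is an assumption for illustration; it does not reproduce the paper's "inverted factorized model", whose exact construction is not described here. The function names and the defaults `window=2`, `k=2`, and `p=1.0` are hypothetical.

```python
import numpy as np

def cooccurrence_matrix(sentences, window=2):
    """Count co-occurrences within a symmetric window of `window` tokens."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    M = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        for i, w in enumerate(s):
            lo, hi = max(0, i - window), min(len(s), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    M[index[w], index[s[j]]] += 1.0
    return vocab, M

def ppmi(M):
    """Positive pointwise mutual information weighting of a count matrix."""
    total = M.sum()
    row = M.sum(axis=1, keepdims=True)
    col = M.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(M * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0  # zero counts give -inf (or nan); treat as 0
    return np.maximum(pmi, 0.0)

def factorize(P, k=2, p=1.0):
    """Truncated SVD embeddings U_k * S_k**p (p is a hypothetical weighting knob)."""
    U, S, _ = np.linalg.svd(P, full_matrices=False)
    k = min(k, len(S))
    return U[:, :k] * (S[:k] ** p)

def cosine(a, b):
    """Cosine similarity, returning 0.0 for zero vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

if __name__ == "__main__":
    sents = [["the", "cat", "sat"], ["the", "dog", "sat"]]
    vocab, M = cooccurrence_matrix(sents)
    E = factorize(ppmi(M), k=2)
    i, j = vocab.index("cat"), vocab.index("dog")
    print(vocab, cosine(E[i], E[j]))
```

In this toy corpus `cat` and `dog` occur in exactly the same contexts, so their PPMI rows are identical and their reduced vectors have a cosine of 1 up to floating-point error. Setting `p` below 1 reduces the dominance of the top singular values in the resulting vectors.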


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning in Healthcare