WordRep: A Benchmark for Research on Learning Word Representations

Bin Gao, Jiang Bian, and Tie-Yan Liu

arXiv:1407.1640 · cs.CL · July 8, 2014 · 30 cites

TL;DR

WordRep is a comprehensive benchmark dataset designed to evaluate and compare different methods of learning distributed word representations, facilitating deeper research and understanding in the field.

Contribution

This paper introduces WordRep, a new benchmark collection for evaluating word embeddings, detailing its construction, usage, and potential for advancing research.

Findings

01 Compared several state-of-the-art word representations

02 Reported evaluation performance on WordRep

03 Discussed new research directions enabled by WordRep

Abstract

WordRep is a benchmark collection for research on learning distributed word representations (or word embeddings), released by Microsoft Research. In this paper, we describe the details of the WordRep collection and show how to use it in different types of machine learning research related to word embedding. Specifically, we describe how the evaluation tasks in WordRep are selected, how the data are sampled, and how the evaluation tool is built. We then compare several state-of-the-art word representations on WordRep, report their evaluation performance, and discuss the results. Finally, we discuss potential new research topics that WordRep can support beyond algorithm comparison. We hope that this paper can help people gain a deeper understanding of WordRep, and enable more interesting research on learning distributed word representations and related…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification