WordRep: A Benchmark for Research on Learning Word Representations
Bin Gao, Jiang Bian, and Tie-Yan Liu

TL;DR
WordRep is a comprehensive benchmark dataset designed to evaluate and compare different methods of learning distributed word representations, facilitating deeper research and understanding in the field.
Contribution
This paper introduces WordRep, a new benchmark collection for evaluating word embeddings, detailing its construction, usage, and potential for advancing research.
Findings
Compared several state-of-the-art word representations
Reported evaluation performances on WordRep
Discussed new research directions enabled by WordRep
Abstract
WordRep is a benchmark collection for research on learning distributed word representations (or word embeddings), released by Microsoft Research. In this paper, we describe the details of the WordRep collection and show how to use it in different types of machine learning research related to word embeddings. Specifically, we describe how the evaluation tasks in WordRep are selected, how the data are sampled, and how the evaluation tool is built. We then compare several state-of-the-art word representations on WordRep, report their evaluation performance, and discuss the results. After that, we discuss potential new research topics that WordRep can support, in addition to algorithm comparison. We hope that this paper can help people gain a deeper understanding of WordRep, and enable more interesting research on learning distributed word representations and related…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
