Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia

Ikuya Yamada, Akari Asai, Jin Sakuma, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji, and Yuji Matsumoto

arXiv:1812.06280 · cs.CL · September 29, 2020 · 41 cites

TL;DR

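Wikipedia2Vec is an open-source, Python-based toolkit that learns word and entity embeddings from a Wikipedia dump with a single command and lets users visualize them in a web demo. It achieves a state-of-the-art result on the KORE entity relatedness dataset and supports multiple languages.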

Contribution

It introduces a user-friendly, efficient tool for learning and visualizing Wikipedia-based embeddings, with pretrained models and a web demo for exploration.

Findings

01 Achieved a state-of-the-art result on the KORE entity relatedness dataset

02 Achieved competitive results on various standard benchmark datasets

03 Supports multiple languages with pretrained embeddings

Abstract

The embeddings of entities in a large knowledge base (e.g., Wikipedia) are highly beneficial for solving various natural language tasks that involve real world knowledge. In this paper, we present Wikipedia2Vec, a Python-based open-source tool for learning the embeddings of words and entities from Wikipedia. The proposed tool enables users to learn the embeddings efficiently by issuing a single command with a Wikipedia dump file as an argument. We also introduce a web-based demonstration of our tool that allows users to visualize and explore the learned embeddings. In our experiments, our tool achieved a state-of-the-art result on the KORE entity relatedness dataset, and competitive results on various standard benchmark datasets. Furthermore, our tool has been used as a key component in various recent studies. We publicize the source code, demonstration, and the pretrained embeddings…
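To make the entity relatedness evaluation mentioned in the abstract more concrete, here is a minimal sketch of how learned entity embeddings could be scored on such a task: candidates are ranked by cosine similarity to a query entity, and the predicted ranking is compared with a gold ranking using Spearman's rank correlation. This is an illustration under stated assumptions, not the authors' evaluation code. The entity names, the three-dimensional vectors, and the helper names (`cosine`, `rank_by_relatedness`, `spearman`) are all hypothetical; real embeddings would be loaded from a trained or pretrained model rather than written by hand.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_relatedness(query, candidates, vectors):
    """Sort candidate entities from most to least similar to the query."""
    return sorted(
        candidates,
        key=lambda name: cosine(vectors[query], vectors[name]),
        reverse=True,
    )


def spearman(gold, predicted):
    """Spearman's rank correlation between two rankings of the same items.

    Assumes both lists contain the same items with no ties.
    """
    n = len(gold)
    predicted_rank = {name: i for i, name in enumerate(predicted)}
    d_squared = sum((i - predicted_rank[name]) ** 2 for i, name in enumerate(gold))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical toy embeddings; a real model has hundreds of dimensions.
toy_vectors = {
    "Query": [1.0, 0.0, 0.0],
    "A": [0.9, 0.1, 0.0],
    "B": [0.5, 0.5, 0.0],
    "C": [0.0, 1.0, 0.0],
}

predicted = rank_by_relatedness("Query", ["C", "A", "B"], toy_vectors)
gold = ["A", "C", "B"]  # hypothetical human-judged ranking
score = spearman(gold, predicted)
```

With these toy vectors the predicted ranking places A first, then B, then C, so it disagrees with the hypothetical gold ranking on the last two positions. A benchmark score would typically average such correlations over many query entities.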

Taxonomy

Topics: Topic Modeling · Wikis in Education and Collaboration · Natural Language Processing Techniques