Extending and Improving Wordnet via Unsupervised Word Embeddings

Mikhail Khodak; Andrej Risteski; Christiane Fellbaum; Sanjeev Arora

arXiv:1705.00217 · cs.CL · May 2, 2017 · 2 cites


TL;DR

This paper introduces an unsupervised method that uses distributional semantics (word embeddings) to extend and improve WordNet, and applies it to construct Wordnets for French and Russian with minimal linguistic resources.

Contribution

It presents a novel unsupervised approach for building and improving Wordnets in low-resource languages using word embeddings. The resulting Wordnets outperform the best existing automated Wordnets in F-score.

Findings

1. Significant increase in synset recall on new test sets
2. Outperforms existing automated WordNets in F-score
3. Applicable to low-resource languages and sense clustering

Abstract

This work presents an unsupervised approach for improving WordNet that builds upon recent advances in document and sense representation via distributional semantics. We apply our methods to construct Wordnets in French and Russian, languages which both lack good manual constructions. These are evaluated on two new 600-word test sets for word-to-synset matching and found to improve greatly upon synset recall, outperforming the best automated Wordnets in F-score. Our methods require very few linguistic resources, thus being applicable for Wordnet construction in low-resource languages, and may further be applied to sense clustering and other Wordnet improvements.
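The abstract does not spell out how words are matched to synsets. As a rough illustration only, the sketch below assumes each candidate synset is represented by the plain average of the embeddings of its (translated) gloss words, and that a target-language word is linked to every candidate synset whose cosine similarity with the word's vector reaches a threshold. The function names, the 0.5 threshold, the toy 2-D vectors, and the gold links are all hypothetical. The paper builds on more recent document and sense representations, so this is a simplified stand-in, not the authors' method. The sketch also shows how set-based precision, recall, and F-score would be computed against gold word-to-synset links.

```python
import math

def mean_vector(vectors):
    """Component-wise average of equal-length vectors."""
    return [sum(components) / len(vectors) for components in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def match_word_to_synsets(word, candidate_synsets, embeddings, threshold=0.5):
    """Link `word` to every candidate synset whose averaged gloss embedding
    has cosine similarity >= threshold with the word's embedding.
    Hypothetical simplified scheme, not the paper's exact procedure."""
    if word not in embeddings:
        return set()
    word_vec = embeddings[word]
    matched = set()
    for synset_id, gloss_words in candidate_synsets.items():
        # Skip gloss words that have no embedding (out of vocabulary).
        known = [embeddings[w] for w in gloss_words if w in embeddings]
        if known and cosine(word_vec, mean_vector(known)) >= threshold:
            matched.add(synset_id)
    return matched

def precision_recall_f1(predicted, gold):
    """Standard set-based precision, recall, and F1 for word-to-synset links."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Toy 2-D "embeddings" and candidate synsets (hypothetical data for illustration).
EMBEDDINGS = {
    "banque": [1.0, 0.0],
    "argent": [0.9, 0.1],
    "institution": [1.0, 0.2],
    "rivière": [0.0, 1.0],
    "bord": [0.1, 0.9],
}
CANDIDATES = {
    "bank.n.01": ["rivière", "bord"],        # sloping land beside water
    "bank.n.02": ["institution", "argent"],  # financial institution
}

if __name__ == "__main__":
    predicted = match_word_to_synsets("banque", CANDIDATES, EMBEDDINGS)
    print(predicted)  # {'bank.n.02'}
    # Against a hypothetical gold set with two links: precision 1.0, recall 0.5.
    print(precision_recall_f1(predicted, {"bank.n.02", "bank.n.09"}))
```

In this toy setup, raising the threshold tends to trade recall for precision. The abstract reports gains in synset recall and F-score but does not state the representations or thresholds actually used.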

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies