Fighting with the Sparsity of Synonymy Dictionaries

Dmitry Ustalov, Mikhail Chernoskutov, Chris Biemann, and Alexander Panchenko

arXiv:1708.09234 · cs.CL · May 21, 2018

TL;DR

This paper addresses the problem of sparse synonymy dictionaries in graph-based synset induction. It proposes a pre-processing method and a post-processing method to improve synset quality, and evaluates both on two Russian-language datasets.

Contribution

The paper introduces two approaches, graph pre-processing (adding missing edges) and post-processing (merging similar synset clusters), to mitigate the effect of dictionary sparsity on synset induction methods.

Findings

1. Pre-processing with missing edge addition improves synset quality.

2. Post-processing by merging similar clusters enhances results.

3. Both methods significantly outperform baseline approaches.

Abstract

Graph-based synset induction methods, such as MaxMax and Watset, induce synsets by performing a global clustering of a synonymy graph. However, such methods are sensitive to the structure of the input synonymy graph: sparseness of the input dictionary can substantially reduce the quality of the extracted synsets. In this paper, we propose two different approaches designed to alleviate the incompleteness of the input dictionaries. The first one performs a pre-processing of the graph by adding missing edges, while the second one performs a post-processing by merging similar synset clusters. We evaluate these approaches on two datasets for the Russian language and discuss their impact on the performance of synset induction methods. Finally, we perform an extensive error analysis of each approach and discuss prominent alternative methods for coping with the problem of the sparsity of the…
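
The two steps named in the abstract can be illustrated with a small Python sketch. The abstract does not say how missing edges are chosen or how cluster similarity is measured, so both criteria here are assumptions made for illustration only: an edge is added between two words that share at least `min_shared` neighbours, and two clusters are merged when their Jaccard overlap reaches `threshold`. The function names and parameters are hypothetical, and the merging step assumes that clusters may overlap (a word can belong to more than one synset).

```python
from itertools import combinations


def add_missing_edges(edges, min_shared=2):
    """Pre-processing sketch: link words sharing >= min_shared neighbours.

    The shared-neighbour rule is an assumed heuristic, not the paper's method.
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    # Store undirected edges as frozensets so (u, v) equals (v, u).
    augmented = {frozenset(edge) for edge in edges}
    for u, v in combinations(sorted(neighbours), 2):
        if len(neighbours[u] & neighbours[v]) >= min_shared:
            augmented.add(frozenset((u, v)))
    return augmented


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_similar_clusters(clusters, threshold=0.5):
    """Post-processing sketch: greedily merge clusters with high overlap.

    The Jaccard criterion and threshold are assumptions for illustration.
    """
    merged = [set(cluster) for cluster in clusters]
    changed = True
    while changed:
        changed = False
        for i, j in combinations(range(len(merged)), 2):
            if jaccard(merged[i], merged[j]) >= threshold:
                merged[i] |= merged[j]  # absorb cluster j into cluster i
                del merged[j]
                changed = True
                break  # indices shifted, so restart the scan
    return merged
```

In a pipeline of this shape, `add_missing_edges` would run on the dictionary's synonym pairs before a clustering method such as MaxMax or Watset is applied, and `merge_similar_clusters` would run on that method's output.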
