ShotgunWSD: An unsupervised algorithm for global word sense disambiguation inspired by DNA sequencing
Andrei M. Butnaru, Radu Tudor Ionescu, Florentina Hristea

TL;DR
This paper introduces ShotgunWSD, an unsupervised, DNA-inspired algorithm for document-level word sense disambiguation that outperforms existing methods with minimal parameters and deterministic results.
Contribution
The paper presents a novel DNA-inspired unsupervised WSD algorithm that improves accuracy and robustness over state-of-the-art methods, with fewer parameters and deterministic output.
Findings
Outperforms other unsupervised WSD algorithms significantly
Can surpass the Most Common Sense baseline on some datasets
Has a small number of parameters and is robust to tuning
Abstract
In this paper, we present a novel unsupervised algorithm for word sense disambiguation (WSD) at the document level. Our algorithm is inspired by a widely used approach in the field of genetics for whole genome sequencing, known as the Shotgun sequencing technique. The proposed WSD algorithm is based on three main steps. First, a brute-force WSD algorithm is applied to short context windows (up to 10 words) selected from the document in order to generate a short list of likely sense configurations for each window. In the second step, these local sense configurations are assembled into longer composite configurations based on suffix and prefix matching. In the third step, the resulting configurations are ranked by their length, and the sense of each word is chosen based on a voting scheme that considers only the top k configurations in which the word appears. We compare our algorithm with other…
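The three steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `relatedness` callback, the `keep` and `min_overlap` parameters, and the lexicographic tie-breaking are all assumptions made for illustration, and the toy sense labels are invented. A configuration is represented as a pair `(start, senses)`, where `senses[i]` is the sense assigned to the word at position `start + i`.

```python
from collections import Counter
from itertools import product


def brute_force_window(candidates, start, n, relatedness, keep):
    """Step 1: score every sense combination in one window, keep the best `keep`."""
    window = candidates[start:start + n]
    scored = []
    for combo in product(*window):
        # Hypothetical scoring: sum of pairwise relatedness inside the window.
        score = sum(relatedness(a, b)
                    for i, a in enumerate(combo) for b in combo[i + 1:])
        scored.append((score, combo))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(start, combo) for _, combo in scored[:keep]]


def merge(left, right, min_overlap):
    """Step 2 helper: join two configurations if left's suffix equals right's prefix."""
    (ls, lsenses), (rs, rsenses) = left, right
    overlap = ls + len(lsenses) - rs
    # `right` must start after `left`, overlap it enough, and extend past its end.
    if rs <= ls or overlap < min_overlap or overlap >= len(rsenses):
        return None
    if lsenses[-overlap:] != rsenses[:overlap]:
        return None
    return (ls, lsenses + rsenses[overlap:])


def assemble(configs, min_overlap):
    """Step 2: repeatedly merge configurations until nothing new appears."""
    pool = set(configs)
    frontier = set(configs)
    while frontier:
        new = set()
        for a in frontier:
            for b in pool:
                for m in (merge(a, b, min_overlap), merge(b, a, min_overlap)):
                    if m is not None and m not in pool:
                        new.add(m)
        pool |= new
        frontier = new
    return pool


def vote(configs, doc_len, k):
    """Step 3: rank by length, then take a majority vote over the top k per word."""
    # Ties in length are broken lexicographically (an assumption of this sketch).
    ranked = sorted(configs, key=lambda c: (-len(c[1]), c))
    result = []
    for pos in range(doc_len):
        covering = [senses[pos - start] for start, senses in ranked
                    if start <= pos < start + len(senses)][:k]
        result.append(Counter(covering).most_common(1)[0][0] if covering else None)
    return result


def shotgun_sketch(candidates, relatedness, n, keep, k, min_overlap):
    """Run the three steps on a document given as per-word candidate sense lists."""
    configs = []
    for start in range(len(candidates) - n + 1):
        configs.extend(brute_force_window(candidates, start, n, relatedness, keep))
    return vote(assemble(configs, min_overlap), len(candidates), k)


# Toy data: invented sense labels and an invented set of related sense pairs.
RELATED = {frozenset(p) for p in [("bank#2", "river#1"),
                                  ("river#1", "water#1"),
                                  ("water#1", "flow#1")]}


def toy_relatedness(a, b):
    return 1.0 if frozenset((a, b)) in RELATED else 0.0


toy_doc = [["bank#1", "bank#2"], ["river#1"],
           ["water#1", "water#2"], ["flow#1", "flow#2"]]
toy_result = shotgun_sketch(toy_doc, toy_relatedness,
                            n=3, keep=1, k=1, min_overlap=2)
```

On the toy document, the two three-word windows overlap on two words, so their best configurations merge into a single four-word configuration that decides every word. The sketch ranks by length only, as the abstract states; any secondary ranking criterion the full paper may use is not modeled here.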
