Adapting predominant and novel sense discovery algorithms for identifying corpus-specific sense differences

Binny Mathew; Suman Kalyan Maity; Pratip Sarkar; Animesh Mukherjee; Pawan Goyal

arXiv:1802.00231·cs.CL·February 2, 2018

TL;DR

This paper adapts three existing predominant and novel-sense discovery algorithms to identify corpus-specific word senses, using digitized books and newspaper archives as two corpus sources and comparing them at various time points.

Contribution

It proposes automated methods for identifying corpus-specific word senses by adapting existing sense discovery algorithms, and evaluates the adapted algorithms on large digitized corpora through a human judgment experiment.

Findings

1. 45-60% of identified senses are judged as genuine
2. Algorithms perform comparably after adaptation
3. Methods work across different data sources and time points

Abstract

Word senses are not static and may have temporal, spatial or corpus-specific scopes. Identifying such scopes could substantially benefit existing word sense disambiguation (WSD) systems. In this paper, while studying corpus-specific word senses, we adapt three existing predominant and novel-sense discovery algorithms to identify these senses. We use text data available in the form of millions of digitized books and newspaper archives as two different sources of corpora and propose automated methods to identify corpus-specific word senses at various time points. We conduct an extensive human judgment experiment to rigorously evaluate and compare the performance of these approaches. After adaptation, the outputs of the three algorithms are in the same format and the accuracy results are also comparable, with roughly 45-60% of the reported corpus-specific senses being judged as…
