Not As Easy As It Seems: Automating the Construction of Lexical Chains Using Roget's Thesaurus
Mario Jarmasz, Stan Szpakowicz

TL;DR
This paper explores automating the creation of lexical chains using an electronic version of Roget's Thesaurus, aiming to improve text cohesion analysis for NLP tasks like summarization.
Contribution
It introduces a method for building lexical chains with Roget's Thesaurus and compares it with existing implementations. This fills a gap, because all earlier automated lexical chainers relied on WordNet.
Findings
Successful implementation of lexical chains using Roget's Thesaurus
Comparison shows differences with WordNet-based methods
Highlights challenges in automating lexical chain construction
Abstract
Morris and Hirst present a method of linking significant words that are about the same topic. The resulting lexical chains are a means of identifying cohesive regions in a text, with applications in many natural language processing tasks, including text summarization. The first lexical chains were constructed manually using Roget's International Thesaurus. Morris and Hirst wrote that automation would be straightforward given an electronic thesaurus. All applications so far have used WordNet to produce lexical chains, perhaps because adequate electronic versions of Roget's were not available until recently. We discuss the building of lexical chains using an electronic version of Roget's Thesaurus. We implement a variant of the original algorithm, and explain the necessary design decisions. We include a comparison with other implementations.
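To make the idea of a lexical chain concrete, here is a minimal sketch of greedy chaining. It is not the authors' implementation, and it does not use Roget's Thesaurus. It relies on three simplifying assumptions. First, `TOY_THESAURUS` is a hypothetical stand-in that maps words to made-up category labels. Second, a word counts as a candidate only if it appears in that thesaurus. Third, a word joins a chain if it shares at least one category with any word already in the chain. The names `Chain` and `build_chains` are illustrative.

```python
from dataclasses import dataclass, field

# Hypothetical toy thesaurus: word -> set of category labels.
# The labels are made up for illustration; they are NOT real Roget's heads.
TOY_THESAURUS: dict[str, set[str]] = {
    "car": {"vehicle"},
    "truck": {"vehicle"},
    "engine": {"vehicle", "machine"},
    "motor": {"machine"},
    "tree": {"plant"},
    "forest": {"plant", "region"},
    "leaf": {"plant"},
}


@dataclass
class Chain:
    words: list[str] = field(default_factory=list)
    # Union of the categories of every word in the chain.
    categories: set[str] = field(default_factory=set)


def build_chains(text: str, thesaurus: dict[str, set[str]]) -> list[Chain]:
    """Greedily group thesaurus-related words into lexical chains.

    Simplifying assumptions (not from the paper): a word is a candidate
    only if it appears in `thesaurus`, and it is related to a chain if it
    shares at least one category with any word already in that chain.
    """
    chains: list[Chain] = []
    for token in text.split():
        word = token.strip(".,;:!?\"'").lower()
        cats = thesaurus.get(word)
        if not cats:
            continue  # not a candidate word: skip it
        # Search from the end so the most recently extended chain wins.
        for i in range(len(chains) - 1, -1, -1):
            if chains[i].categories & cats:
                chain = chains.pop(i)
                chain.words.append(word)
                chain.categories |= cats
                chains.append(chain)  # move it to the "most recent" slot
                break
        else:
            # No related chain was found, so this word starts a new one.
            chains.append(Chain([word], set(cats)))
    return chains


if __name__ == "__main__":
    text = ("The car had a new engine, and the motor hummed "
            "near the forest by a tall tree.")
    for chain in build_chains(text, TOY_THESAURUS):
        print(chain.words)
```

On the sample sentence, "motor" joins the chain that holds "car" only through "engine", which it shares the `machine` label with. This is a transitive link of the kind that lexical chains permit. The sketch leaves out several elements of the original approach: Morris and Hirst's multiple thesaural relation types, their limits on the distance between related words, and the Roget's-specific design decisions the paper discusses.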
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
