Enriching very large ontologies using the WWW
Eneko Agirre, Olatz Ansa, Eduard Hovy, David Martinez

TL;DR
This paper presents a method for enriching large ontologies such as WordNet with web data. Documents retrieved for each concept are used to build topic signatures and hierarchical clusters of word senses, with the aim of adding topical links between concepts and improving word sense disambiguation.
Contribution
It introduces an approach that uses web documents to enrich ontologies, targeting two shortcomings of WordNet: the lack of topical links among concepts and the proliferation of senses.
Findings
Topic signatures give good results on a word sense disambiguation task
Hierarchical clusters of word senses improve those results further
Web-based enrichment effectively expands ontology structure
Abstract
This paper explores the possibility of exploiting text on the World Wide Web in order to enrich the concepts in existing ontologies. First, a method to retrieve documents from the WWW related to a concept is described. These document collections are used 1) to construct topic signatures (lists of topically related words) for each concept in WordNet, and 2) to build hierarchical clusters of the concepts (the word senses) that lexicalize a given word. The overall goal is to overcome two shortcomings of WordNet: the lack of topical links among concepts, and the proliferation of senses. Topic signatures are validated on a word sense disambiguation task with good results, which are improved when the hierarchical clusters are used.
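To make the idea of a topic signature concrete, the following is a minimal sketch, not the authors' implementation. It assumes the documents for each sense have already been retrieved, and it weights words with a chi-square-style contrast between one sense's documents and the documents of the other senses. The weighting function, the function names (`tokenize`, `topic_signatures`, `disambiguate`), the sense labels, and the toy documents are all assumptions made for illustration; the abstract above does not specify them.

```python
from collections import Counter


def tokenize(text):
    # Crude tokenizer (assumption): lowercase, whitespace split, letters only.
    return [w for w in text.lower().split() if w.isalpha()]


def topic_signatures(docs_by_sense):
    """Map each sense to {word: weight}, given sense -> list of document strings.

    A word's weight is a chi-square-style score, (observed - expected)^2 / expected,
    kept only when the word is more frequent in this sense's documents than
    expected from its frequency across all senses.
    """
    counts = {
        sense: Counter(w for doc in docs for w in tokenize(doc))
        for sense, docs in docs_by_sense.items()
    }
    total_by_sense = {sense: sum(c.values()) for sense, c in counts.items()}
    grand_total = sum(total_by_sense.values())

    word_totals = Counter()
    for c in counts.values():
        word_totals.update(c)

    signatures = {}
    for sense, c in counts.items():
        signature = {}
        for word, observed in c.items():
            expected = word_totals[word] * total_by_sense[sense] / grand_total
            if observed > expected:
                signature[word] = (observed - expected) ** 2 / expected
        signatures[sense] = signature
    return signatures


def disambiguate(context, signatures):
    # Choose the sense whose signature words carry the most weight in the context.
    words = tokenize(context)
    scores = {
        sense: sum(signature.get(w, 0.0) for w in words)
        for sense, signature in signatures.items()
    }
    return max(scores, key=scores.get)


# Toy stand-ins for web documents retrieved per sense (invented for illustration).
docs_by_sense = {
    "boy_child": [
        "the young boy played with his toys at school",
        "a boy and his mother went to school",
    ],
    "boy_servant": [
        "the boy served the colonial master at the house",
        "the servant boy carried luggage for the master",
    ],
}

signatures = topic_signatures(docs_by_sense)
print(disambiguate("the master called the boy to carry luggage", signatures))
```

The target word itself ("boy") occurs equally often in both collections, so it receives no weight in either signature; only words that distinguish one sense from the others contribute to the score.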
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
