Thematic Annotation: extracting concepts out of documents

Pierre Andrews; Martin Rajman

arXiv:cs/0412117·cs.CL·May 23, 2007

Open Access

TL;DR

This paper introduces a thematic annotation method that uses a large semantic database, the EDR Electronic Dictionary, to extract relevant concepts from documents. It does not rely centrally on keyword extraction and instead exploits the hypernym and hyponym relations of a concept hierarchy.

Contribution

The work presents a new concept-based annotation algorithm that uses a semantic hierarchy to represent document content, differing from traditional keyword or statistical methods.

Findings

01 Effective extraction of relevant concepts not explicitly present in the text

02 Utilizes a semantic hierarchy to capture document themes

03 Provides a synthetic, concept-based document representation

Abstract

Contrary to standard approaches to topic annotation, the technique used in this work does not rely centrally on some form of (possibly statistical) keyword extraction. Instead, the proposed annotation algorithm uses a large-scale semantic database, the EDR Electronic Dictionary, that provides a concept hierarchy based on hyponym and hypernym relations. This concept hierarchy is used to generate a synthetic representation of the document by aggregating the words present in topically homogeneous document segments into a set of concepts that best preserves the document's content. This new extraction technique uses an unexplored approach to topic selection. Instead of using semantic similarity measures based on a semantic resource, the latter is processed to extract the part of the conceptual hierarchy relevant to the document content. Then this conceptual hierarchy is searched to…
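
To make the aggregation idea concrete, here is a minimal sketch of how the words of one segment might be mapped to a covering concept through hypernym links. Everything in it is an assumption for illustration: the toy `HYPERNYM` table is invented and is not EDR data, the function names are hypothetical, and picking the most specific shared ancestor is a simple stand-in, since the abstract is cut off before it describes how the hierarchy is actually searched.

```python
# Toy hypernym links (child -> parent). Invented for illustration; not EDR data.
HYPERNYM = {
    "dog": "mammal",
    "cat": "mammal",
    "sparrow": "bird",
    "mammal": "animal",
    "bird": "animal",
    "animal": "entity",
    "car": "vehicle",
    "vehicle": "entity",
}


def ancestors(concept):
    """Return the chain from concept up to the root, concept included."""
    chain = [concept]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def covering_concept(words):
    """Most specific concept that subsumes every word found in the toy table.

    Stand-in selection rule (assumption): the lowest shared ancestor.
    """
    chains = [ancestors(w) for w in words if w in HYPERNYM]
    if not chains:
        return None
    shared = set(chains[0]).intersection(*chains[1:])
    # Chains run from specific to general, so the first shared entry is the deepest.
    return next(c for c in chains[0] if c in shared)


print(covering_concept(["dog", "cat"]))      # mammal
print(covering_concept(["dog", "sparrow"]))  # animal
```

In this sketch the segment containing "dog" and "cat" is annotated with "mammal", a concept that never appears in the text itself, which is the kind of result the summary above describes. A single covering concept becomes uninformative for mixed segments ("dog" and "car" collapse to "entity"), which is one reason a real system would select a set of concepts rather than one.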

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies