Context and Keyword Extraction in Plain Text Using a Graph Representation

Carlo Abi Chahine (LITIS); Nathalie Chaignaud (LITIS); Jean-Philippe Kotowicz (LITIS); Jean-Pierre Pécuchet (LITIS)

arXiv:0912.1421 · cs.IR · December 9, 2009

TL;DR

This paper introduces an indexing support system that uses an ontology and a graph representation to extract contextualized keywords from plain-text documents, helping archivists index them.

Contribution

It presents a method that combines an ontology with a graph structure to support keyword extraction and document indexing.

Findings

1. Effective extraction of contextualized keywords demonstrated
2. Wikipedia's category links used as a termino-ontological resource for evaluation
3. Enhanced indexing support for specialized documents

Abstract

Document indexing is an essential task performed by archivists or by automatic indexing tools. To retrieve the documents relevant to a query, the keywords describing each document must be chosen carefully. Archivists first have to identify the topic of a document before extracting its keywords. For an archivist indexing specialized documents, experience plays an important role, but indexing documents on different topics is much harder. This article proposes an innovative method for an indexing support system. The system takes as input an ontology and a plain-text document and outputs contextualized keywords for the document. The method has been evaluated using Wikipedia's category links as a termino-ontological resource.
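The abstract specifies only the system's interface: an ontology and a plain-text document go in, and contextualized keywords come out. The sketch below is a toy illustration of that interface and is not the authors' algorithm. It assumes a hypothetical mini-ontology shaped like Wikipedia category links (each term maps to its parent categories). It walks up those links breadth-first, gives each reachable category a vote weighted by 1/distance, picks the highest-scoring category as the document's context, and keeps only the document terms that fall under that context. The names `ONTOLOGY`, `category_distances`, and `extract_contextualized_keywords`, as well as the distance-weighted voting scheme, are assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict, deque

# Hypothetical toy ontology (term -> parent categories), loosely modeled on
# Wikipedia category links. Not data from the paper.
ONTOLOGY: dict[str, list[str]] = {
    "sonnet": ["poetry"],
    "stanza": ["poetry"],
    "rhyme": ["poetry"],
    "poetry": ["literature"],
    "novel": ["literature"],
    "mortgage": ["finance"],
    "interest": ["finance"],
}


def category_distances(term: str, ontology: dict[str, list[str]]) -> dict[str, int]:
    """Breadth-first walk up category links; returns {category: hops from term}."""
    dist: dict[str, int] = {}
    frontier = deque((parent, 1) for parent in ontology.get(term, []))
    while frontier:
        cat, d = frontier.popleft()
        if cat in dist:  # BFS order guarantees the first visit is the shortest
            continue
        dist[cat] = d
        frontier.extend((p, d + 1) for p in ontology.get(cat, []))
    return dist


def extract_contextualized_keywords(text: str, ontology: dict[str, list[str]]):
    """Return (context category, keywords under it), or (None, []) if nothing matches."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t in ontology)  # ontology terms found

    # Assumed scoring: each term votes for every ancestor category,
    # with closer categories weighted more heavily (1 / distance).
    scores: dict[str, float] = defaultdict(float)
    for term in counts:
        for cat, d in category_distances(term, ontology).items():
            scores[cat] += 1.0 / d
    if not scores:
        return None, []

    # Highest score wins; ties are broken by name so the result is deterministic.
    context = max(scores, key=lambda c: (scores[c], c))

    # Keep only the terms that sit under the chosen context, most frequent first.
    keywords = sorted(
        (t for t in counts
         if t == context or context in category_distances(t, ontology)),
        key=lambda t: (-counts[t], t),
    )
    return context, keywords


if __name__ == "__main__":
    doc = ("The sonnet follows a strict rhyme scheme in every stanza; "
           "interest in the form grew.")
    print(extract_contextualized_keywords(doc, ONTOLOGY))
    # ('poetry', ['rhyme', 'sonnet', 'stanza'])
```

In this example, "interest" matches an ontology term but is dropped because it falls under "finance" rather than the detected context. This mirrors, in a simplified form, the idea that identifying a document's topic first helps decide which extracted terms are meaningful keywords.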
