From Textual Information Sources to Linked Data in the Agatha Project

Paulo Quaresma; Vitor Beires Nogueira; Kashyap Raiyani; Roy Bayot; Teresa Gonçalves

arXiv:1909.05359·cs.CL·September 13, 2019

TL;DR

This paper presents a pipeline for converting Portuguese textual documents into linked data using ontologies and NLP techniques, enabling reasoning in the domain of criminal investigations.

Contribution

It introduces an architecture and an ontology, both language independent, that are populated by a pipeline of NLP modules (some of them language dependent) to represent and reason about Portuguese text.

Findings

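01 Ontology population from Portuguese texts via part-of-speech tagging, named entity recognition, and semantic role labeling

02 Language-independent architecture and ontology; only some NLP modules are language dependent

03 Potential for reasoning in the criminal investigation domain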

Abstract

Automatic reasoning about textual information is a challenging task in modern Natural Language Processing (NLP) systems. In this work, we describe our proposal for representing and reasoning about Portuguese documents by means of Linked Data, such as ontologies and thesauri. Our approach resorts to a specialized natural language processing pipeline (part-of-speech tagging, named entity recognition, semantic role labeling) to populate an ontology for the domain of criminal investigations. The provided architecture and ontology are language independent. Although some of the NLP modules are language dependent, they can be built using adequate AI methodologies.
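
To make the ontology-population step concrete, the sketch below shows one way the output of named entity recognition and semantic role labeling could be turned into Linked Data triples. It is an illustration under stated assumptions, not the Agatha implementation: the namespace `http://example.org/agatha#`, the `Entity` and `Frame` types, the role names, and the hand-written annotations are all hypothetical, and the NLP modules themselves are not run here.

```python
from dataclasses import dataclass
from urllib.parse import quote

# Hypothetical namespace; the paper's actual ontology IRIs are not given here.
BASE = "http://example.org/agatha#"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


@dataclass
class Entity:
    """A named entity as a NER module might return it (assumed shape)."""
    text: str
    label: str  # e.g. "Person", "Location"


@dataclass
class Frame:
    """A predicate with its arguments, as an SRL module might return it (assumed shape)."""
    predicate: str
    roles: dict  # role name -> entity text


def iri(local: str) -> str:
    """Build an IRI in the hypothetical namespace from a local name."""
    return "<" + BASE + quote(local.replace(" ", "_")) + ">"


def to_triples(entities, frames):
    """Map entities to class assertions and frames to event individuals."""
    triples = []
    for ent in entities:
        triples.append((iri(ent.text), RDF_TYPE, iri(ent.label)))
    for index, frame in enumerate(frames):
        event = iri(f"event_{index}")
        triples.append((event, RDF_TYPE, iri(frame.predicate.capitalize())))
        for role, filler in frame.roles.items():
            triples.append((event, iri(role), iri(filler)))
    return triples


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) IRIs as N-Triples lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Hand-written annotations for "Maria Santos foi detida em Lisboa."
# ("Maria Santos was arrested in Lisbon."); a real system would obtain
# these from the NER and SRL modules.
entities = [Entity("Maria Santos", "Person"), Entity("Lisboa", "Location")]
frames = [Frame("arrest", {"patient": "Maria Santos", "location": "Lisboa"})]

triples = to_triples(entities, frames)
print(to_ntriples(triples))
```

Each semantic frame becomes an event individual whose properties point at the recognized entities, so a reasoner or query engine can later ask, for example, which events share a location. Only the annotation step depends on the language of the text; the mapping to triples does not, which mirrors the split the abstract describes between language-dependent modules and a language-independent architecture.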
