Extraction d'entités dans des collections évolutives (Entity Extraction in Evolving Collections)

Thierry Despeyroux (INRIA Rocquencourt / INRIA Sophia Antipolis); Eduardo Fraschini (INRIA Rocquencourt / INRIA Sophia Antipolis); Anne-Marie Vercoustre (INRIA Rocquencourt / INRIA Sophia Antipolis)

arXiv:0706.2797·cs.IR·September 29, 2009

Open Access

TL;DR

This paper presents a method for extracting named entities, specifically partner names, from evolving collections of reports using syntactic patterns and supervised learning, without relying on linguistic resources.

Contribution

It introduces an approach that combines pattern discovery and supervised validation for entity extraction in dynamic document collections, avoiding the need for extensive training data.

Findings

01

Extraction performance is expected, though not yet shown, to improve as the training set grows

02

Method does not require linguistic resources or large annotated datasets

03

Approach adapts to evolving document collections

Abstract

The goal of our work is to extract named entities from a set of reports, in our case the names of industrial or academic partners. Starting with an initial list of entities, we use a first set of documents to identify syntactic patterns, which are then validated in a supervised learning phase on a set of annotated documents. The complete collection is then explored. This approach is similar to those used for data extraction from semi-structured documents (wrappers) and needs neither linguistic resources nor a large training set. As our collection of documents will evolve over the years, we hope that extraction performance will improve as the training set grows.
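
The three phases described in the abstract (pattern discovery from a seed list, validation on annotated documents, exploration of the full collection) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a "syntactic pattern" is simply the pair of word contexts to the left and right of a known entity, and the function names, the context width, and the precision threshold are all hypothetical choices made for illustration.

```python
import re
from collections import Counter


def context_patterns(docs, seeds, width=2):
    """Phase 1 (assumed form): count (left, right) word contexts
    seen around known entity names in a first set of documents."""
    counts = Counter()
    for doc in docs:
        for seed in seeds:
            for m in re.finditer(re.escape(seed), doc):
                left = " ".join(doc[:m.start()].split()[-width:])
                right = " ".join(doc[m.end():].split()[:width])
                if left and right:
                    counts[(left, right)] += 1
    return counts


def apply_pattern(pattern, doc):
    """Return the strings found between the left and right contexts."""
    left, right = pattern
    regex = re.escape(left) + r"\s+(.+?)\s*" + re.escape(right)
    return {m.group(1).strip() for m in re.finditer(regex, doc)}


def validate(patterns, annotated, min_precision=0.8):
    """Phase 2: keep patterns whose precision on annotated documents
    (pairs of text and gold entity set) reaches a hypothetical threshold."""
    kept = []
    for pattern in patterns:
        found = correct = 0
        for text, gold in annotated:
            hits = apply_pattern(pattern, text)
            found += len(hits)
            correct += len(hits & gold)
        if found and correct / found >= min_precision:
            kept.append(pattern)
    return kept


def extract(patterns, docs):
    """Phase 3: apply the validated patterns to the whole collection."""
    entities = set()
    for doc in docs:
        for pattern in patterns:
            entities |= apply_pattern(pattern, doc)
    return entities
```

In this sketch, newly extracted names could be added to the seed list before the next yearly batch of reports is processed, which is one plausible way the training material might grow as the collection evolves.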

Taxonomy

Topics: Web Data Mining and Analysis · Natural Language Processing Techniques · Algorithms and Data Compression