Pattern Matching and Discourse Processing in Information Extraction from Japanese Text

T. Kitani; Y. Eriguchi; M. Hara

arXiv:cs/9408102·cs.AI·February 3, 2008

Open Access

TL;DR

This paper presents a Japanese information extraction system that combines pattern matching with discourse processing to identify pieces of information in text and link them, reporting performance close to that of humans.

Contribution

The paper introduces a system that integrates pattern matching with discourse processing for Japanese text, improving its ability to link separately extracted pieces of information.

Findings

01 High system performance approaching human levels

02 Effective merging of information pieces using discourse processing

03 Successful application to Japanese text data

Abstract

Information extraction is the task of automatically picking up information of interest from an unconstrained text. Information of interest is usually extracted in two steps. First, sentence level processing locates relevant pieces of information scattered throughout the text; second, discourse processing merges coreferential information to generate the output. In the first step, pieces of information are locally identified without recognizing any relationships among them. A key word search or simple pattern search can achieve this purpose. The second step requires deeper knowledge in order to understand relationships among separately identified pieces of information. Previous information extraction systems focused on the first step, partly because they were not required to link up each piece of information with other pieces. To link the extracted pieces of information and map them onto…
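
The two steps described in the abstract can be illustrated with a small sketch. This is not the authors' system: it uses English text, a hypothetical regular expression (`COMPANY_PATTERN`), and exact matching on a normalized name as a crude stand-in for discourse processing. The names `Mention`, `locate_mentions`, and `merge_coreferential` are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: a capitalized name followed by a legal suffix.
COMPANY_PATTERN = re.compile(r"\b([A-Z][A-Za-z]+) (?:Corp|Inc|Ltd)\.")


@dataclass
class Mention:
    sentence_index: int  # which sentence the mention was found in
    surface: str         # the text exactly as matched
    key: str             # normalized name used when merging


def locate_mentions(sentences):
    """Step 1: sentence-level pattern search.

    Each mention is found locally; no relationships are recognized here.
    """
    mentions = []
    for index, sentence in enumerate(sentences):
        for match in COMPANY_PATTERN.finditer(sentence):
            mentions.append(
                Mention(index, match.group(0), match.group(1).lower())
            )
    return mentions


def merge_coreferential(mentions):
    """Step 2 (toy version): group mentions that share a normalized key.

    A real discourse component would need deeper knowledge than
    string equality; this only shows where merging fits in the flow.
    """
    merged = {}
    for mention in mentions:
        merged.setdefault(mention.key, []).append(mention)
    return merged


if __name__ == "__main__":
    text = [
        "Acme Corp. announced a joint venture.",
        "The venture will be led by Acme Inc. and Bolt Ltd.",
    ]
    for key, group in merge_coreferential(locate_mentions(text)).items():
        print(key, [m.surface for m in group])
```

Keeping the two steps as separate functions mirrors the split in the abstract: the first pass only collects candidates, and all linking decisions are deferred to the second pass.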

Taxonomy

Topics: Advanced Text Analysis Techniques · Topic Modeling · Biomedical Text Mining and Ontologies