Pattern Matching and Discourse Processing in Information Extraction from Japanese Text
T. Kitani, Y. Eriguchi, M. Hara

TL;DR
This paper presents a Japanese information extraction system that combines pattern matching and discourse processing to identify and link pieces of information in text, with reported performance approaching human levels.
Contribution
It introduces a system that integrates pattern matching with discourse processing for Japanese text, so that separately extracted pieces of information can be linked to one another.
Findings
High system performance approaching human levels
Effective merging of information pieces using discourse processing
Successful application to Japanese text data
Abstract
Information extraction is the task of automatically picking up information of interest from an unconstrained text. Information of interest is usually extracted in two steps. First, sentence level processing locates relevant pieces of information scattered throughout the text; second, discourse processing merges coreferential information to generate the output. In the first step, pieces of information are locally identified without recognizing any relationships among them. A key word search or simple pattern search can achieve this purpose. The second step requires deeper knowledge in order to understand relationships among separately identified pieces of information. Previous information extraction systems focused on the first step, partly because they were not required to link up each piece of information with other pieces. To link the extracted pieces of information and map them onto…
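The two steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the English toy sentences, the regular expression `COMPANY_PATTERN`, and the helper names `locate_mentions` and `merge_mentions` are all assumptions made for illustration, and the alias rule in step 2 is a deliberately crude stand-in for real discourse processing.

```python
import re

# Hypothetical pattern: capitalized name followed by a company suffix.
SUFFIXES = r"(?:Corp\.|Inc\.|Ltd\.)"
COMPANY_PATTERN = re.compile(r"\b([A-Z][a-z]+(?: [A-Z][a-z]+)*) " + SUFFIXES)


def locate_mentions(sentences):
    """Step 1: sentence-level pattern matching.

    Each hit is found locally; nothing relates one hit to another.
    Returns (sentence index, surface form, base name) tuples.
    """
    hits = []
    for idx, sentence in enumerate(sentences):
        for match in COMPANY_PATTERN.finditer(sentence):
            hits.append((idx, match.group(0), match.group(1)))
    return hits


def merge_mentions(sentences, hits):
    """Step 2: a crude stand-in for discourse processing.

    Groups hits by base name, then attaches bare-name aliases
    (e.g. "Acme" for "Acme Corp.") found anywhere in the text.
    """
    entities = {}
    for idx, surface, base in hits:
        entities.setdefault(base, []).append((idx, surface))
    for base, mentions in entities.items():
        # Bare name that is NOT followed by a company suffix.
        alias = re.compile(r"\b" + re.escape(base) + r"\b(?! " + SUFFIXES + ")")
        for idx, sentence in enumerate(sentences):
            for match in alias.finditer(sentence):
                mentions.append((idx, match.group(0)))
        mentions.sort()
    return entities


sentences = [
    "Acme Corp. announced a joint venture with Globex Inc. on Monday.",
    "Acme will hold 60 percent of the new company.",
    "Globex Inc. will supply the equipment.",
]
hits = locate_mentions(sentences)
entities = merge_mentions(sentences, hits)
for name, mentions in entities.items():
    print(name, mentions)
```

Step 1 alone yields three unrelated hits; only step 2 recognizes that "Acme" in the second sentence refers to the same company as "Acme Corp." in the first, which is the kind of linking the abstract attributes to discourse processing.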
Taxonomy
Topics: Advanced Text Analysis Techniques · Topic Modeling · Biomedical Text Mining and Ontologies
