A Survey of Document-Level Information Extraction

Hanwen Zheng; Sijia Wang; Lifu Huang

arXiv:2309.13249 · cs.CL · September 26, 2023

Open Access

TL;DR

This survey reviews recent advances in document-level information extraction, analyzes the errors and limitations of current state-of-the-art methods, and highlights labeling noise, entity coreference resolution, and limited reasoning as key challenges for future research.

Contribution

The survey provides a systematic overview of recent literature and a detailed error analysis of current state-of-the-art methods, and it identifies the remaining challenges in document-level IE to guide future improvements.

Findings

01. Labeling noise significantly impacts performance

02. Entity coreference resolution remains challenging

03. Lack of reasoning capabilities limits current models

Abstract

Document-level information extraction (IE) is a crucial task in natural language processing (NLP). This paper presents a systematic review of recent document-level IE literature. In addition, we conduct a thorough error analysis of current state-of-the-art algorithms and identify their limitations as well as the remaining challenges for document-level IE. According to our findings, labeling noise, entity coreference resolution, and a lack of reasoning severely affect the performance of document-level IE. The objective of this survey is to provide more insight and to help NLP researchers further improve document-level IE performance.

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques