A Survey of Document-Level Information Extraction
Hanwen Zheng, Sijia Wang, Lifu Huang

TL;DR
This survey reviews recent advances in document-level information extraction, analyzes the errors and limitations of current methods, and highlights key challenges for future research, such as labeling noise and limited reasoning ability.
Contribution
It provides a comprehensive overview of recent literature and a detailed error analysis, and it identifies critical challenges in document-level IE to guide future improvements.
Findings
Labeling noise significantly impacts performance
Entity coreference resolution remains challenging
Lack of reasoning capabilities limits current models
Abstract
Document-level information extraction (IE) is a crucial task in natural language processing (NLP). This paper conducts a systematic review of recent document-level IE literature. In addition, we conduct a thorough error analysis with current state-of-the-art algorithms and identify their limitations as well as the remaining challenges for the task of document-level IE. According to our findings, labeling noise, entity coreference resolution, and lack of reasoning severely affect the performance of document-level IE. The objective of this survey paper is to provide more insights and help NLP researchers further enhance document-level IE performance.
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques
