DocIE@XLLM25: In-Context Learning for Information Extraction using Fully Synthetic Demonstrations
Nicholas Popović, Ashish Kangen, Tim Schopf, Michael Färber

TL;DR
This paper introduces a fully automated synthetic data generation pipeline combined with in-context learning for document-level entity and relation extraction, reducing the need for manual annotation and improving zero-shot extraction capabilities.
Contribution
It presents a novel approach that integrates synthetic data creation with retrieval-based in-context learning using LLMs, enhancing document-level information extraction without manual labels.
Findings
A synthetic dataset with approximately 59k entities and 30k relation triples is created.
In-context learning is evaluated on the challenging task of document-level entity and relation extraction.
State-of-the-art LLMs still face difficulties in zero-shot document extraction.
Abstract
Large, high-quality annotated corpora remain scarce for document-level entity and relation extraction in zero-shot or few-shot settings. In this paper, we present a fully automatic, LLM-based pipeline for synthetic data generation and in-context learning for document-level entity and relation extraction. In contrast to existing approaches that rely on manually annotated demonstrations or direct zero-shot inference, our method combines synthetic data generation with retrieval-based in-context learning, using a reasoning-optimized language model. This allows us to build a high-quality demonstration database without manual annotation and to dynamically retrieve relevant examples at inference time. Based on our approach, we produce a synthetic dataset of Wikipedia abstracts with approximately 59k entities and 30k relation triples. Finally, we evaluate in-context learning…
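The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words similarity stands in for whatever retriever the paper uses, and the function names, the `DATABASE` entries, and the prompt layout are all hypothetical. It only shows the general idea of picking the most similar synthetic demonstrations for a query document and placing them ahead of it in the prompt.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased token counts; an assumed stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_demonstrations(query, database, k=2):
    """Return the k database entries whose text is most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(
        database,
        key=lambda demo: cosine(q, bag_of_words(demo["text"])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, demonstrations):
    """Place the retrieved demonstrations first, then the query document."""
    parts = [
        f"Document: {demo['text']}\nTriples: {demo['triples']}"
        for demo in demonstrations
    ]
    parts.append(f"Document: {query}\nTriples:")
    return "\n\n".join(parts)


# Hypothetical synthetic demonstrations, invented purely for illustration.
DATABASE = [
    {"text": "Marie Curie was born in Warsaw.",
     "triples": [("Marie Curie", "place_of_birth", "Warsaw")]},
    {"text": "The Rhine flows through Cologne.",
     "triples": [("Rhine", "flows_through", "Cologne")]},
    {"text": "Alan Turing was born in London.",
     "triples": [("Alan Turing", "place_of_birth", "London")]},
]

query = "Ada Lovelace was born in London."
demos = retrieve_demonstrations(query, DATABASE, k=2)
prompt = build_prompt(query, demos)
```

In a real system the resulting `prompt` would be sent to a language model, and the similarity function would typically be replaced by dense embeddings; both choices are outside what this summary specifies.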
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
