Instruct and Extract: Instruction Tuning for On-Demand Information Extraction
Yizhu Jiao, Ming Zhong, Sha Li, Ruining Zhao, Siru Ouyang, Heng Ji, Jiawei Han

TL;DR
This paper introduces a new instruction-tuning paradigm for personalized, on-demand information extraction that allows users to specify extraction tasks and formats, supported by a new benchmark and a specialized model.
Contribution
The paper proposes the On-Demand Information Extraction paradigm, creates the InstructIE benchmark, and develops the ODIE model, advancing personalized information extraction in NLP.
Findings
ODIE outperforms existing open-source models of similar size.
The benchmark facilitates research in personalized information extraction.
The approach enables flexible, user-specified extraction tasks.
Abstract
Large language models with instruction-following capabilities open the door to a wider group of users. However, when it comes to information extraction, a classic task in natural language processing, most task-specific systems cannot align well with long-tail, ad hoc extraction use cases for non-expert users. To address this, we propose a novel paradigm, termed On-Demand Information Extraction, to fulfill the personalized demands of real-world users. Our task aims to follow the instructions to extract the desired content from the associated text and present it in a structured tabular format. The table headers can either be user-specified or inferred contextually by the model. To facilitate research in this emerging area, we present a benchmark named InstructIE, which includes both automatically generated training data and a human-annotated test set. Building on InstructIE, we…
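To make the task format concrete, the following is a minimal Python sketch of what one on-demand extraction query and its tabular answer might look like. It is not the InstructIE schema or the ODIE model: the names `ExtractionRequest`, `is_open_request`, and `render_table`, the sample instruction and text, and the hand-written rows are all hypothetical and chosen only for illustration. The sketch captures the two cases described in the abstract: headers supplied by the user, or headers left for the model to infer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractionRequest:
    """One on-demand extraction query (hypothetical structure, not the InstructIE schema)."""
    instruction: str
    text: str
    # None means the user gave no headers and the model must infer them.
    headers: Optional[list[str]] = None


def is_open_request(request: ExtractionRequest) -> bool:
    """Return True when the table headers are left for the model to infer."""
    return request.headers is None


def render_table(headers: list[str], rows: list[list[str]]) -> str:
    """Render headers and rows as a pipe-separated text table."""
    for row in rows:
        if len(row) != len(headers):
            raise ValueError("each row must have one cell per header")
    lines = [" | ".join(headers), " | ".join("---" for _ in headers)]
    lines.extend(" | ".join(row) for row in rows)
    return "\n".join(lines)


# Illustrative query with user-specified headers (made-up example text).
request = ExtractionRequest(
    instruction="List each drug and its side effect.",
    text="Aspirin may cause stomach upset. Ibuprofen may cause dizziness.",
    headers=["Drug", "Side effect"],
)

# Hand-written rows standing in for a model's output; no model is called here.
rows = [["Aspirin", "stomach upset"], ["Ibuprofen", "dizziness"]]
table = render_table(request.headers, rows)
print(table)
```

A request created without `headers` would correspond to the case where the model decides the table columns from context.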
Taxonomy
Topics: Topic Modeling · Web Data Mining and Analysis · Data Quality and Management
Methods: High-Order Consensuses · ALIGN
