Retrieval-Augmented Code Generation for Universal Information Extraction
Yucan Guo, Zixuan Li, Xiaolong Jin, Yantao Liu, Yutao Zeng, Wenxuan Liu, Xiang Li, Pan Yang, Long Bai, Jiafeng Guo, Xueqi Cheng

TL;DR
This paper introduces Code4UIE, a universal framework using retrieval-augmented code generation with LLMs to perform various information extraction tasks by transforming text into structured code representations.
Contribution
The paper proposes a novel universal code generation framework for IE that leverages Python schemas and retrieval strategies, enhancing flexibility and accuracy across multiple tasks.
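To make the retrieval step concrete, the following is a minimal sketch of how demonstrations might be selected and placed into a code-generation prompt. This summary does not describe the paper's actual retrieval strategies, so everything here is an assumption for illustration: the Jaccard token-overlap similarity, the function names `retrieve_examples` and `build_prompt`, the example pool, and the prompt layout are all hypothetical.

```python
# Hypothetical sketch: pick the k most similar labeled examples for a query
# sentence and assemble a few-shot prompt. Jaccard token overlap is a stand-in
# similarity; it is NOT claimed to be the strategy used in the paper.

def tokenize(text):
    """Lowercase whitespace tokenization, returned as a set of tokens."""
    return set(text.lower().split())

def jaccard(a, b):
    """Jaccard similarity between the token sets of two strings."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def retrieve_examples(query, pool, k=2):
    """Return the k pool entries whose text is most similar to the query."""
    ranked = sorted(pool, key=lambda ex: jaccard(query, ex["text"]), reverse=True)
    return ranked[:k]

def build_prompt(schema_code, demos, query):
    """Concatenate schema definitions, retrieved demonstrations, and the query."""
    parts = [schema_code]
    for ex in demos:
        parts.append(f'# Text: "{ex["text"]}"\n{ex["code"]}')
    parts.append(f'# Text: "{query}"\n')  # the LLM would complete the code here
    return "\n\n".join(parts)

# Illustrative pool of (text, code) pairs; the class names inside the code
# strings are hypothetical schema classes.
pool = [
    {"text": "Tim Cook works for Apple.",
     "code": 'WorkFor(Person("Tim Cook"), Organization("Apple"))'},
    {"text": "Paris is the capital of France.",
     "code": 'CapitalOf(City("Paris"), Country("France"))'},
    {"text": "Satya Nadella works for Microsoft.",
     "code": 'WorkFor(Person("Satya Nadella"), Organization("Microsoft"))'},
]

query = "Steve Jobs works for Apple."
demos = retrieve_examples(query, pool, k=2)
prompt = build_prompt("# (schema class definitions go here)", demos, query)
```

In practice a sentence-embedding similarity would likely replace the token-overlap score; the overall shape (rank the pool, keep the top k, prepend them to the query) stays the same.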
Findings
Effective across five IE tasks and nine datasets
Improves extraction accuracy with retrieval-augmented code generation
Demonstrates versatility in handling diverse IE schemas
Abstract
Information Extraction (IE) aims to extract structural knowledge (e.g., entities, relations, events) from natural language texts, which brings challenges to existing methods due to task-specific schemas and complex text expressions. Code, as a typical kind of formalized language, is capable of describing structural knowledge under various schemas in a universal way. On the other hand, Large Language Models (LLMs) trained on both codes and texts have demonstrated powerful capabilities of transforming texts into codes, which provides a feasible solution to IE tasks. Therefore, in this paper, we propose a universal retrieval-augmented code generation framework based on LLMs, called Code4UIE, for IE tasks. Specifically, Code4UIE adopts Python classes to define task-specific schemas of various structural knowledge in a universal way. By so doing, extracting knowledge under these schemas can…
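As a rough illustration of the idea that Python classes can define task-specific schemas, the sketch below models a tiny entity and relation schema and expresses an extraction result as class instantiation code. The class names (`Entity`, `Person`, `Organization`, `Relation`, `WorkFor`), their fields, and the example sentence are assumptions chosen for illustration, not the schema definitions used in the paper.

```python
from dataclasses import dataclass

# Hypothetical schema: entity types are classes, relation types are classes
# whose fields are typed by the entity classes they connect.

@dataclass
class Entity:
    name: str

@dataclass
class Person(Entity):
    pass

@dataclass
class Organization(Entity):
    pass

@dataclass
class Relation:
    head: Entity
    tail: Entity

@dataclass
class WorkFor(Relation):
    # Narrow the argument types for this specific relation.
    head: Person
    tail: Organization

# Text: "Steve Jobs worked for Apple."
# Under this framing, the extraction output is code that instantiates the
# schema classes; an LLM would be prompted to generate lines like these.
steve = Person(name="Steve Jobs")
apple = Organization(name="Apple")
extraction = [WorkFor(head=steve, tail=apple)]
```

Because the output is ordinary Python, it can be executed and checked against the class definitions, which is one plausible reason code is an attractive target format for structured extraction.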
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
