KnowCoder: Coding Structured Knowledge into LLMs for Universal Information Extraction
Zixuan Li, Yutao Zeng, Yuxin Zuo, Weicheng Ren, Wenxuan Liu, Miao Su, Yucan Guo, Yantao Liu, Xiang Li, Zhilei Hu, Long Bai, Wei Li, Yidan Liu, Pan Yang, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng

TL;DR
KnowCoder introduces a unified code-style schema representation and a two-phase learning framework for large language models to improve universal information extraction, demonstrating significant gains in few-shot, zero-shot, and supervised settings.
Contribution
The paper presents a novel code-style schema representation and a large, comprehensive schema library, along with a two-phase training framework, enabling LLMs to better understand and follow schemas for UIE.
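A code-style schema library is ultimately text that an LLM reads, so each knowledge type has to be rendered as class source code. In that source, subclassing expresses the type taxonomy and the docstring carries the type's description. The sketch below shows one way to do this rendering. It is an illustration under our own assumptions, not KnowCoder's construction pipeline: the entry format `(name, description, parent)`, the helpers `render_class` and `render_library`, and the sample types are all hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical schema entry: (type name, natural-language description, parent type or None).
SchemaEntry = Tuple[str, str, Optional[str]]


def render_class(name: str, description: str, parent: Optional[str] = None) -> str:
    """Render one schema entry as the source text of a Python class."""
    base = parent if parent is not None else "object"
    # repr() turns the description into a valid string literal, used as the docstring.
    return f"class {name}({base}):\n    {description!r}\n"


def render_library(entries: List[SchemaEntry]) -> str:
    """Render entries (assumed to be listed parent-first) into one code string."""
    return "\n".join(render_class(*entry) for entry in entries)


# Illustrative entries only; not taken from KnowCoder's schema library.
ENTRIES: List[SchemaEntry] = [
    ("Entity", "Base class for all entity types.", None),
    ("Person", "A human being.", "Entity"),
    ("Scientist", "A person who conducts scientific research.", "Person"),
]

if __name__ == "__main__":
    print(render_library(ENTRIES))
```

Because the rendered text is ordinary Python, it can be executed to check that the taxonomy is well-formed, for example that `Scientist` really is a subclass of `Person`.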
Findings
Achieves 49.8% relative F1 improvement over LLaMA2 in few-shot settings.
Attains up to 12.5% and 21.9% improvements over SOTA in zero-shot and low-resource settings.
Enhances performance by 7.5% in supervised learning with human-annotated datasets.
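The first finding reports a relative F1 improvement. A relative gain is measured as a fraction of the baseline's own score, so it is not the same as a gain in absolute F1 points. The sketch below assumes the conventional definition, (new − baseline) / baseline. The page does not say which definition the other two findings use. The function names and sample scores are illustrative and are not results from the paper.

```python
def relative_improvement(new_f1: float, base_f1: float) -> float:
    """Relative gain of new_f1 over base_f1, as a percentage of the baseline score."""
    if base_f1 <= 0:
        raise ValueError("baseline F1 must be positive")
    return (new_f1 - base_f1) / base_f1 * 100.0


def absolute_improvement(new_f1: float, base_f1: float) -> float:
    """Absolute gain in F1 points."""
    return new_f1 - base_f1


if __name__ == "__main__":
    # Hypothetical scores, chosen only to contrast the two measures.
    print(absolute_improvement(30.0, 20.0))  # gain in F1 points
    print(relative_improvement(30.0, 20.0))  # gain relative to the baseline, in %
```

With these hypothetical scores, moving from 20.0 to 30.0 F1 is a 10-point absolute gain but a 50% relative gain. A large relative figure on a weak baseline can therefore correspond to a modest absolute change.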
Abstract
In this paper, we propose KnowCoder, a Large Language Model (LLM) to conduct Universal Information Extraction (UIE) via code generation. KnowCoder aims to develop a kind of unified schema representation that LLMs can easily understand and an effective learning framework that encourages LLMs to follow schemas and extract structured knowledge accurately. To achieve these, KnowCoder introduces a code-style schema representation method to uniformly transform different schemas into Python classes, with which complex schema information, such as constraints among tasks in UIE, can be captured in an LLM-friendly manner. We further construct a code-style schema library covering over types of knowledge, which is the largest one for UIE, to the best of our knowledge. To ease the learning process of LLMs, KnowCoder contains a two-phase learning framework that enhances its schema…
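The abstract says schemas become Python classes, which lets cross-task constraints be stated in code. One such constraint is which entity types a relation may connect. The sketch below illustrates that idea under our own assumptions. Narrowed type hints on a relation subclass encode the constraint, and "model output" written as a list of constructor calls is parsed with `ast` rather than `eval()`. The class names, `SCHEMA`, `parse_output`, and `satisfies_constraints` are hypothetical. The generated string is hand-written, not real KnowCoder output.

```python
import ast
from dataclasses import dataclass
from typing import Any, List, get_type_hints


@dataclass
class Entity:
    """Base class for all entity types."""
    name: str


@dataclass
class Person(Entity):
    """A human being."""


@dataclass
class Organization(Entity):
    """A company, institution, or other organized group."""


@dataclass
class Relation:
    """Base class for all relation types."""
    head: Entity
    tail: Entity


@dataclass
class WorkFor(Relation):
    """The head person is employed by the tail organization."""
    head: Person  # narrowed hints encode the argument-type constraint
    tail: Organization


# Only classes listed here may appear in parsed output.
SCHEMA = {cls.__name__: cls for cls in (Person, Organization, WorkFor)}


def build(node: ast.AST) -> Any:
    """Turn a literal or a schema-class call into an object, without eval()."""
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and node.func.id in SCHEMA:
        args = [build(arg) for arg in node.args]
        kwargs = {kw.arg: build(kw.value) for kw in node.keywords}
        return SCHEMA[node.func.id](*args, **kwargs)
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


def parse_output(code: str) -> List[Any]:
    """Parse generated code of the form [Cls(...), ...] into instances."""
    tree = ast.parse(code, mode="eval")
    if not isinstance(tree.body, ast.List):
        raise ValueError("expected a list of schema instances")
    return [build(elt) for elt in tree.body.elts]


def satisfies_constraints(obj: Any) -> bool:
    """Check each field against its class's type hints (subclass hints override)."""
    hints = get_type_hints(type(obj))
    return all(isinstance(getattr(obj, field), hint) for field, hint in hints.items())


# Hand-written stand-in for model output; not real KnowCoder generation.
GENERATED = '[Person("Alice"), Organization("Acme"), WorkFor(Person("Alice"), Organization("Acme"))]'

if __name__ == "__main__":
    for record in parse_output(GENERATED):
        print(record, satisfies_constraints(record))
```

Dataclasses do not enforce type hints at construction time. `WorkFor(Organization("Acme"), Person("Alice"))` therefore builds without error, and only the explicit `satisfies_constraints` check rejects it. Parsing with `ast` restricts the output to whitelisted schema classes and literals, so arbitrary generated code is never executed.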
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Lib
