SCIR: A Self-Correcting Iterative Refinement Framework for Enhanced Information Extraction Based on Schema

Yushen Fang; Jianjun Li; Mingqian Ding; Chang Liu; Xinchi Zou; Wenqi Yang

arXiv:2512.12337·cs.CL·December 16, 2025

Open Access

TL;DR

The paper introduces SCIR, a self-correcting iterative framework for information extraction that improves accuracy and reduces training costs by leveraging feedback-driven optimization and a large bilingual dataset.

Contribution

It presents a novel universal IE paradigm with a self-correcting framework and a large bilingual dataset, enhancing flexibility, accuracy, and cost-efficiency of LLM-based IE systems.

Findings

1. SCIR outperforms state-of-the-art IE methods in key tasks.
2. SCIR achieves an average improvement of 5.27% in span-based Micro-F1.
3. SCIR reduces training costs by 87%.
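Span-based Micro-F1, the metric behind the 5.27% figure, pools true positives, false positives, and false negatives over all documents before computing precision and recall. The sketch below assumes one common definition, where a predicted span counts as correct only if its (label, start, end) triple exactly matches a gold span. The paper's own matching criteria are not stated here, and the function name `span_micro_f1` and the toy data are illustrative.

```python
# Span-based Micro-F1 sketch. Assumption: a span counts as correct only on an
# exact match of (label, start, end); the paper's own protocol may differ.

Span = tuple[str, int, int]  # (label, start offset, end offset)


def span_micro_f1(gold: list[set[Span]], pred: list[set[Span]]) -> tuple[float, float, float]:
    """Return (precision, recall, F1), pooling counts over all documents."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same documents")
    tp = fp = fn = 0
    for gold_spans, pred_spans in zip(gold, pred):
        tp += len(gold_spans & pred_spans)  # exact matches
        fp += len(pred_spans - gold_spans)  # predicted but not in gold
        fn += len(gold_spans - pred_spans)  # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy data: one boundary error ("LOC" ends at 14, not 15) and one spurious span.
gold = [{("PER", 0, 5), ("LOC", 10, 15)}, {("ORG", 3, 8)}]
pred = [{("PER", 0, 5), ("LOC", 10, 14)}, {("ORG", 3, 8), ("PER", 0, 2)}]
p, r, f1 = span_micro_f1(gold, pred)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")  # P=0.500 R=0.667 F1=0.571
```

Because counts are pooled before dividing, documents with many spans weigh more than in a macro average, and a single off-by-one boundary counts as both a false positive and a false negative.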

Abstract

Although Large Language Model (LLM)-powered information extraction (IE) systems have shown impressive capabilities, current fine-tuning paradigms face two major limitations: high training costs and difficulties in aligning with LLM preferences. To address these issues, we propose a novel universal IE paradigm, the Self-Correcting Iterative Refinement (SCIR) framework, along with a Multi-task Bilingual (Chinese-English) Self-Correcting (MBSC) dataset containing over 100,000 entries. The SCIR framework achieves plug-and-play compatibility with existing LLMs and IE systems through its Dual-Path Self-Correcting module and feedback-driven optimization, thereby significantly reducing training costs. Concurrently, the MBSC dataset tackles the challenge of preference alignment by indirectly distilling GPT-4's capabilities into IE result detection models. Experimental results demonstrate that…
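The abstract describes a feedback-driven loop in which a detection model checks IE results and the extractor refines them. The sketch below shows only that generic extract, check, re-extract control flow. It does not reproduce the Dual-Path Self-Correcting module, whose internals are not described here. All names (`refine`, `Extractor`, `Detector`, `toy_extract`, `toy_detect`) are hypothetical. The `detect` callable stands in for the kind of IE result detection model that the MBSC dataset is said to train.

```python
from typing import Callable

Span = tuple[str, int, int]  # (label, start, end)

# Hypothetical interfaces, not the SCIR API:
#   extract(text, feedback) -> spans   any IE system that can accept feedback hints
#   detect(text, spans) -> feedback    a result checker; an empty list means "accept"
Extractor = Callable[[str, list[str]], set[Span]]
Detector = Callable[[str, set[Span]], list[str]]


def refine(text: str, extract: Extractor, detect: Detector,
           max_rounds: int = 3) -> tuple[set[Span], list[list[str]]]:
    """Extract, check, and re-extract with feedback until the detector is satisfied."""
    spans = extract(text, [])
    history: list[list[str]] = []
    for _ in range(max_rounds):
        feedback = detect(text, spans)
        if not feedback:  # detector found no problems: stop early
            break
        history.append(feedback)
        spans = extract(text, feedback)  # corrected pass conditioned on feedback
    # If max_rounds is exhausted, the last corrected result is returned unchecked.
    return spans, history


# Toy stand-ins (hypothetical) that only illustrate the control flow.
def toy_extract(text: str, feedback: list[str]) -> set[Span]:
    spans = {("PER", 0, 5)}
    if any("LOC" in msg for msg in feedback):
        start = text.index("Paris")
        spans.add(("LOC", start, start + len("Paris")))
    return spans


def toy_detect(text: str, spans: set[Span]) -> list[str]:
    has_loc = any(label == "LOC" for label, _, _ in spans)
    return ["missing LOC span for 'Paris'"] if "Paris" in text and not has_loc else []


spans, history = refine("Alice visited Paris", toy_extract, toy_detect)
print(sorted(spans), len(history))  # [('LOC', 14, 19), ('PER', 0, 5)] 1
```

Because `refine` depends only on two callables, any extractor can be wrapped without retraining it. This is one way to read the plug-and-play claim, under the assumption that corrections are driven purely by detector feedback.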


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Graph Neural Networks