Beyond path selection: Better LLMs for Scientific Information Extraction with MimicSFT and Relevance and Rule-induced(R$^2$)GRPO

Ran Li; Shimin Di; Yuchen Liu; Chen Jing; Yu Qiu; Lei Chen

arXiv:2505.22068 · cs.CL · May 29, 2025



TL;DR

This paper introduces MimicSFT and R$^2$GRPO, two novel training methods that significantly enhance the reasoning and memorization capabilities of large language models for scientific information extraction tasks.

Contribution

The paper proposes a two-stage training framework combining MimicSFT and R$^2$GRPO, which improves LLMs' reasoning capacity without requiring high-quality chain-of-thought data.

Findings

01. R$^2$GRPO with MimicSFT outperforms baseline LLMs in relation extraction.

02. Both methods improve reasoning capacity in scientific IE benchmarks.

03. The approach surpasses specialized supervised models in key tasks.

Abstract

Previous studies suggest that, on math tasks, powerful Large Language Models (LLMs) trained with Reinforcement Learning with Verifiable Rewards (RLVR) only refine the reasoning path without improving reasoning capacity, whereas supervised fine-tuning (SFT) with distillation can improve it. We study this question from the perspective of scientific information extraction (SciIE), where LLMs and reasoning LLMs underperform small BERT-based models. SciIE requires both reasoning and memorization. We argue that, on SciIE, both SFT and RLVR can refine the reasoning path and improve reasoning capacity in a simple way. We propose a two-stage training scheme: (1) MimicSFT, which uses structured reasoning templates and does not need high-quality chain-of-thought data, and (2) R$^2$GRPO, which uses relevance and rule-induced rewards. Experiments on scientific IE benchmarks show that both methods can improve reasoning capacity. R$^2$GRPO with…
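To make the second stage more concrete, here is a minimal sketch of how a rule-based reward and a relevance reward might be combined and then turned into group-relative advantages, the normalization step that GRPO-style training uses. The reward definitions are assumptions for illustration only: the abstract does not specify them, so `rule_reward` (a JSON format check), `relevance_reward` (triple-level F1 against gold annotations), and the unweighted sum in `total_reward` are all hypothetical and should not be read as the paper's actual rewards.

```python
import json
import statistics


def parse_triples(output: str):
    """Return a set of (head, relation, tail) triples, or None if malformed."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, list):
        return None
    for item in data:
        well_formed = (
            isinstance(item, list)
            and len(item) == 3
            and all(isinstance(x, str) for x in item)
        )
        if not well_formed:
            return None
    return {tuple(item) for item in data}


def rule_reward(output: str) -> float:
    """Hypothetical rule-based reward: 1.0 if the output follows the format."""
    return 1.0 if parse_triples(output) is not None else 0.0


def relevance_reward(output: str, gold: set) -> float:
    """Hypothetical relevance reward: F1 between predicted and gold triples."""
    pred = parse_triples(output)
    if not pred or not gold:
        return 0.0
    true_pos = len(pred & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


def total_reward(output: str, gold: set) -> float:
    """Assumed combination: unweighted sum of the two rewards."""
    return rule_reward(output) + relevance_reward(output, gold)


def group_advantages(rewards, eps: float = 1e-8):
    """Normalize each reward by the mean and std of its sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Three sampled completions for one prompt (made-up examples).
gold = {("BERT", "USED-FOR", "NER")}
outputs = [
    '[["BERT", "USED-FOR", "NER"]]',
    "not json",
    '[["BERT", "USED-FOR", "NER"], ["X", "PART-OF", "Y"]]',
]
rewards = [total_reward(o, gold) for o in outputs]
advantages = group_advantages(rewards)
```

In this sketch the exact, well-formed completion receives the largest advantage, the over-generating one a smaller advantage, and the malformed one a negative advantage, so the policy update would favor outputs that are both well-formed and relevant.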


Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Multimodal Machine Learning Applications