MPL: Multiple Programming Languages with Large Language Models for Information Extraction

Bo Li; Gexiang Fang; Wei Ye; Zhenghua Xu; Jinglei Zhang; Hao Cheng; Shikun Zhang

arXiv:2505.16107·cs.CL·May 23, 2025

TL;DR

This paper introduces MPL, a framework that leverages multiple programming languages and a novel function-prompt technique to improve information extraction with large language models, demonstrating broad effectiveness.

Contribution

MPL explores the use of various programming languages in supervised fine-tuning and introduces function-prompt with virtual running to enhance IE tasks.

Findings

01 MPL outperforms existing methods across multiple datasets.

02 Using diverse programming languages improves extraction accuracy.

03 Function-prompt with virtual running enhances input simulation efficiency.

Abstract

Recent research in information extraction (IE) focuses on utilizing code-style inputs to enhance structured output generation. The intuition behind this is that programming languages (PLs) inherently exhibit greater structural organization than natural languages (NLs). This structural advantage makes PLs particularly suited for IE tasks. Nevertheless, existing research primarily focuses on Python for code-style simulation, overlooking the potential of other widely used PLs (e.g., C++ and Java) during the supervised fine-tuning (SFT) phase. In this research, we propose Multiple Programming Languages with large language models for information extraction (abbreviated as MPL), a novel framework that explores the potential of incorporating different PLs in the SFT phase. Additionally, we introduce function-prompt with virtual running to…
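
To make the idea of a code-style input concrete, here is a minimal sketch of how a named-entity extraction request might be rendered as Python-flavoured text for a model to complete. It is an illustration under stated assumptions, not the format used by MPL: the names `EntityType`, `build_code_style_prompt`, the `Entity` base class, and the prompt layout are all hypothetical, and Python is used only because the abstract names it as the common choice.

```python
from dataclasses import dataclass


@dataclass
class EntityType:
    """One entity type in a hypothetical IE schema."""
    name: str
    description: str


def build_code_style_prompt(text: str, schema: list[EntityType]) -> str:
    """Render an extraction request as code-like text (illustrative layout only)."""
    lines = []
    for entity in schema:
        # Each entity type becomes a class stub whose docstring carries its definition.
        lines.append(f"class {entity.name}(Entity):")
        lines.append(f'    """{entity.description}"""')
        lines.append("")
    # The input sentence is embedded as a string literal.
    lines.append(f"text = {text!r}")
    lines.append("# Extract every entity mentioned in `text`.")
    # The prompt ends on an open list so the model continues with structured output.
    lines.append("result = [")
    return "\n".join(lines)


schema = [
    EntityType("Person", "A named human being."),
    EntityType("Organization", "A company, agency, or institution."),
]
prompt = build_code_style_prompt("Ada Lovelace worked with Charles Babbage.", schema)
print(prompt)
```

The same schema could in principle be rendered as C++ or Java declarations instead, which is the kind of variation across PLs that the abstract describes exploring during SFT.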

Code & Models

Repositories

pku-fgx/mpl (PyTorch, official)

Taxonomy

Topics: Software Engineering Research · Software System Performance and Reliability · Machine Learning in Materials Science

Methods: Shrink and Fine-Tune