Leveraging Large Language Models for Generating Labeled Mineral Site Record Linkage Data

Jiyoon Pyo; Yao-Yi Chiang

arXiv:2412.03575·cs.IR·December 6, 2024

TL;DR

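This paper introduces a method that uses large language models (LLMs) to generate training data for mineral site record linkage. It reports an F1 score improvement of over 45% compared to traditional pre-trained discriminative language model (PLM) methods, and inference nearly 18 times faster than using LLMs directly.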

Contribution

The authors propose using large language models (LLMs) to generate training data for pre-trained discriminative language models (PLMs), reducing the need for costly curated ground-truth data and improving record linkage performance.
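The idea of using one model to label record pairs that later train another can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the `SiteRecord` fields, the `serialize` text format, and the `stub_llm_label` function (a simple rule standing in for a real LLM call) are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class SiteRecord:
    # Hypothetical schema: one spatial part (lat/lon) and non-spatial attributes.
    record_id: str
    name: str
    lat: float
    lon: float
    commodity: str


def serialize(rec: SiteRecord) -> str:
    """Flatten a tabular record into text that a language model could read."""
    return (
        f"name: {rec.name} | lat: {rec.lat:.4f} | "
        f"lon: {rec.lon:.4f} | commodity: {rec.commodity}"
    )


def stub_llm_label(a: SiteRecord, b: SiteRecord) -> int:
    """Hypothetical stand-in for an LLM call: 1 = same site, 0 = different.

    A real pipeline would send both serialized records to an LLM and parse
    its answer; a toy rule keeps this sketch runnable offline.
    """
    same_name = a.name.lower() == b.name.lower()
    close = abs(a.lat - b.lat) < 0.01 and abs(a.lon - b.lon) < 0.01
    return int(same_name and close)


def generate_training_pairs(
    records: List[SiteRecord],
    label_fn: Callable[[SiteRecord, SiteRecord], int],
) -> List[Tuple[str, str, int]]:
    """Label every record pair; the output could be used to fine-tune a PLM."""
    return [
        (serialize(a), serialize(b), label_fn(a, b))
        for a, b in combinations(records, 2)
    ]


# Made-up example records, for illustration only.
records = [
    SiteRecord("r1", "Red Hill Mine", 45.000, -110.000, "gold"),
    SiteRecord("r2", "red hill mine", 45.001, -110.002, "gold"),
    SiteRecord("r3", "Blue Ridge", 40.000, -105.000, "copper"),
]
pairs = generate_training_pairs(records, stub_llm_label)
for left, right, label in pairs:
    print(label, "|", left, "<->", right)
```

Passing the labeler in as a function means the stub can be replaced with a real LLM client without touching the pair-generation code.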

Findings

01

Over 45% improvement in F1 score compared to traditional PLM methods.

02

Inference is nearly 18 times faster than using LLMs directly.

03

Automated pipeline eliminates human intervention in data generation.

Abstract

Record linkage integrates diverse data sources by identifying records that refer to the same entity. In the context of mineral site records, accurate record linkage is crucial for identifying and mapping mineral deposits. Properly linking records that refer to the same mineral deposit helps define the spatial coverage of mineral areas, benefiting resource identification and site data archiving. Mineral site record linkage falls under the spatial record linkage category, since the records contain both physical location information and non-spatial attributes in a tabular format. The task is particularly challenging due to the heterogeneity and vast scale of the data. While prior research employs pre-trained discriminative language models (PLMs) for spatial entity linkage, these models often require substantial amounts of curated ground-truth data for fine-tuning. Gathering and creating ground…
