Leveraging Large Language Models for Generating Labeled Mineral Site Record Linkage Data
Jiyoon Pyo, Yao-Yi Chiang

TL;DR
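This paper introduces a method that uses large language models (LLMs) to generate labeled training data for mineral site record linkage, improving F1 score by over 45% relative to traditional pre-trained language model (PLM) methods and cutting inference time by a factor of nearly 18 relative to using LLMs directly.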
Contribution
The authors propose using LLMs to generate labeled training data for fine-tuning pre-trained discriminative language models (PLMs), reducing the need for costly curated ground-truth data and improving record linkage performance.
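The data-generation idea can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes candidate pairs are selected by a simple distance cutoff and that the LLM answers a yes/no prompt. The `SiteRecord` fields, the prompt wording, the 5 km cutoff, and `toy_llm` (a stand-in for a real LLM call) are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from itertools import combinations
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class SiteRecord:
    """Hypothetical mineral site record: an id, a name, and a location."""
    record_id: str
    name: str
    lat: float
    lon: float


def haversine_km(a: SiteRecord, b: SiteRecord) -> float:
    """Great-circle distance between two records in kilometres."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def candidate_pairs(records, max_km=5.0):
    """Keep only pairs that are close enough to plausibly be the same site."""
    return [(a, b) for a, b in combinations(records, 2) if haversine_km(a, b) <= max_km]


def describe(r: SiteRecord) -> str:
    """Serialize a record as text for the prompt and for the training example."""
    return f"{r.name} ({r.lat}, {r.lon})"


# Assumed prompt wording; the paper's actual prompt is not given in this summary.
PROMPT = (
    "Do these two mineral site records refer to the same site? Answer yes or no.\n"
    "A: {a}\n"
    "B: {b}"
)


def build_training_set(records, llm, max_km=5.0):
    """Label every candidate pair with the LLM; returns (pair_text, label) tuples."""
    examples = []
    for a, b in candidate_pairs(records, max_km):
        answer = llm(PROMPT.format(a=describe(a), b=describe(b)))
        label = 1 if answer.strip().lower().startswith("yes") else 0
        examples.append((f"{describe(a)} [SEP] {describe(b)}", label))
    return examples


def toy_llm(prompt: str) -> str:
    """Stand-in for a real LLM: says 'yes' when the two names share a word."""
    lines = prompt.splitlines()
    name_a = lines[1][3:].split(" (")[0].lower()
    name_b = lines[2][3:].split(" (")[0].lower()
    return "yes" if set(name_a.split()) & set(name_b.split()) else "no"
```

In a real pipeline, `toy_llm` would be replaced by a call to an actual LLM, and the resulting `(pair_text, label)` tuples would serve as fine-tuning data for the PLM classifier in place of hand-curated ground truth.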
Findings
Over 45% improvement in F1 score compared to traditional PLM methods.
Inference time reduced by a factor of nearly 18 compared to using LLMs directly.
Automated pipeline eliminates human intervention in data generation.
Abstract
Record linkage integrates diverse data sources by identifying records that refer to the same entity. In the context of mineral site records, accurate record linkage is crucial for identifying and mapping mineral deposits. Properly linking records that refer to the same mineral deposit helps define the spatial coverage of mineral areas, benefiting resource identification and site data archiving. Mineral site record linkage falls under the spatial record linkage category since the records contain information about the physical locations and non-spatial attributes in a tabular format. The task is particularly challenging due to the heterogeneity and vast scale of the data. While prior research employs pre-trained discriminative language models (PLMs) on spatial entity linkage, they often require substantial amounts of curated ground-truth data for fine-tuning. Gathering and creating ground…
