Information Extraction from Historical Well Records Using A Large Language Model

Zhiwei Ma; Javier E. Santo; Greg Lackey; Hari Viswanathan; Daniel O'Malley

arXiv:2405.05438·cs.IR·May 10, 2024·1 cites

Open Access

TL;DR

This paper presents a workflow using large language models to extract critical well location and depth information from historical records, significantly improving efficiency over manual methods, especially for structured PDF reports.

Contribution

The study introduces a novel LLM-based information extraction workflow tailored for historical well records, demonstrating high accuracy and highlighting the potential of LLMs in geoscientific data recovery.

Findings

01. 100% accuracy on clean PDF reports
02. 70% accuracy on unstructured image-based records
03. More detailed prompts improve extraction performance
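The third finding can be illustrated with a minimal sketch. It assumes a hypothetical setup in which the model is asked to return JSON. The field names (`latitude`, `longitude`, `depth_ft`), the prompt wording, and the helper functions are all illustrative and are not taken from the paper; no model call is included.

```python
import json
import re

# Hypothetical field names; the paper's actual output schema is not shown here.
FIELDS = ("latitude", "longitude", "depth_ft")


def build_prompt(record_text: str, detailed: bool = True) -> str:
    """Build either a terse prompt or a more detailed one for the same record."""
    if not detailed:
        return "Extract the well location and depth.\n\nRecord:\n" + record_text
    instructions = (
        "You are reading a historical oil and gas well record.\n"
        "Return only a JSON object with these keys:\n"
        "- latitude: decimal degrees, or null if not stated\n"
        "- longitude: decimal degrees, or null if not stated\n"
        "- depth_ft: total well depth in feet, or null if not stated\n"
        "Do not guess; use null for anything the record does not state.\n"
    )
    return instructions + "\nRecord:\n" + record_text


def parse_response(response_text: str) -> dict:
    """Pull the first JSON object out of a model reply; missing fields become None."""
    empty = {name: None for name in FIELDS}
    match = re.search(r"\{.*\}", response_text, flags=re.DOTALL)
    if match is None:
        return empty
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return empty
    if not isinstance(data, dict):
        return empty
    return {name: data.get(name) for name in FIELDS}


def accuracy(predictions: list, references: list) -> float:
    """Fraction of documents whose extracted fields exactly match the reference."""
    if not references:
        return 0.0
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)
```

Running both prompt variants over the same labeled records and comparing the two `accuracy` values would be one way to check whether the added detail helps on a given dataset.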

Abstract

To reduce environmental risks and impacts from orphaned wells (abandoned oil and gas wells), it is essential to first locate and then plug these wells. Although some historical documents are available, they are often unstructured, uncleaned, and outdated, and they vary widely by state and type. Manually reading and digitizing this information is not feasible, given the large number of wells. Here, we propose a new computational approach for rapidly and cost-effectively locating these wells. Specifically, we leverage the advanced capabilities of large language models (LLMs) to extract vital information, including well location and depth, from historical records of orphaned wells. In this paper, we present an information extraction workflow based on open-source Llama 2 models and test it on a dataset of 160 well documents. Our results show that the…

Taxonomy

Topics: Data Quality and Management

Methods: LLaMA