Information Extraction from Historical Well Records Using A Large Language Model
Zhiwei Ma, Javier E. Santo, Greg Lackey, Hari Viswanathan, Daniel O'Malley

TL;DR
This paper presents a workflow using large language models to extract critical well location and depth information from historical records, significantly improving efficiency over manual methods, especially for structured PDF reports.
Contribution
The study introduces a novel LLM-based information extraction workflow tailored for historical well records, demonstrating high accuracy and highlighting the potential of LLMs in geoscientific data recovery.
Findings
100% accuracy on clean PDF reports
70% accuracy on unstructured image-based records
More detailed prompts improve extraction performance
Abstract
To reduce environmental risks and impacts from orphaned wells (abandoned oil and gas wells), it is essential to first locate and then plug these wells. Although some historical documents are available, they are often unstructured, uncleaned, and outdated. They also vary widely by state and type. Manually reading and digitizing the information in these documents is not feasible given the large number of wells. Here, we propose a new computational approach for rapidly and cost-effectively locating these wells. Specifically, we leverage the advanced capabilities of large language models (LLMs) to extract vital information, including well location and depth, from historical records of orphaned wells. In this paper, we present an information extraction workflow based on open-source Llama 2 models and test them on a dataset of 160 well documents. Our results show that the…
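As a rough illustration of the kind of workflow the abstract describes, the sketch below builds a detailed extraction prompt and parses a structured reply. It is not the authors' implementation: the prompt wording, the field names (`latitude`, `longitude`, `depth_ft`), the choice of feet as the depth unit, and the `llm` callable standing in for a Llama 2 model are all assumptions made for this example.

```python
import json
import re
from typing import Callable, Optional, TypedDict


class WellRecord(TypedDict):
    # Hypothetical output schema; the paper's actual fields may differ.
    latitude: Optional[float]
    longitude: Optional[float]
    depth_ft: Optional[float]


# Hypothetical prompt wording, not taken from the paper.
PROMPT_TEMPLATE = (
    "You are reading a historical oil and gas well record.\n"
    "Extract the well location and total depth.\n"
    "Reply with one JSON object with keys latitude, longitude, depth_ft.\n"
    "Use null for any value the record does not state.\n\n"
    "Record text:\n{text}\n"
)


def build_prompt(document_text: str) -> str:
    """Insert the record text into the instruction template."""
    return PROMPT_TEMPLATE.format(text=document_text.strip())


def parse_response(response: str) -> WellRecord:
    """Pull the first flat JSON object out of a model reply.

    Fields that are missing or non-numeric are returned as None.
    """
    record: WellRecord = {"latitude": None, "longitude": None, "depth_ft": None}
    match = re.search(r"\{.*?\}", response, flags=re.DOTALL)
    if match is None:
        return record
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return record
    for key in ("latitude", "longitude", "depth_ft"):
        value = data.get(key)
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            record[key] = float(value)
    return record


def extract(document_text: str, llm: Callable[[str], str]) -> WellRecord:
    """Run one document through the model and parse the reply.

    `llm` is any function mapping a prompt string to a reply string,
    for example a thin wrapper around a locally hosted model.
    """
    return parse_response(llm(build_prompt(document_text)))
```

Keeping the model behind a plain callable means the prompt and parsing logic can be checked with a stub before any model is loaded, and a more detailed prompt can be swapped in by editing the template alone.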
Taxonomy
Topics: Data Quality and Management
Methods: LLaMA
