Classification of Geological Borehole Descriptions Using a Domain Adapted Large Language Model
Hossein Ghorbanfekr, Pieter Jan Kerstens, Katrijn Dirix

TL;DR
This paper presents GEOBERTje, a domain-adapted large language model trained on Dutch geological borehole descriptions from Flanders (Belgium). The model extracts structured information from the descriptions, and a classifier finetuned on it outperforms both a rule-based approach and GPT-4 at lithology classification.
Contribution
Introduces GEOBERTje, a domain-adapted large language model for Dutch geological borehole descriptions, together with a classifier finetuned on it that assigns a main, second and third lithology class and outperforms rule-based and GPT-4 approaches.
Findings
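GEOBERTje extracts relevant information from borehole descriptions and represents it in a numeric vector space.
A classifier finetuned on a limited number of manually labeled observations outperforms both a rule-based approach and GPT-4.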
Enhanced data analysis for geological modeling.
Abstract
Geological borehole descriptions contain detailed textual information about the composition of the subsurface. However, their unstructured format presents significant challenges for extracting relevant features into a structured format. This paper introduces GEOBERTje: a domain-adapted large language model trained on geological borehole descriptions from Flanders (Belgium) in the Dutch language. This model effectively extracts relevant information from the borehole descriptions and represents it in a numeric vector space. Showcasing just one potential application of GEOBERTje, we finetune a classifier model on a limited number of manually labeled observations. This classifier categorizes borehole descriptions into a main, second and third lithology class. We show that our classifier outperforms both a rule-based approach and OpenAI's GPT-4. This study exemplifies how domain adapted…
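To make the classification task concrete, the following is a minimal sketch of a toy keyword baseline that maps a Dutch borehole description to a (main, second, third) lithology tuple. It is not the paper's rule-based approach and not GEOBERTje. The keyword table, the class names, and the "order of first mention" ranking rule are all assumptions made for illustration.

```python
import re

# Hypothetical keyword table: Dutch surface forms -> lithology class.
# Illustrative only; this is NOT the label set or rule set used in the paper.
KEYWORDS = {
    "zand": "sand", "zandig": "sand", "zandhoudend": "sand",
    "klei": "clay", "kleiig": "clay", "kleihoudend": "clay",
    "leem": "loam", "lemig": "loam",
    "grind": "gravel", "grindhoudend": "gravel",
    "veen": "peat", "venig": "peat",
}


def classify(description):
    """Return a (main, second, third) lithology tuple for one description.

    Assumed ranking rule: classes are ordered by first mention in the text.
    Slots with no matching keyword are filled with None.
    """
    tokens = re.findall(r"\w+", description.lower())
    ranked = []
    for token in tokens:
        label = KEYWORDS.get(token)
        # Keep each class once, in order of first appearance.
        if label is not None and label not in ranked:
            ranked.append(label)
    ranked = ranked[:3]
    return tuple(ranked + [None] * (3 - len(ranked)))


if __name__ == "__main__":
    print(classify("Grijs zand, kleiig, met weinig grind"))
    print(classify("Bruine leem"))
```

A lookup of this kind only matches exact surface forms and ignores modifiers such as "weinig" (little), so it treats a minor admixture the same as a dominant component. A learned text representation is one way to take that context into account.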
Taxonomy
Topics: Drilling and Well Engineering · Geological Modeling and Analysis · Natural Language Processing Techniques
Methods: Attention Is All You Need · Byte Pair Encoding · Layer Normalization · Label Smoothing · Linear Layer · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Multi-Head Attention · Dense Connections
