FormationEval, an open multiple-choice benchmark for petroleum geoscience
Almaz Ermilov

TL;DR
FormationEval is an open 505-question multiple-choice benchmark for assessing language models on petroleum geoscience. Top models score above 97%, but gaps between model tiers and between domains persist.
Contribution
This paper introduces FormationEval, an open benchmark dataset for petroleum geoscience, and evaluates 72 models on it across seven domains and multiple model tiers.
Findings
Top models achieve over 97% accuracy, with Gemini 3 Pro Preview reaching 99.8%.
Among open-weight models, GLM-4.7 leads at 98.6% accuracy.
Petrophysics remains the most challenging domain for models.
Abstract
This paper presents FormationEval, an open multiple-choice question benchmark for evaluating language models on petroleum geoscience and subsurface disciplines. The dataset contains 505 questions across seven domains including petrophysics, petroleum geology and reservoir engineering, derived from three authoritative sources using a reasoning model with detailed instructions and a concept-based approach that avoids verbatim copying of copyrighted text. Each question includes source metadata to support traceability and audit. The evaluation covers 72 models from major providers including OpenAI, Anthropic, Google, Meta and open-weight alternatives. The top performers achieve over 97% accuracy, with Gemini 3 Pro Preview reaching 99.8%, while tier and domain gaps persist. Among open-weight models, GLM-4.7 leads at 98.6%, with several DeepSeek, Llama, Qwen and Mistral models also exceeding…
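The overall and per-domain accuracies reported above are the kind of figure a simple scoring loop produces. The following is a minimal sketch of such a loop, not the author's evaluation code: the record keys `domain`, `answer` and `prediction` are hypothetical, and the toy records are invented for illustration rather than taken from the dataset.

```python
from collections import defaultdict


def accuracy_by_domain(records):
    """Return (overall accuracy, {domain: accuracy}) for multiple-choice results.

    Each record is a dict with the hypothetical keys 'domain', 'answer'
    (the keyed option letter) and 'prediction' (the model's option letter).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        domain = record["domain"]
        total[domain] += 1
        # Compare option letters case-insensitively, ignoring stray whitespace.
        if record["prediction"].strip().upper() == record["answer"].strip().upper():
            correct[domain] += 1
    if not total:
        raise ValueError("no records to score")
    per_domain = {domain: correct[domain] / total[domain] for domain in total}
    overall = sum(correct.values()) / sum(total.values())
    return overall, per_domain


# Toy records for illustration only; these are not FormationEval questions.
records = [
    {"domain": "petrophysics", "answer": "B", "prediction": "B"},
    {"domain": "petrophysics", "answer": "C", "prediction": "A"},
    {"domain": "petroleum geology", "answer": "D", "prediction": "d"},
    {"domain": "reservoir engineering", "answer": "A", "prediction": "A"},
]
overall, per_domain = accuracy_by_domain(records)
```

On a 505-question set, each question is worth roughly 0.2 percentage points of overall accuracy (1/505), so differences among the top models correspond to a handful of questions.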
Taxonomy
Topics: Computational and Text Analysis Methods · Topic Modeling · Machine Learning in Materials Science
