Benchmarking Reasoning Reliability in Artificial Intelligence Models for Energy-System Analysis
Eliseo Curcio

TL;DR
This paper introduces the Analytical Reliability Benchmark (ARB), a standardized framework to evaluate reasoning reliability in AI models used for energy system analysis, addressing a critical gap in validation practices.
Contribution
The study presents the first quantitative method to assess reasoning integrity in AI models for energy analysis, integrating five submetrics and testing four frontier models across deterministic, probabilistic, and epistemic scenarios.
Findings
GPT-4/5 and Claude 4.5 Sonnet achieved over 90% in reasoning reliability.
Gemini 2.5 Pro showed moderate stability in reasoning.
Llama 3 70B remained below professional reasoning thresholds.
Abstract
Artificial intelligence and machine learning are increasingly used for forecasting, optimization, and policy design in the energy sector, yet no standardized framework exists to evaluate whether these systems reason correctly. Current validation practices focus on predictive accuracy or computational efficiency, leaving the logical integrity of analytical conclusions untested. This study introduces the Analytical Reliability Benchmark (ARB), a reproducible framework that quantifies reasoning reliability in large language models applied to energy system analysis. The benchmark integrates five submetrics (accuracy, reasoning reliability, uncertainty discipline, policy consistency, and transparency) and evaluates model performance across deterministic, probabilistic, and epistemic scenarios using open technoeconomic datasets (NREL ATB 2024, DOE H2A/H2New, IEA WEO 2024). Four frontier…
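As an illustration of how five submetric scores could be combined into a single index, the sketch below computes a weighted mean and scales it to 0-100. The aggregation rule, the equal default weights, the 0-1 input scale, and the names SubmetricScores and composite_index are all assumptions made for this example; the text above does not state how ARB actually combines its submetrics.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional


@dataclass(frozen=True)
class SubmetricScores:
    """Hypothetical container for the five submetrics named in the abstract.

    Each score is assumed to lie in [0, 1]; that scale is an assumption.
    """
    accuracy: float
    reasoning_reliability: float
    uncertainty_discipline: float
    policy_consistency: float
    transparency: float


def composite_index(scores: SubmetricScores,
                    weights: Optional[Dict[str, float]] = None) -> float:
    """Return a weighted mean of the submetrics, scaled to 0-100.

    Equal weights are used by default. This is an illustrative choice,
    not the weighting defined by the benchmark.
    """
    names = [f.name for f in fields(scores)]
    if weights is None:
        weights = {name: 1.0 for name in names}
    if set(weights) != set(names):
        raise ValueError("weights must cover exactly the five submetrics")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    for name in names:
        value = getattr(scores, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    weighted_sum = sum(weights[name] * getattr(scores, name) for name in names)
    return 100.0 * weighted_sum / total_weight


if __name__ == "__main__":
    # Made-up scores for demonstration only; not results from the paper.
    example = SubmetricScores(0.9, 0.8, 1.0, 0.7, 0.6)
    print(f"{composite_index(example):.1f}")
```

A weighted mean keeps the index easy to interpret, but it lets a strong submetric offset a weak one. A benchmark that treats, say, policy consistency as a hard requirement might prefer a minimum or a geometric mean instead.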
Taxonomy
Topics: Integrated Energy Systems Optimization · Energy Load and Power Forecasting · Global Energy Security and Policy
