BEMEval-Doc2Schema: Benchmarking Large Language Models for Structured Data Extraction in Building Energy Modeling

Yiyuan Jia; Xiaoqin Fu; Liang Zhang

arXiv:2602.16926 · cs.CE · February 20, 2026

Open Access

TL;DR

This paper introduces BEMEval-Doc2Schema, a benchmark for evaluating how well large language models extract structured data from building documentation, a foundational step toward automated building energy modeling.

Contribution

The paper presents the first standardized benchmark for LLM performance on building energy modeling tasks, together with a new metric, the Key-Value Overlap Rate (KVOR), enabling systematic comparison of models.

Findings

01. Gemini 2.5 outperforms GPT-5 in structured data extraction.

02. Few-shot prompting improves model accuracy.

03. Simpler schemas yield higher KVOR scores.
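
To make the zero-shot versus few-shot distinction in the findings concrete, here is a minimal sketch of how the two prompt styles might be assembled. The function name `build_prompt`, the instruction wording, and the example layout are all hypothetical illustrations; they are not the prompts used in the paper.

```python
import json
from typing import Dict, Sequence, Tuple


def build_prompt(
    document: str,
    schema: Dict,
    examples: Sequence[Tuple[str, Dict]] = (),
) -> str:
    """Assemble an extraction prompt (hypothetical wording, not the paper's).

    With no examples this is a zero-shot prompt; each (document, output)
    pair in `examples` adds one worked demonstration, giving a few-shot prompt.
    """
    parts = [
        "Extract the building data from the document as JSON matching this schema:",
        json.dumps(schema, indent=2),
    ]
    for example_doc, example_output in examples:
        parts += [
            "Example document:",
            example_doc,
            "Example output:",
            json.dumps(example_output, indent=2),
        ]
    parts += ["Document:", document, "Output:"]
    return "\n\n".join(parts)


# Made-up schema and text, for illustration only.
schema = {"zone_name": "string", "floor_area_m2": "number"}
zero_shot = build_prompt("The lobby covers 80 m2.", schema)
few_shot = build_prompt(
    "The lobby covers 80 m2.",
    schema,
    examples=[("Office A is 120 m2.", {"zone_name": "Office A", "floor_area_m2": 120})],
)
```

The only difference between the two prompts is the block of demonstrations placed before the target document, which is what lets the two settings be compared on the same inputs.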

Abstract

Recent advances in foundation models, including large language models (LLMs), have created new opportunities to automate building energy modeling (BEM). However, systematic evaluation has remained challenging due to the absence of publicly available, task-specific datasets and standardized performance metrics. We present BEMEval, a benchmark framework designed to assess foundation models' performance across BEM tasks. The first benchmark in this suite, BEMEval-Doc2Schema, focuses on structured data extraction from building documentation, a foundational step toward automated BEM processes. BEMEval-Doc2Schema introduces the Key-Value Overlap Rate (KVOR), a metric that quantifies the alignment between LLM-generated structured outputs and ground-truth schema references. Using this framework, we evaluate two leading models (GPT-5 and Gemini 2.5) under zero-shot and few-shot prompting…
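
The abstract describes KVOR only as a measure of alignment between generated structured outputs and ground-truth references, and the exact formula is not given here. The sketch below shows one plausible reading, under the assumption that the score is the fraction of ground-truth key-value pairs reproduced exactly in the model output after flattening nested structures. The function names `flatten` and `key_value_overlap`, the exact-match rule, and the sample data are illustrative assumptions, not the paper's definition.

```python
from typing import Any, Dict, Tuple


def flatten(obj: Any, prefix: Tuple = ()) -> Dict[Tuple, Any]:
    """Flatten nested dicts and lists into {path_tuple: leaf_value}."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        return {prefix: obj}
    flat: Dict[Tuple, Any] = {}
    for key, value in items:
        flat.update(flatten(value, prefix + (key,)))
    return flat


def key_value_overlap(predicted: Any, reference: Any) -> float:
    """Share of reference key-value pairs matched exactly by the prediction.

    Assumed definition for illustration; the paper's KVOR may differ
    (for example, in how it treats extra keys or numeric tolerance).
    """
    pred = flatten(predicted)
    ref = flatten(reference)
    if not ref:
        return 0.0
    matched = sum(
        1 for path, value in ref.items() if path in pred and pred[path] == value
    )
    return matched / len(ref)


# Made-up example: one of three reference values is extracted incorrectly.
reference = {"zone": {"name": "Office", "area_m2": 120.0}, "hvac": {"type": "VAV"}}
predicted = {"zone": {"name": "Office", "area_m2": 125.0}, "hvac": {"type": "VAV"}}
score = key_value_overlap(predicted, reference)  # 2 of 3 pairs match
```

Under this reading, extra keys in the prediction are not penalized and numeric values must match exactly; a stricter or more tolerant variant would change both choices.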

Taxonomy

Topics: BIM and Construction Integration · Building Energy and Comfort Optimization · Energy Load and Power Forecasting