Ishigaki-IDS-Bench: A Benchmark for Generating Information Delivery Specification from BIM Information Requirements

Ryo Kanazawa; Koyo Hidaka; Teppei Miyamoto; Takayuki Kato; Tomoki Ando; Chenguang Wang; Dayuan Jiang; Naofumi Fujita; Shuhei Saitoh; Atomu Kondo; Koki Arakawa; and Daiho Nishioka

arXiv:2605.22079·cs.CL·May 22, 2026

TL;DR

This paper introduces Ishigaki-IDS-Bench, a benchmark that evaluates how well large language models generate industry-standard Information Delivery Specification (IDS) XML from BIM information requirements. The reported results show that current LLMs do not yet produce standard-compliant IDS XML reliably.

Contribution

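It provides a benchmark dataset and evaluation framework for assessing LLMs' ability to generate compliant IDS XML from BIM information requirements. The dataset contains 166 expert-authored and verified examples (83 practical scenarios, each in Japanese and English) with corresponding gold IDS files and metadata. The evaluation combines IDSAuditTool-based Processability, Structure, and Content audits with content-agreement scoring against the gold files.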

Findings

01

The best LLM achieves a macro F1 of 65.6% on content agreement.

02

Only 27.7% of outputs pass the Content audit.

03

Current LLMs struggle to reliably generate standard-compliant IDS XML.
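
For readers unfamiliar with the metric in the first finding, the following is a minimal sketch of a macro-averaged F1 over sets of extracted items. It is not the paper's scoring code: the category names, the item strings, and the helper names `set_f1` and `macro_f1` are hypothetical, and it assumes that content agreement is scored by comparing the items found in a generated IDS file against those in the gold file, category by category.

```python
from typing import Dict, Set


def set_f1(pred: Set[str], gold: Set[str]) -> float:
    """F1 between a predicted and a gold set of items."""
    if not pred and not gold:
        return 1.0  # assumption: two empty sets count as perfect agreement
    tp = len(pred & gold)  # items present in both sets
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def macro_f1(pred: Dict[str, Set[str]], gold: Dict[str, Set[str]]) -> float:
    """Unweighted mean of per-category F1 (the 'macro' average)."""
    categories = sorted(set(pred) | set(gold))
    if not categories:
        return 0.0
    scores = [set_f1(pred.get(c, set()), gold.get(c, set())) for c in categories]
    return sum(scores) / len(scores)


# Hypothetical items for one example; not taken from the benchmark.
gold = {
    "entity": {"IFCWALL"},
    "property": {"Pset_WallCommon.FireRating", "Pset_WallCommon.IsExternal"},
}
pred = {
    "entity": {"IFCWALL"},
    "property": {"Pset_WallCommon.FireRating"},
}

score = macro_f1(pred, gold)
print(round(score, 3))  # 0.833
```

Because the macro average weights every category equally, a missed item in a small category lowers the score as much as a proportional miss in a large one. Here the entity category scores 1.0 and the property category scores about 0.667, which gives the mean of 0.833.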

Abstract

Large language models (LLMs) are widely used to generate structured outputs such as JSON, SQL, and code, yet public resources remain limited for evaluating generation that must simultaneously satisfy industry-standard XML and domain vocabulary constraints. This paper presents Ishigaki-IDS-Bench, a benchmark for evaluating the ability to generate Information Delivery Specification (IDS) XML from Building Information Modeling (BIM) information requirements. The benchmark contains 166 BIM/IDS expert-authored and verified examples created by expanding 83 practical scenarios into Japanese and English, corresponding gold IDS files, and metadata for input format, language, turn setting, IFC version, and construction domain. Its evaluation combines IDSAuditTool-based Processability, Structure, and Content audits with content-agreement evaluation against gold IDS files. In zero-shot evaluation…

Code & Models

Repositories

https://github.com

Datasets

ONESTRUCTION/Ishigaki-IDS-Bench
dataset· 69 dl
