Benchmarking bandgap prediction in semiconductors under experimental and realistic evaluation settings

Haolin Wang; Xianyuan Liu; Anna Jungbluth; Alexandra J. Ramadan; Robert D. J. Oliver; Haiping Lu

arXiv:2604.25568·cond-mat.mtrl-sci·April 29, 2026

TL;DR

This paper introduces RealMat-BaG, a benchmark dataset and evaluation framework for assessing how reliably machine learning models predict semiconductor bandgaps under experimentally relevant conditions.

Contribution

It provides an open-access dataset of experimental bandgaps with aligned crystal structures, compares graph neural networks against classical machine learning baselines, and evaluates their generalization and interpretability in realistic settings.

Findings

01. Current models show fundamental limitations in generalization.

02. The benchmark reveals poor transfer from DFT-computed to experimental bandgaps.

03. Interpretability is analyzed at both the elemental-property and structural levels.

Abstract

Accurate bandgap prediction is crucial for semiconductor applications, yet machine learning models trained on computational data often struggle to generalize to experimental bandgap measurements. Challenges related to data fidelity, domain generalization, and model interpretability remain insufficiently addressed in existing evaluation frameworks. To bridge this gap, we introduce RealMat-BaG, a benchmark for assessing model reliability under experimentally relevant conditions. We curate an open-access dataset of experimental bandgaps with aligned crystal structures and compare graph neural networks as well as classical machine learning baselines. Our framework evaluates performance across statistical and domain-based splits, examines transfer from DFT-computed to experimental bandgaps, and analyzes interpretability at both elemental-property and structural levels. Our results reveal the…
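To illustrate the difference between the statistical and domain-based splits mentioned in the abstract, here is a minimal sketch in plain Python. It is not the authors' code: the records are synthetic placeholders, the domain labels ("oxide", "sulfide", "halide") are an assumed grouping, and the helper names `random_split`, `domain_split`, and `mean_baseline_mae` are hypothetical. The mean-value predictor stands in for a real model only to show how the two split types are scored.

```python
import random

# Synthetic placeholder records: (sample id, domain label, bandgap in eV).
# These values are invented for illustration and are NOT from RealMat-BaG.
RECORDS = [
    ("s0", "oxide", 3.0), ("s1", "oxide", 3.4), ("s2", "oxide", 2.8),
    ("s3", "sulfide", 2.0), ("s4", "sulfide", 2.4), ("s5", "sulfide", 1.6),
    ("s6", "halide", 1.5), ("s7", "halide", 1.9), ("s8", "halide", 2.3),
]


def random_split(records, test_fraction=0.33, seed=0):
    """Statistical split: shuffle, then cut off a test fraction."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def domain_split(records, held_out):
    """Domain-based split: hold out every record of one domain label."""
    train = [r for r in records if r[1] != held_out]
    test = [r for r in records if r[1] == held_out]
    return train, test


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_baseline_mae(train, test):
    """Score a trivial baseline that always predicts the training mean."""
    mean_gap = sum(r[2] for r in train) / len(train)
    return mae([r[2] for r in test], [mean_gap] * len(test))


if __name__ == "__main__":
    train, test = random_split(RECORDS)
    print("random split MAE (eV):", mean_baseline_mae(train, test))
    for label in ("oxide", "sulfide", "halide"):
        train, test = domain_split(RECORDS, label)
        print(f"held-out {label} MAE (eV):", mean_baseline_mae(train, test))
```

In a domain-based split, the test set contains only materials from a group the model never saw during training, so its error probes out-of-distribution behavior rather than interpolation within a shuffled pool.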
