BatteryPass-12K: The First Dataset for the Novel Digital Battery Passport Conformance Task

Tosin Adewumi; Martin Karlsson; Lama Alkhaled; Marcus Liwicki

arXiv:2604.26986·cs.CL·May 1, 2026

TL;DR

This paper introduces BatteryPass-12K, the first synthetic dataset for digital battery passport conformance classification, evaluates 22 language models on the task, and analyzes their performance and vulnerabilities.

Contribution

It presents the first public benchmark dataset for digital battery passport (DBP) conformance, evaluates a diverse set of language models, and provides insights into model performance and robustness in this domain.

Findings

1. Thinking models, such as GPT-5.4, perform best, with high F1 scores.

2. Few-shot examples significantly improve model performance.

3. Prompt-injection attacks reduce model accuracy.

Abstract

We introduce a novel task, digital battery passport (DBP) conformance classification, and the first public benchmark for it: BatteryPass-12K, created synthetically from real pilot samples. The benchmark is timely because the EU's battery regulation on DBPs comes into effect soon and no public dataset exists. We evaluated 22 language models (LMs) in zero-shot inference, spanning small LMs (SLMs), mixture-of-experts (MoE) models, and dense LLMs. We also conducted further analysis and additional evaluations of few-shot inference and prompt-injection attacks, finding that (1) thinking models perform best (GPT-5.4 achieves an average F1 of 0.98 (95% confidence interval: 0.03) on the validation set and 0.71 (0.22) on the test set), (2) few-shot examples improve performance significantly, (3) generally capable frontier models find the task challenging, (4) merely scaling model…
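The abstract reports average F1 together with a 95% confidence interval but does not state here how the interval is computed or what the label set is. As a minimal sketch, assuming binary conformance labels (the names `conformant` and `non_conformant` are hypothetical) and several independent evaluation runs, per-run F1 and a normal-approximation interval half-width could be computed as follows. This is an illustration, not the authors' evaluation code.

```python
import math
import statistics
from typing import Sequence


def f1_score(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """Binary F1 for one label treated as positive (returns 0.0 when there are no true positives)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_with_ci95(scores: Sequence[float]) -> tuple[float, float]:
    """Mean score and 95% CI half-width, assuming a normal approximation (z = 1.96)."""
    mean = statistics.fmean(scores)
    if len(scores) < 2:
        return mean, 0.0  # a spread cannot be estimated from a single run
    half_width = 1.96 * statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, half_width


if __name__ == "__main__":
    # Hypothetical predictions from three runs on the same four passports.
    gold = ["conformant", "non_conformant", "conformant", "non_conformant"]
    runs = [
        ["conformant", "non_conformant", "non_conformant", "non_conformant"],
        ["conformant", "non_conformant", "conformant", "non_conformant"],
        ["non_conformant", "non_conformant", "conformant", "conformant"],
    ]
    scores = [f1_score(gold, pred, positive="non_conformant") for pred in runs]
    mean, hw = mean_with_ci95(scores)
    print(f"F1 = {mean:.2f} ({hw:.2f})")
```

With only a handful of runs, a t-distribution interval or a bootstrap over test items would be more appropriate; the normal approximation is used here only to keep the sketch free of third-party dependencies.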