IndustryBench: Probing the Industrial Knowledge Boundaries of LLMs
Songlin Bai, Xintong Wang, Linlin Yu, Bin Chen, Zhiang Xu, Yuyang Sheng, Changtong Zan, Xiaofeng Zhu, Yizhe Zhang, Jiru Li, Mingze Guo, Ling Zou, Yalong Li, Chengfu Huo, Liang Ding

TL;DR
IndustryBench is a 2,049-item benchmark for industrial procurement QA in Chinese, with item-aligned English, Russian, and Vietnamese renderings. It shows that current LLMs remain unreliable in safety-critical industrial contexts.
Contribution
The paper introduces IndustryBench, a benchmark grounded in Chinese national standards (GB/T) and structured industrial product records, together with an evaluation pipeline that exposes safety and correctness limitations of existing LLMs in industrial QA.
Findings
The best model scores only 2.083 out of 3, indicating substantial room for improvement.
Standards & Terminology is the most persistent weakness across models.
Safety-violation rates significantly affect model rankings, emphasizing the need for safety-aware evaluation.
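To put the headline score in perspective, the short sketch below converts the reported best score into a fraction of the maximum and the remaining headroom. It assumes only what the finding states (a maximum of 3); the function names are hypothetical and are not taken from the paper or its code.

```python
# Minimal sketch: express a benchmark score relative to its maximum.
# Assumption: the scale tops out at 3, as stated in the findings.
# The helper names below are illustrative, not from the paper.

def fraction_of_max(score: float, max_score: float = 3.0) -> float:
    """Return the score as a fraction of the maximum achievable score."""
    return score / max_score


def headroom(score: float, max_score: float = 3.0) -> float:
    """Return how many points remain between the score and the maximum."""
    return round(max_score - score, 3)


best_score = 2.083  # best model score reported in the findings

print(f"{fraction_of_max(best_score):.1%}")  # 69.4%
print(headroom(best_score))                  # 0.917
```

Read this way, the best model reaches roughly 69% of the maximum score, leaving about 0.9 points of headroom.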
Abstract
In industrial procurement, an LLM answer is useful only if it survives a standards check: recommended material must match operating condition, every parameter must respect a regulated threshold, and no procedure may contradict a safety clause. Partial correctness can mask safety-critical contradictions that aggregate LLM benchmarks rarely capture. We introduce IndustryBench, a 2,049-item benchmark for industrial procurement QA in Chinese, grounded in Chinese national standards (GB/T) and structured industrial product records, organized by seven capability dimensions, ten industry categories, and panel-derived difficulty tiers, with item-aligned English, Russian, and Vietnamese renderings. Our construction pipeline rejects 70.3% of LLM-generated candidates at a search-based external-verification stage, calibrating how unreliable industrial QA remains after LLM-only filtering. Our…
