Evaluating Large Language Models for Real-World Engineering Tasks
Rene Heesch, Sebastian Eilermann, Alexander Windmann, Alexander Diedrich, Philipp Rosenthal, Oliver Niggemann

TL;DR
This paper introduces a new dataset of over 100 real-world engineering questions to evaluate large language models, revealing strengths in temporal and structural reasoning but limitations in abstract reasoning, formal modeling, and complex, context-sensitive tasks.
Contribution
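The paper presents a curated dataset of authentic engineering questions and systematically evaluates LLMs on core engineering competencies. It addresses two shortcomings of earlier evaluations: reliance on simplified, exam-style use cases and ad hoc scenarios that fail to capture critical engineering competencies.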
Findings
LLMs excel in temporal and structural reasoning.
LLMs struggle with abstract reasoning and formal modeling.
Performance varies across different LLMs and tasks.
Abstract
Large Language Models (LLMs) are transformative not only for daily activities but also for engineering tasks. However, current evaluations of LLMs in engineering exhibit two critical shortcomings: (i) the reliance on simplified use cases, often adapted from examination materials where correctness is easily verifiable, and (ii) the use of ad hoc scenarios that insufficiently capture critical engineering competencies. Consequently, the assessment of LLMs on complex, real-world engineering problems remains largely unexplored. This paper addresses this gap by introducing a curated database comprising over 100 questions derived from authentic, production-oriented engineering scenarios, systematically designed to cover core competencies such as product design, prognosis, and diagnosis. Using this dataset, we evaluate four state-of-the-art LLMs, including both cloud-based and locally hosted…
Taxonomy
Topics: Software Engineering Techniques and Practices · Advanced Data Processing Techniques · Business Process Modeling and Analysis
Methods: High-Order Consensuses
