Evaluating Large Language Models for Real-World Engineering Tasks

Rene Heesch; Sebastian Eilermann; Alexander Windmann; Alexander Diedrich; Philipp Rosenthal; Oliver Niggemann

arXiv:2505.13484 · cs.AI · May 21, 2025

Open Access

TL;DR

This paper introduces a new dataset of over 100 real-world engineering questions to evaluate large language models, revealing strengths in temporal and structural reasoning but limitations in abstract reasoning, formal modeling, and complex, context-sensitive tasks.

Contribution

The paper presents a curated, authentic engineering question dataset and systematically evaluates LLMs on complex engineering competencies, addressing previous evaluation shortcomings.

Findings

01. LLMs excel in temporal and structural reasoning.

02. LLMs struggle with abstract reasoning and formal modeling.

03. Performance varies across different LLMs and tasks.

Abstract

Large Language Models (LLMs) are transformative not only for daily activities but also for engineering tasks. However, current evaluations of LLMs in engineering exhibit two critical shortcomings: (i) the reliance on simplified use cases, often adapted from examination materials where correctness is easily verifiable, and (ii) the use of ad hoc scenarios that insufficiently capture critical engineering competencies. Consequently, the assessment of LLMs on complex, real-world engineering problems remains largely unexplored. This paper addresses this gap by introducing a curated database comprising over 100 questions derived from authentic, production-oriented engineering scenarios, systematically designed to cover core competencies such as product design, prognosis, and diagnosis. Using this dataset, we evaluate four state-of-the-art LLMs, including both cloud-based and locally hosted…

Taxonomy

Topics: Software Engineering Techniques and Practices · Advanced Data Processing Techniques · Business Process Modeling and Analysis

Methods: High-Order Consensuses