Using Large Language Models for Solving Thermodynamic Problems

Rebecca Loubet; Pascal Zittlau; Luisa Vollmer; Marco Hoffmann; Sophie Fellenz; Fabian Jirasek; Heike Leitte; Hans Hasse

arXiv:2502.05195 · cs.CE · December 18, 2025

Open Access

TL;DR

This paper evaluates five large language models on a benchmark of thermodynamic problems. The models handle simple problems well but show limitations in complex reasoning and contextual understanding.

Contribution

It introduces a benchmark of 22 thermodynamic problems to systematically assess LLMs' problem-solving abilities in a specialized scientific domain.

Findings

01. LLMs perform well on simple thermodynamic problems.

02. LLMs often lack consistency and contextual understanding.

03. Complex problems reveal significant limitations of LLM reasoning.

Abstract

Large Language Models (LLMs) have made significant progress in reasoning, demonstrating their capability to generate human-like responses. This study analyzes the problem-solving capabilities of LLMs in the domain of thermodynamics. It presents a benchmark of 22 thermodynamic problems, both simple and advanced, for evaluating LLMs. Five different LLMs are assessed: GPT-3.5, GPT-4, and GPT-4o from OpenAI, Llama 3.1 from Meta, and le Chat from MistralAI. The answers of these LLMs were evaluated by trained human experts, following a methodology akin to the grading of academic exam responses. The scores and the consistency of the answers are discussed, together with the analytical skills of the LLMs. Both strengths and weaknesses of the LLMs become evident. They generally yield good results for the simple problems, but limitations also become clear: The LLMs do not…
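The abstract mentions that both the scores and the consistency of the answers are discussed, but this summary does not give the grading scale or the number of repetitions. As a minimal sketch, assuming each answer is graded on a 0 to 1 scale and each problem is posed to a model several times, per-problem score and consistency could be summarized as below. The function name `summarize_scores`, the problem labels, and the example grades are all hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def summarize_scores(scores_by_problem):
    """Summarize repeated expert grades for each problem.

    scores_by_problem maps a problem label to a list of grades
    (assumed here to lie in [0, 1]) from repeated runs of one model.
    Returns, per problem, the mean grade and the population standard
    deviation of the grades as a simple measure of (in)consistency.
    """
    summary = {}
    for problem, grades in scores_by_problem.items():
        summary[problem] = {
            "mean": mean(grades),
            "spread": pstdev(grades),  # 0.0 means identical grades every run
        }
    return summary


# Made-up illustrative grades, NOT results from the paper.
example_grades = {
    "simple_problem": [1.0, 1.0, 1.0],    # consistent and correct
    "advanced_problem": [0.0, 0.5, 1.0],  # same mean effort, varying outcome
}

result = summarize_scores(example_grades)
for problem, stats in result.items():
    print(f"{problem}: mean={stats['mean']:.2f}, spread={stats['spread']:.2f}")
```

A spread of zero indicates that the model received the same grade on every repetition, while a large spread for the same problem points to the kind of inconsistency the findings describe. Other consistency measures, such as the fraction of runs that agree with the most common grade, would work equally well in this sketch.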

Taxonomy

Topics: Topic Modeling