Using Large Language Models for Solving Thermodynamic Problems
Rebecca Loubet, Pascal Zittlau, Luisa Vollmer, Marco Hoffmann, Sophie Fellenz, Fabian Jirasek, Heike Leitte, Hans Hasse

TL;DR
This paper evaluates how well five large language models solve thermodynamic problems, finding that they handle simple cases well but show limitations in complex reasoning and contextual understanding.
Contribution
It introduces a benchmark of 22 thermodynamic problems to systematically assess LLMs' problem-solving abilities in a specialized scientific domain.
Findings
LLMs perform well on simple thermodynamic problems
LLMs often lack consistency and contextual understanding
Complex problems reveal significant limitations of LLM reasoning
Abstract
Large Language Models (LLMs) have made significant progress in reasoning, demonstrating their capability to generate human-like responses. This study analyzes the problem-solving capabilities of LLMs in the domain of thermodynamics. A benchmark of 22 thermodynamic problems, comprising both simple and advanced problems, is presented for evaluating LLMs. Five different LLMs are assessed: GPT-3.5, GPT-4, and GPT-4o from OpenAI, Llama 3.1 from Meta, and le Chat from MistralAI. The answers of these LLMs were evaluated by trained human experts, following a methodology akin to the grading of academic exam responses. The scores and the consistency of the answers are discussed, together with the analytical skills of the LLMs. Both strengths and weaknesses of the LLMs become evident. They generally yield good results for the simple problems, but limitations also become clear: The LLMs do not…
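The abstract says that both the scores and the consistency of the answers are discussed. The following is a minimal sketch of how expert grades from repeated runs could be aggregated into those two quantities. It is an illustration only: the paper's actual grading scale and consistency measure are not stated in this summary. The model names, problem labels, grade values, and the `summarize` helper are all hypothetical placeholders, not results from the study.

```python
from statistics import mean, pstdev

# Hypothetical expert grades as a fraction of full marks (0.0 to 1.0).
# Each problem maps to the grades from several repeated runs of the same prompt.
# These numbers are placeholders for illustration, NOT data from the paper.
grades = {
    "model_a": {"simple_1": [1.0, 1.0, 1.0], "advanced_1": [0.5, 0.0, 1.0]},
    "model_b": {"simple_1": [1.0, 0.5, 1.0], "advanced_1": [0.0, 0.0, 0.5]},
}


def summarize(problem_grades):
    """Return the mean score and the mean run-to-run spread for one model.

    Assumption: consistency is approximated by the population standard
    deviation of the grades across repeated runs of each problem
    (0.0 means every run received the same grade).
    """
    per_problem_mean = [mean(runs) for runs in problem_grades.values()]
    per_problem_spread = [pstdev(runs) for runs in problem_grades.values()]
    return {
        "score": mean(per_problem_mean),
        "spread": mean(per_problem_spread),
    }


if __name__ == "__main__":
    for model, problem_grades in grades.items():
        summary = summarize(problem_grades)
        print(f"{model}: score={summary['score']:.2f}, spread={summary['spread']:.2f}")
```

A low spread alongside a low score would indicate a model that is consistently wrong, which is why the two quantities are reported separately in this sketch.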
Taxonomy
Topics: Topic Modeling
