Assessing Large Language Models for Stabilizing Numerical Expressions in Scientific Software

Tien Nguyen; Kirshanthan Sundararajah; Muhammad Ali Gulzar

arXiv:2604.04854 · cs.SE · April 10, 2026

TL;DR

This paper systematically evaluates how well large language models detect and stabilize numerically unstable expressions in scientific software. It finds that they match or exceed traditional baselines on many expressions but struggle with control flow and high-precision literals.

Contribution

The paper provides the first systematic assessment of LLMs on numerical stability tasks (instability detection and expression stabilization), highlighting their strengths and limitations relative to traditional methods.

Findings

01

LLMs outperform baseline methods in stabilizing many expressions.

02

LLMs successfully stabilize 97.9% of expressions where baselines fail.

03

LLMs struggle with control flow and high-precision literals.
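As a concrete illustration of what a "high-precision literal" is, the sketch below shows that a constant written with more decimal digits than a double (IEEE 754 binary64) can hold is silently rounded when parsed. This is not taken from the paper; the literal (π to 36 significant digits) is a hypothetical example, and Python's `decimal` module is used only to display the exact stored value.

```python
from decimal import Decimal, getcontext

# Hypothetical literal with more digits than binary64 can represent.
LITERAL = "3.14159265358979323846264338327950288"

getcontext().prec = 50  # enough decimal digits to show the rounding gap exactly

exact = Decimal(LITERAL)             # the value as written
as_double = Decimal(float(LITERAL))  # the exact value the double actually stores
gap = exact - as_double              # information lost when the literal is parsed

print(float(LITERAL))        # 3.141592653589793
print(f"{float(gap):.2e}")   # 1.22e-16
```

Any digits beyond roughly 16 to 17 significant figures are discarded at parse time, so an expression that depends on such a constant, and any rewrite of it, must treat the literal carefully.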

Abstract

Scientific software relies on high-precision computation, yet finite floating-point representations can introduce precision errors that propagate in safety-critical domains. Despite the growing use of large language models (LLMs) in scientific applications, their reliability in handling floating-point numerical stability has not been systematically evaluated. This paper evaluates LLMs' reasoning on high-precision numerical computation through two numerical stabilization tasks: (1) detecting instability in numerical expressions by generating error-inducing inputs (detection), and (2) rewriting expressions to improve numerical stability (stabilization). Using popular numerical benchmarks, we assess six LLMs on nearly 2,470 numerical structures, including nested conditionals, high-precision literals, and multi-variable arithmetic. Our results show that LLMs are equally effective as…
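The two tasks can be illustrated on a textbook cancellation example. The sketch below is not from the paper: the expression sqrt(x + 1) - sqrt(x), the function names (`naive`, `stabilized`, `find_error_inducing_input`), the log-uniform random search, and the 60-digit `Decimal` reference are all assumptions chosen for illustration. Random sampling stands in for the detection step (in the paper, the evaluated LLMs generate the error-inducing inputs), and the conjugate rewrite is the kind of output the stabilization task asks for.

```python
import math
import random
from decimal import Decimal, getcontext

getcontext().prec = 60  # 60 significant decimal digits for the reference value


def naive(x: float) -> float:
    # sqrt(x + 1) - sqrt(x): the two terms are nearly equal for large x,
    # so the subtraction cancels most of the significant bits.
    return math.sqrt(x + 1.0) - math.sqrt(x)


def stabilized(x: float) -> float:
    # Algebraically equivalent rewrite (multiply and divide by the conjugate);
    # it only adds positive terms, so there is no cancellation.
    return 1.0 / (math.sqrt(x + 1.0) + math.sqrt(x))


def reference(x: float) -> Decimal:
    # High-precision value of the same expression at the exact double input x.
    d = Decimal(x)  # exact conversion of the double
    return (d + 1).sqrt() - d.sqrt()


def rel_error(approx: float, x: float) -> float:
    # Relative error of a double result against the high-precision reference.
    ref = reference(x)
    return float(abs((Decimal(approx) - ref) / ref))


def find_error_inducing_input(f, trials: int = 1000, seed: int = 0):
    # Hypothetical detection by random search: sample x log-uniformly in
    # [1, 1e15] and keep the input with the largest relative error.
    rng = random.Random(seed)
    worst_x, worst_err = 1.0, 0.0
    for _ in range(trials):
        x = 10.0 ** rng.uniform(0.0, 15.0)
        err = rel_error(f(x), x)
        if err > worst_err:
            worst_x, worst_err = x, err
    return worst_x, worst_err


x_bad, err_naive = find_error_inducing_input(naive)
err_stable = rel_error(stabilized(x_bad), x_bad)
print(f"naive:      relative error {err_naive:.1e} at x = {x_bad:.3e}")
print(f"stabilized: relative error {err_stable:.1e} at the same x")
```

The naive form loses many digits at the detected input while the rewritten form stays within a few units in the last place, which is the kind of accuracy improvement the stabilization task measures.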
