Assessing Large Language Models for Stabilizing Numerical Expressions in Scientific Software
Tien Nguyen, Kirshanthan Sundararajah, Muhammad Ali Gulzar

TL;DR
This paper systematically evaluates large language models' ability to stabilize numerical expressions in scientific software, showing they perform well on certain tasks but struggle with control flow and high-precision literals.
Contribution
It provides the first comprehensive assessment of LLMs' effectiveness in numerical stability tasks, highlighting their strengths and limitations compared to traditional methods.
Findings
LLMs outperform baseline methods in stabilizing many expressions.
LLMs successfully stabilize 97.9% of expressions where baselines fail.
LLMs struggle with control flow and high-precision literals.
Abstract
Scientific software relies on high-precision computation, yet finite floating-point representations can introduce precision errors that propagate in safety-critical domains. Despite the growing use of large language models (LLMs) in scientific applications, their reliability in handling floating-point numerical stability has not been systematically evaluated. This paper evaluates LLMs' reasoning on high-precision numerical computation through two numerical stabilization tasks: (1) detecting instability in numerical expressions by generating error-inducing inputs (detection), and (2) rewriting expressions to improve numerical stability (stabilization). Using popular numerical benchmarks, we assess six LLMs on nearly 2,470 numerical structures, including nested conditionals, high-precision literals, and multi-variable arithmetic. Our results show that LLMs are equally effective as…
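To make the two tasks in the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's evaluation harness or prompts. It uses the textbook expression sqrt(x + 1) - sqrt(x), which suffers catastrophic cancellation for large x. Detection is approximated by scanning a fixed list of candidate inputs, where an LLM would instead propose the inputs. Each result is compared against a 50-digit `decimal` reference. Stabilization is shown as the algebraically equivalent conjugate form 1 / (sqrt(x + 1) + sqrt(x)). The function names, the candidate list, the 50-digit precision, and the 1e-6 error threshold are all hypothetical choices made for this example.

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 50  # assumption: 50 significant digits serve as "ground truth"

def naive(x: float) -> float:
    # Subtracts two nearly equal square roots: catastrophic cancellation for large x.
    return math.sqrt(x + 1) - math.sqrt(x)

def stabilized(x: float) -> float:
    # Equivalent rewrite (multiply by the conjugate); no subtraction of close values.
    return 1.0 / (math.sqrt(x + 1) + math.sqrt(x))

def reference(x: float) -> Decimal:
    # High-precision evaluation of the same expression at the exact double input.
    d = Decimal(x)
    return (d + 1).sqrt() - d.sqrt()

def rel_error(approx: float, x: float) -> float:
    # Relative error of a double-precision result against the decimal reference.
    ref = reference(x)
    return float(abs((Decimal(approx) - ref) / ref))

def find_error_inducing_input(f, candidates, threshold=1e-6):
    # Hypothetical detection step: return the first input whose error exceeds the threshold.
    for x in candidates:
        if rel_error(f(x), x) > threshold:
            return x
    return None

candidates = [10.0 ** k for k in range(0, 17)]
bad = find_error_inducing_input(naive, candidates)
print("error-inducing input:", bad)
print("naive relative error:", rel_error(naive(bad), bad))
print("stabilized relative error:", rel_error(stabilized(bad), bad))
```

At x = 1e16, for example, x + 1 rounds back to 1e16 in double precision, so the naive form returns exactly 0 and has a relative error of 1. The rewritten form stays within a few units in the last place across all candidates.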
