# LLM4FP: LLM-Based Program Generation for Triggering Floating-Point Inconsistencies Across Compilers

**Authors:** Yutong Wang, Cindy Rubio-González

arXiv: 2509.00256 · 2025-12-30

## TL;DR

LLM4FP leverages Large Language Models to generate diverse floating-point programs that effectively detect inconsistencies across different compilers and optimization levels, improving reliability in numerical software.

## Contribution

This work introduces LLM4FP, the first framework using LLMs for targeted generation of floating-point programs to uncover compiler-induced inconsistencies.

## Key findings

- LLM4FP detects nearly 2.5x as many inconsistencies as Varity, the state-of-the-art tool.
- Most inconsistencies involve real-valued differences, not NaN or infinities.
- LLM4FP uncovers inconsistencies across a wider range of optimization levels and finds the most mismatches between host and device compilers.
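
The distinction between real-valued differences and NaN or infinity mismatches can be made concrete with a small sketch. It assumes two outputs of the same program, built under two different compiler configurations, have already been collected as doubles. The function name `classify`, the category labels, and the choice of bitwise comparison are illustrative assumptions, not taken from the paper.

```python
import math
import struct


def bits(x: float) -> int:
    """Return the raw 64-bit IEEE 754 pattern of a double."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def classify(a: float, b: float) -> str:
    """Label a pair of outputs (hypothetical categories, not the paper's)."""
    # Assumption: two NaNs count as consistent regardless of payload.
    if math.isnan(a) and math.isnan(b):
        return "consistent"
    # Assumption: outputs are compared bit for bit, so 0.0 and -0.0 differ.
    if bits(a) == bits(b):
        return "consistent"
    if math.isnan(a) or math.isnan(b):
        return "nan"
    if math.isinf(a) or math.isinf(b):
        return "inf"
    # Both values are finite but differ: a real-valued inconsistency.
    return "real"
```

A call such as `classify(0.1 + 0.2, 0.3)` falls into the `"real"` category, because both values are finite yet differ in their last bit.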

## Abstract

Floating-point inconsistencies across compilers can undermine the reliability of numerical software. We present LLM4FP, the first framework that uses Large Language Models (LLMs) to generate floating-point programs specifically designed to trigger such inconsistencies. LLM4FP combines Grammar-Based Generation and Feedback-Based Mutation to produce diverse and valid programs. We evaluate LLM4FP across multiple compilers and optimization levels, measuring inconsistency rate, time cost, and program diversity. LLM4FP detects nearly 2.5x as many inconsistencies as the state-of-the-art tool Varity. Notably, most of the inconsistencies involve real-valued differences, rather than extreme values like NaN or infinities. LLM4FP also uncovers inconsistencies across a wider range of optimization levels, and finds the most mismatches between host and device compilers. These results show that LLM-guided program generation improves the detection of numerical inconsistencies. In practice, numerical software and HPC developers can use LLM4FP to compare compilers and select those that provide more accurate and consistent floating-point behavior, while compiler developers can use it to identify and address subtle consistency issues in their implementations.
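
For readers new to the problem, one well-known way the same source can yield different results is that floating-point addition is not associative, so any transformation that regroups a sum can change its value. The sketch below illustrates only that general mechanism. It is not a claim about which causes the paper identifies, and the values and function names are chosen purely for illustration.

```python
def sum_left(a: float, b: float, c: float) -> float:
    """Evaluate (a + b) + c, the grouping written in the source."""
    return (a + b) + c


def sum_right(a: float, b: float, c: float) -> float:
    """Evaluate a + (b + c), a regrouping a reordering transformation might pick."""
    return a + (b + c)


# Illustrative inputs: near 1e16 adjacent doubles are 2 apart,
# so adding 1.0 to -1e16 rounds back to -1e16.
a, b, c = 1e16, -1e16, 1.0

left = sum_left(a, b, c)    # (1e16 - 1e16) + 1.0
right = sum_right(a, b, c)  # 1e16 + (-1e16 + 1.0)

print(left, right, left == right)
```

Both groupings are mathematically equal, yet here they give finite, different results: the kind of real-valued difference the abstract describes.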

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2509.00256/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/2509.00256/full.md

---
Source: https://tomesphere.com/paper/2509.00256