Do LLMs Really Struggle at NL-FOL Translation? Revealing their Strengths via a Novel Benchmarking Strategy
Andrea Brunello, Luca Geatti, Michele Mignani, Angelo Montanari, Nicola Saccomanno

TL;DR
This paper introduces a new benchmarking strategy to evaluate LLMs on natural language to First-Order Logic translation, revealing that state-of-the-art dialogue-oriented models possess strong logical understanding, contrary to previous assumptions.
Contribution
The paper critically analyzes existing evaluation methods, proposes a novel protocol to better assess logical understanding, and demonstrates the strengths of dialogue-oriented LLMs in NL-FOL translation.
Findings
Dialogue-oriented LLMs show strong NL-FOL translation skills
Existing datasets may misrepresent LLM capabilities
Embedding-centric models perform worse in logical translation
Abstract
Due to its expressiveness and unambiguous nature, First-Order Logic (FOL) is a powerful formalism for representing concepts expressed in natural language (NL). This is useful, e.g., for specifying and verifying desired system properties. While translating FOL into human-readable English is relatively straightforward, the inverse problem, converting NL to FOL (NL-FOL translation), has remained a longstanding challenge, for both humans and machines. Although the emergence of Large Language Models (LLMs) promised a breakthrough, recent literature provides contrasting results on their ability to perform NL-FOL translation. In this work, we provide a threefold contribution. First, we critically examine existing datasets and protocols for evaluating NL-FOL translation performance, revealing key limitations that may cause a misrepresentation of LLMs' actual capabilities. Second, to overcome…
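As a rough illustration of what NL-FOL translation involves, consider the sentence "Every student who studies passes the exam." The sentence, the predicate names Student, Studies, and Passes, and the formulas below are assumptions chosen for illustration only; they are not taken from the paper or its datasets.

```latex
% Illustrative example only; sentence and predicate names are hypothetical.
% NL: "Every student who studies passes the exam."
% A faithful FOL rendering uses an implication under the universal quantifier:
\[
  \forall x \,\bigl( (\mathit{Student}(x) \land \mathit{Studies}(x)) \rightarrow \mathit{Passes}(x) \bigr)
\]
% A common mistranslation replaces the implication with a conjunction,
% which instead asserts that everything in the domain is a studying student who passes:
\[
  \forall x \,\bigl( \mathit{Student}(x) \land \mathit{Studies}(x) \land \mathit{Passes}(x) \bigr)
\]
```

The two formulas share the same predicates and look similar on the surface but are not logically equivalent, which suggests why an evaluation of NL-FOL translation has to check logical meaning and not just textual similarity.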
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
