Are LLMs Stable Formal Logic Translators in Logical Reasoning Across Linguistically Diversified Texts?

Qingchuan Li; Jiatong Li; Zirui Liu; Mingyue Cheng; Yuting Zeng; Qi Liu; Tongxuan Liu

arXiv:2506.04575·cs.CL·February 2, 2026

TL;DR

This paper introduces SoLT, a benchmark that tests LLM-based logical reasoning across diverse linguistic forms, and MenTaL, a method that improves translation consistency by linking different expressions of the same concept to shared symbols.

Contribution

The paper presents a new benchmark, SoLT, for evaluating LLMs on linguistically diverse logical reasoning, and proposes MenTaL, a method to improve symbol consistency during translation.

Findings

01. LLMs struggle with inconsistent symbol mapping under linguistic variation.

02. Applying MenTaL improves reasoning accuracy and stability across diverse inputs.

03. Linguistic diversity significantly impacts LLM-based logical reasoning performance.

Abstract

Logical reasoning with large language models (LLMs) has received growing attention. One mainstream approach translates natural language into formal logic and then applies symbolic solvers for deduction. While effective in many tasks, these LLM-based translators often fail to generate consistent symbolic representations when the same concept appears in different linguistic forms. Such inconsistencies break logical coherence and lead to solver errors. However, most existing benchmarks lack this type of linguistic variation, which frequently occurs in real-world text, leaving the problem underexplored. To address this gap, we present SoLT, a benchmark that systematically rewrites reasoning datasets into diverse yet logically equivalent forms across multiple levels. Beyond evaluation, SoLT also provides a general method to enrich any dataset with linguistic diversity while preserving both…
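The failure mode described in the abstract can be illustrated with a toy example. The sketch below is not SoLT or MenTaL; it is a minimal forward-chaining solver over hand-written atoms, and the predicate names, the rule, and the `canonical` synonym table are all hypothetical. It shows only why a solver cannot complete a deduction when one concept is translated into two different symbols, and how mapping both expressions to a shared symbol restores the inference.

```python
# Toy illustration only: atoms, rules, and the synonym table are hypothetical
# and are not taken from the paper or its repository.

def forward_chain(facts, rules):
    """Derive every atom reachable from `facts` using Horn rules (body -> head)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and all(atom in derived for atom in body):
                derived.add(head)
                changed = True
    return derived

# Source text: "Tom is joyful." and "Anyone who is happy smiles."
# An unstable translator emits two different symbols for the same concept.
inconsistent_facts = {"Joyful(tom)"}
rules = [(("Happy(tom)",), "Smiles(tom)")]

# Hypothetical table linking expressions of one concept to a shared symbol.
canonical = {"Joyful": "Happy"}

def unify_symbols(atom, table):
    """Rewrite an atom's predicate name to its shared symbol, if one is listed."""
    predicate, rest = atom.split("(", 1)
    return f"{table.get(predicate, predicate)}({rest}"

consistent_facts = {unify_symbols(a, canonical) for a in inconsistent_facts}

print("Smiles(tom)" in forward_chain(inconsistent_facts, rules))  # False
print("Smiles(tom)" in forward_chain(consistent_facts, rules))    # True
```

With the inconsistent symbols the rule never fires, because the solver treats `Joyful` and `Happy` as unrelated predicates. After the predicates are unified, the same rule derives `Smiles(tom)`.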

Code & Models

Repositories

wufeiwuwoshihua/lexicaldiver (Official)
