Exploring the Limits of Large Language Models: A Systematic Evaluation of Masked Text Processing Ability through MskQA and MskCal

Fuka Matsuzaki; Haru-Tada Sato

arXiv:2411.05665 · cs.CL · September 9, 2025

TL;DR

This study systematically evaluates large language models' masked text processing abilities using new tasks, revealing their reliance on semantic cues and limitations in reasoning under masking conditions.

Contribution

Introduces MskQA and MskCal tasks to assess LLMs' reasoning with masked text, highlighting their dependence on semantic cues and performance variability.

Findings

01. GPT-4o outperforms 4o-mini in masked reasoning tasks.
02. Performance drops significantly with solid masking.
03. Semantic cues are crucial for LLM reasoning.

Abstract

This paper sheds light on the limitations of Large Language Models (LLMs) by rigorously evaluating their ability to process masked text. We introduce two novel tasks: MskQA, measuring reasoning on masked question-answering datasets like RealtimeQA, and MskCal, assessing numerical reasoning on masked arithmetic problems. Testing GPT-4o and 4o-mini reveals that while LLMs exhibit some resilience to masked text, their performance is highly contingent on masking rates and semantic cues. Specifically, "solid masking," where semantic clues are entirely absent, leads to a significant performance drop compared to "partial lifting," where some semantic information is retained, indicating LLMs' reliance on surface-level patterns. Interestingly, GPT-4o consistently outperforms 4o-mini, particularly in MskCal, demonstrating a greater ability to handle numerical reasoning with masked text. This…
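To make the two masking conditions concrete, here is a minimal Python sketch of word-level masking at a chosen rate. It is not the authors' procedure: the `mask_text` function, the `[MASK]` placeholder, and the choice to keep each masked word's first letter as a stand-in for "partial lifting" are hypothetical assumptions made for illustration. The paper's actual masking scheme may differ (the isfhub/maskcode repository is the place to check).

```python
import random

MASK = "[MASK]"  # hypothetical uniform placeholder token


def mask_text(text: str, rate: float, mode: str = "solid", seed: int = 0) -> str:
    """Mask round(rate * n_words) randomly chosen words in `text`.

    mode="solid":   every masked word becomes the same placeholder, so no
                    semantic clue about the original word survives.
    mode="partial": every masked word keeps its first letter and length
                    (a hypothetical stand-in for "partial lifting", which
                    retains some semantic information).
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    if mode not in ("solid", "partial"):
        raise ValueError("mode must be 'solid' or 'partial'")

    words = text.split()
    rng = random.Random(seed)  # seeded so the same words are masked on every run
    k = round(rate * len(words))
    chosen = set(rng.sample(range(len(words)), k))

    out = []
    for i, word in enumerate(words):
        if i not in chosen:
            out.append(word)  # unmasked word passes through unchanged
        elif mode == "solid":
            out.append(MASK)  # no surface cue left
        else:
            out.append(word[0] + "_" * (len(word) - 1))  # first letter + length kept
    return " ".join(out)
```

For example, `mask_text("alpha beta gamma delta", 1.0, "solid")` yields four identical `[MASK]` tokens, whereas `mode="partial"` yields `a____ b___ g____ d____`, leaving a weak lexical cue that a model could exploit. The same function could be applied to the text of an arithmetic word problem to mimic an MskCal-style input, though how the paper masks numbers is not specified on this page.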

Code & Models

Repositories

isfhub/maskcode (Official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling