LLM Performance for Code Generation on Noisy Tasks

Radzim Sendyka; Christian Cabrera; Andrei Paleyes; Diana Robinson; Neil Lawrence

arXiv:2505.23598·cs.LG·May 30, 2025

Open Access · 1 Repo

TL;DR

This paper examines how large language models perform on highly obfuscated code generation tasks, revealing their reliance on memorization and highlighting challenges for benchmarking and safety evaluation.

Contribution

It introduces the concept of eager pattern matching, analyzes performance decay under obfuscation, and discusses implications for benchmarking and safety in LLMs.

Findings

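01 LLMs can solve tasks obfuscated to the point of being unintelligible to human readers.

02 Performance decays differently on contaminated versus unseen datasets.

03 This behaviour is not observed on tasks published after the models' knowledge cutoff, which indicates reliance on memorization rather than reasoning.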

Abstract

This paper investigates the ability of large language models (LLMs) to recognise and solve tasks which have been obfuscated beyond recognition. Focusing on competitive programming and benchmark tasks (LeetCode and MATH), we compare performance across multiple models and obfuscation methods, such as noise and redaction. We demonstrate that all evaluated LLMs can solve tasks obfuscated to a level where the text would be unintelligible to human readers, and does not contain key pieces of instruction or context. We introduce the concept of eager pattern matching to describe this behaviour, which is not observed in tasks published after the models' knowledge cutoff date, indicating strong memorisation or overfitting to training data, rather than legitimate reasoning about the presented problem. We report empirical evidence of distinct performance decay patterns between contaminated and…
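The abstract names noise and redaction as obfuscation methods but does not describe how they are applied. As a minimal sketch only, assuming noise means random character substitution and redaction means masking whole words, the two could look like the following. The function names `add_noise` and `redact`, the `rate` parameter, and the `[REDACTED]` mask are hypothetical and are not taken from the paper or its repository.

```python
import random
import string


def add_noise(text: str, rate: float, seed: int = 0) -> str:
    """Hypothetical noise obfuscation: replace each non-whitespace
    character with a random lowercase letter with probability `rate`."""
    rng = random.Random(seed)  # seeded so the obfuscation is reproducible
    out = []
    for ch in text:
        if not ch.isspace() and rng.random() < rate:
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)
    return "".join(out)


def redact(text: str, rate: float, seed: int = 0, mask: str = "[REDACTED]") -> str:
    """Hypothetical redaction obfuscation: replace each whitespace-separated
    word with `mask` with probability `rate`."""
    rng = random.Random(seed)
    words = text.split()
    return " ".join(mask if rng.random() < rate else w for w in words)


if __name__ == "__main__":
    task = "Given an array of integers, return indices of the two numbers that add up to a target."
    for rate in (0.0, 0.3, 0.6):
        print(f"noise  {rate:.1f}: {add_noise(task, rate)}")
        print(f"redact {rate:.1f}: {redact(task, rate)}")
```

Sweeping `rate` from 0 to 1 and recording the solve rate at each level would give a decay curve of the kind the abstract compares between contaminated and unseen tasks.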

Code & Models

Repositories

radzim/obfuscated (Official)

Taxonomy

Topics: Software Engineering Research · Topic Modeling · Adversarial Robustness in Machine Learning