LogicEval: A Systematic Framework for Evaluating Automated Repair Techniques for Logical Vulnerabilities in Real-World Software

Syed Md Mukit Rashid; Abdullah Al Ishtiaq; Kai Tu; Yilu Dong; Tianwei Wu; Ali Ranjbar; Tianchang Yang; Najrin Sultana; Shagufta Mehnaz; Syed Rafiul Hussain

arXiv:2604.12994·cs.CR·April 24, 2026

TL;DR

LogicEval is a framework for evaluating automated repair methods, both traditional and LLM-based, on logical vulnerabilities in real-world software. It is supported by LogicDS, a new dataset of 122 logical vulnerabilities.

Contribution

The paper introduces LogicEval, the first systematic framework for evaluating repair techniques for logical vulnerabilities, and provides the LogicDS dataset for benchmarking.

Findings

01

LLMs show promise but face challenges like prompt sensitivity and context loss.

02

Compilation and testing failures are common in automated repairs.

03

The dataset enables standardized assessment of repair approaches.

Abstract

Logical vulnerabilities in software stem from flaws in program logic rather than from memory-safety errors, and they can lead to critical security failures. Existing automated program repair techniques focus primarily on memory corruption vulnerabilities and struggle with logical vulnerabilities because of their limited semantic understanding of the vulnerable code and its expected behavior. Recent successes of large language models (LLMs) in understanding and repairing code are promising; however, no framework currently exists to analyze the capabilities and limitations of such techniques for logical vulnerabilities. We aim to systematically evaluate both traditional and LLM-based repair approaches for addressing real-world logical vulnerabilities. To facilitate our assessment, we created the first-ever dataset, LogicDS, comprising 122 logical vulnerabilities…
