Benchmarking Large Language Models for IoC Recovery under Adversarial Code Obfuscation and Encryption
Jaime Morales, Sergio Pastrana, Juan Tapiador

TL;DR
This paper presents a benchmark to evaluate large language models' ability to detect Indicators of Compromise in obfuscated and encrypted JavaScript code, revealing strengths against simple transformations and limitations against encryption.
Contribution
It introduces a systematic benchmark dataset and evaluation framework for assessing LLMs in IoC recovery under adversarial code transformations, including encryption.
Findings
LLMs perform well on simple obfuscations like variable renaming.
Encryption-based concealment significantly reduces detection accuracy.
The benchmark highlights encryption as a major challenge for automated threat analysis.
Abstract
Software obfuscation and encryption present persistent challenges for program comprehension and security analysis, particularly when adversaries conceal Indicators of Compromise (IoCs) such as IP addresses within source code. While Large Language Models (LLMs) have recently demonstrated remarkable progress in code reasoning and transformation, their resilience against adversarial concealment techniques remains largely uncharted. This paper introduces a systematic benchmark for secret detection under adversarial code transformations, designed to evaluate the capacity of LLMs to recover IoCs embedded in obfuscated and encrypted JavaScript programs. We construct a dataset of 336 programs, progressively transformed through 12 levels of obfuscation and cryptographic concealment (including XOR and AES-256), to emulate realistic threat scenarios. An automated evaluation framework…
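To illustrate why cryptographic concealment is harder than surface-level obfuscation, the sketch below hides an IP address behind a single-byte XOR and shows that a simple pattern match no longer finds it. This is not the authors' dataset or evaluation framework. The address `192.0.2.1` (a reserved documentation address), the key `0x5A`, the generated JavaScript snippets, and the helper names `xor_bytes` and `find_ipv4` are all assumptions made for illustration.

```python
import re

# Hypothetical example values: 192.0.2.1 is a reserved documentation address,
# and KEY is an arbitrary single-byte XOR key chosen for illustration.
IOC = "192.0.2.1"
KEY = 0x5A

# Loose dotted-quad matcher; it does not validate octet ranges.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key (the operation is its own inverse)."""
    return bytes(b ^ key for b in data)


def find_ipv4(text: str) -> list:
    """Return every substring that looks like a dotted-quad IPv4 address."""
    return IPV4_PATTERN.findall(text)


# Plain JavaScript: the IoC is visible to a simple pattern match.
plain_js = f'const host = "{IOC}";'

# Concealed JavaScript: the IoC is stored as XOR-ed bytes and decoded at run time.
encoded = ", ".join(str(b) for b in xor_bytes(IOC.encode(), KEY))
concealed_js = (
    f"const d = [{encoded}];\n"
    f"const host = String.fromCharCode(...d.map(b => b ^ {KEY}));"
)

plain_hits = find_ipv4(plain_js)          # the address is found
concealed_hits = find_ipv4(concealed_js)  # nothing matches the pattern

# Recovery requires knowing (or inferring) the transformation and the key.
recovered = xor_bytes(bytes(int(n) for n in encoded.split(", ")), KEY).decode()
```

In this toy case the key sits next to the ciphertext, so an analyst who reads the decoding line can undo the transformation. Stronger schemes such as AES-256 make that reasoning step considerably harder to carry out by inspection alone.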
