ShieldedCode: Learning Robust Representations for Virtual Machine Protected Code
Mingqiao Mo, Yunlong Tan, Hao Zhang, Heng Zhang, Yangfan He

TL;DR
ShieldedCode is a learning-based framework that learns robust representations of virtual machine protected (VMP) code, improving VM code generation and binary similarity detection.
Contribution
It is the first protection-aware framework that learns robust representations of VMP-protected code using hierarchical modeling and contrastive objectives.
Findings
Achieves 26.95% Pass@1 on L0 VM code generation.
Improves binary similarity detection Recall@1 by 10%.
Significantly enhances robustness across protection levels.
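For readers unfamiliar with the two metrics named in the findings, here is a minimal sketch of how Pass@1 and Recall@1 are commonly computed. It is not taken from the paper: the function names `pass_at_k` and `recall_at_1` are hypothetical, the combinatorial pass@k estimator is the one widely used in code-generation benchmarks, and dot-product scoring over embeddings is an assumption made for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which pass the tests.

    Assumption: the standard combinatorial estimator; the summary does not
    say which estimator or sample count the authors used.
    """
    if n - c < k:
        # Fewer than k failing samples: every size-k draw contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def recall_at_1(query_vecs, pool_vecs, gold_index):
    """Fraction of queries whose highest-scoring pool item is the gold match.

    Assumption: similarity is a plain dot product between embedding vectors.
    """
    hits = 0
    for query, gold in zip(query_vecs, gold_index):
        scores = [sum(a * b for a, b in zip(query, cand)) for cand in pool_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        hits += int(best == gold)
    return hits / len(query_vecs)
```

With `k = 1` the estimator reduces to the fraction of passing samples, `c / n`, averaged over problems in a benchmark.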
Abstract
Large language models (LLMs) have achieved remarkable progress in code generation, yet their potential for software protection remains largely untapped. Reverse engineering continues to threaten software security, while traditional virtual machine protection (VMP) relies on rigid, rule-based transformations that are costly to design and vulnerable to automated analysis. In this work, we present the first protection-aware framework that learns robust representations of VMP-protected code. Our approach builds large-scale paired datasets of source code and normalized VM implementations, and introduces hierarchical dependency modeling at intra-, preceding-, and inter-instruction levels. We jointly optimize language modeling with functionality-aware and protection-aware contrastive objectives to capture both semantic equivalence and protection strength. To further assess resilience, we…
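The two contrastive objectives mentioned in the abstract can be illustrated with a small sketch. This is not the paper's formulation, which the summary does not give. It assumes an InfoNCE-style loss for the functionality-aware term and a hinge on a similarity gap for the protection-aware term. The names `info_nce` and `protection_margin`, the temperature, the margin, and the direction of the ordering (weaker protection stays closer to the source) are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Functionality-aware term (assumed form): pull the anchor toward the
    functionally equivalent positive and away from unrelated negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[0]


def protection_margin(src, weaker, stronger, margin=0.1):
    """Protection-aware term (assumed form): penalize the pair unless the
    weakly protected variant is closer to the source than the strongly
    protected one by at least `margin`."""
    gap = cosine(src, weaker) - cosine(src, stronger)
    return max(0.0, margin - gap)
```

In this sketch the first term rewards all protected variants for staying near their source, while the second only constrains the relative gap between protection levels, which is one way the two objectives could coexist.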
Peer Reviews
Decision: ICLR 2026 Poster
The paper proposes a solid training technique with novel loss formulations and mask designs. The evaluation demonstrates good performance on retrieving VM code under different obfuscation levels.
Q1: Motivation: There appears to be a logical inconsistency in the motivating narrative. The paper argues that reverse engineering poses threats to software security, yet learning robust representations of VM code constitutes a form of reverse engineering. Please clarify. Q2: PCL loss. FCL requires all obfuscated embeddings to be similar to source code embeddings, while PCL mandates similarity gaps between different obfuscation levels. Are these objectives compatible, or do they introduce confl
The method is well motivated, highlighting the urgent need to strengthen software resilience against reverse engineering, and the challenges it targets have practical value. The manuscript constructs a large paired dataset of source code and normalised VM implementations, which can inspire further research on source-to-VMP code transformation and similarity comparison via LLMs.
The manuscript mentions the protection level of VMP code several times, but it never explains what the protection level means. From the current writing, it seems to be similar to a code obfuscation level. Please consider adding illustrative examples and an explicit explanation of this critical concept. In section 3.3, the proposed method utilises two contrastive loss components, FCL and PCL, which are claimed to be one of the innovative contributions. The FCL fu
Originality: Treats software protection as a representation-learning problem on VMP code, with a clear inductive bias via hierarchical masking over [VINST-*].
Quality: Data pipeline is concrete (compile → VMP(.) → disasm → normalize), the losses are fully specified (LM + FCL + PCL + PEO), and evaluation covers both generation and retrieval across O0-O3 and L1/L3.
Clarity: The normalization step is enumerated in four precise actions, and the masking formula is explicit, so a reader can reimple
- The paper relies on one commercial VMP tool and two protection levels (L1, L3) in testing; this narrows the “heterogeneous protection” claim and makes it unclear how well the model transfers to other VMs or level taxonomies.
- Protection-contrastive learning is claimed to enforce an ordering, but the reported Recall@1 across (O?, L1/L3) is not strictly monotone, suggesting the ordering is noisy in practice.
- Comparisons are against models that are not trained on this domain, so it is hard
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Security and Verification in Computing · Advanced Malware Detection Techniques
