ShieldedCode: Learning Robust Representations for Virtual Machine Protected Code

Mingqiao Mo; Yunlong Tan; Hao Zhang; Heng Zhang; Yangfan He

arXiv:2601.20679·cs.CL·January 29, 2026

Open Access 3 Reviews

TL;DR

ShieldedCode introduces a learning-based framework whose representations remain robust on virtual machine protected (VMP) code, improving both VM code generation and binary similarity detection.

Contribution

It is the first protection-aware framework that learns robust representations of VMP-protected code using hierarchical modeling and contrastive objectives.

Findings

01 Achieves 26.95% Pass@1 on L0 VM code generation.

02 Improves binary similarity detection Recall@1 by 10%.

03 Significantly enhances robustness across protection levels.
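
For readers unfamiliar with the two metrics named in the findings, the following is a minimal sketch of how Pass@1 and Recall@1 are commonly computed. It is not the paper's evaluation code: the function names, the use of cosine similarity for retrieval, and the toy vectors are illustrative assumptions, and the paper's exact protocol is not described on this page.

```python
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recall_at_1(queries, pool, gold) -> float:
    """Fraction of queries whose top-ranked pool item is the gold match.

    queries: query embeddings (hypothetical, e.g. protected functions)
    pool:    candidate embeddings (hypothetical, e.g. reference functions)
    gold:    gold[i] is the pool index that truly matches queries[i]
    """
    hits = 0
    for query, target in zip(queries, gold):
        # Rank candidates by similarity and keep only the best one.
        best = max(range(len(pool)), key=lambda j: cosine(query, pool[j]))
        hits += int(best == target)
    return hits / len(queries)


def pass_at_1(first_sample_passed: Sequence[bool]) -> float:
    """Pass@1 with one sample per problem: share of problems whose
    single generated program passes all of its tests."""
    return sum(first_sample_passed) / len(first_sample_passed)


# Toy data, invented for illustration only.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pool = [[0.9, 0.1], [0.1, 0.9], [-1.0, -1.0]]
gold = [0, 1, 2]
recall = recall_at_1(queries, pool, gold)       # two of three queries hit
pass1 = pass_at_1([True, False, False, True])   # two of four problems pass
```

Under this reading, a 26.95% Pass@1 would mean that roughly 27 of every 100 problems are solved by the first generated sample.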

Abstract

Large language models (LLMs) have achieved remarkable progress in code generation, yet their potential for software protection remains largely untapped. Reverse engineering continues to threaten software security, while traditional virtual machine protection (VMP) relies on rigid, rule-based transformations that are costly to design and vulnerable to automated analysis. In this work, we present the first protection-aware framework that learns robust representations of VMP-protected code. Our approach builds large-scale paired datasets of source code and normalized VM implementations, and introduces hierarchical dependency modeling at intra-, preceding-, and inter-instruction levels. We jointly optimize language modeling with functionality-aware and protection-aware contrastive objectives to capture both semantic equivalence and protection strength. To further assess resilience, we…
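
The abstract names a functionality-aware and a protection-aware contrastive objective but this page does not give their formulas. The sketch below shows one plausible shape for each, under stated assumptions: the functionality term is written as a standard InfoNCE loss that pulls a source embedding toward the VM code of the same function, and the protection term is written as a hinge loss that enforces a similarity gap between a weaker and a stronger protection level. Both forms, the direction of the gap, the temperature, and the margin are assumptions for illustration, not the authors' definitions.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def functionality_loss(source, same_function_vm, other_function_vms,
                       temperature: float = 0.1) -> float:
    """InfoNCE (assumed form): the VM code of the same function is the
    positive, VM code of other functions are the negatives."""
    logits = [cosine(source, same_function_vm) / temperature]
    logits += [cosine(source, neg) / temperature for neg in other_function_vms]
    # Numerically stable log-sum-exp over all candidates.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[0]


def protection_loss(source, weaker_vm, stronger_vm,
                    margin: float = 0.1) -> float:
    """Hinge loss (assumed form): the weakly protected variant should be
    at least `margin` more similar to the source than the strongly
    protected variant."""
    gap = cosine(source, weaker_vm) - cosine(source, stronger_vm)
    return max(0.0, margin - gap)


# Toy embeddings, invented for illustration only.
src = [1.0, 0.0]
aligned = functionality_loss(src, [1.0, 0.0], [[0.0, 1.0]])     # near zero
misaligned = functionality_loss(src, [0.0, 1.0], [[1.0, 0.0]])  # large
ordered = protection_loss(src, [1.0, 0.0], [0.0, 1.0])          # gap satisfied
inverted = protection_loss(src, [0.0, 1.0], [1.0, 0.0])         # gap violated
```

In a joint setup like the one the abstract describes, terms of this kind would be added to the language-modeling loss with weighting coefficients; the weights are not stated here.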

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

The paper proposes a solid training technique with novel loss formulations and mask designs. The evaluation demonstrates good performance on retrieving VM code under different obfuscation levels.

Weaknesses

Q1 (Motivation): There appears to be a logical inconsistency in the motivating narrative. The paper argues that reverse engineering poses threats to software security, yet learning robust representations of VM code constitutes a form of reverse engineering. Please clarify.

Q2 (PCL loss): FCL requires all obfuscated embeddings to be similar to source code embeddings, while PCL mandates similarity gaps between different obfuscation levels. Are these objectives compatible, or do they introduce confl

Reviewer 02 · Rating 4 · Confidence 3

Strengths

The method is well motivated and highlights the urgent need to strengthen software resilience against reverse engineering. The challenges it targets have practical value. The manuscript constructs a large paired dataset of source code and normalised VM implementations, which can inspire further research on source-to-VMP code transformation and LLM-based similarity comparison.

Weaknesses

The manuscript mentions the protection level of VMP code several times, but it never explains what the protection level means. From the current writing, it seems to be similar to a code obfuscation level. Please consider adding illustrative examples and an explicit explanation of this critical concept. In Section 3.3, the proposed method utilises two contrastive loss components, FCL and PCL, which are claimed to be one of the innovative contributions. The FCL fu

Reviewer 03 · Rating 4 · Confidence 4

Strengths

Originality: Treats software protection as a representation-learning problem on VMP code, with a clear inductive bias via hierarchical masking over [VINST-*].

Quality: Data pipeline is concrete (compile → VMP(.) → disasm → normalize), the losses are fully specified (LM + FCL + PCL + PEO), and evaluation covers both generation and retrieval across O0-O3 and L1/L3.

Clarity: The normalization step is enumerated in four precise actions, and the masking formula is explicit, so a reader can reimple

Weaknesses

- The paper relies on one commercial VMP tool and two protection levels (L1, L3) in testing; this narrows the “heterogeneous protection” claim and makes it unclear how well the model transfers to other VMs or level taxonomies.
- Protection-contrastive learning is claimed to enforce an ordering, but the reported Recall@1 across (O?, L1/L3) is not strictly monotone, suggesting the ordering is noisy in practice.
- Comparisons are against models that are not trained on this domain, so it is hard

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Security and Verification in Computing · Advanced Malware Detection Techniques