StackSight: Unveiling WebAssembly through Large Language Models and Neurosymbolic Chain-of-Thought Decompilation

Weike Fang; Zhejian Zhou; Junzhou He; Weihang Wang

arXiv:2406.04568·cs.SE·June 10, 2024

Open Access

TL;DR

StackSight is a neurosymbolic method that combines large language models and program analysis to improve the decompilation of WebAssembly into readable C++ code, aiding developers' understanding.

Contribution

It introduces a novel approach that integrates static analysis with LLMs for WebAssembly decompilation, enhancing readability and semantic comprehension.

Findings

1. Significantly improves WebAssembly decompilation accuracy

2. Generated code snippets have higher success rates in understanding tasks

3. User study shows better semantic grasp with StackSight outputs

Abstract

WebAssembly enables near-native execution in web applications and is increasingly adopted for tasks that demand high performance and robust security. However, its assembly-like syntax, implicit stack machine, and low-level data types make it extremely difficult for human developers to understand, spurring the need for effective WebAssembly reverse engineering techniques. In this paper, we propose StackSight, a novel neurosymbolic approach that combines Large Language Models (LLMs) with advanced program analysis to decompile complex WebAssembly code into readable C++ snippets. StackSight visualizes and tracks virtual stack alterations via a static analysis algorithm and then applies chain-of-thought prompting to harness the LLM's complex reasoning capabilities. Evaluation results show that StackSight significantly improves WebAssembly decompilation. Our user study also demonstrates that code…
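To make the idea of tracking virtual stack alterations concrete, here is a minimal Python sketch. It is not StackSight's algorithm, only an illustration under stated assumptions: it handles a tiny straight-line subset of WebAssembly text instructions (`local.get`, `local.set`, `i32.const`, and three binary operators), and the names `track_stack`, `annotate`, and `BINARY_OPS` are hypothetical. It symbolically executes the instructions and records the contents of the implicit stack after each one, which is the kind of annotation that could then be placed in a prompt.

```python
from typing import List, Tuple

# Assumed, illustrative subset of binary instructions mapped to infix operators.
BINARY_OPS = {"i32.add": "+", "i32.sub": "-", "i32.mul": "*"}


def track_stack(instrs: List[str]) -> List[Tuple[str, List[str]]]:
    """Symbolically execute straight-line instructions.

    Returns one (instruction, stack-after-instruction) pair per instruction,
    where stack entries are symbolic expressions written as strings.
    """
    stack: List[str] = []
    trace: List[Tuple[str, List[str]]] = []
    for ins in instrs:
        parts = ins.split()
        op = parts[0]
        if op == "local.get":
            # Push a symbolic name for the local variable.
            stack.append(f"local{parts[1]}")
        elif op == "i32.const":
            # Push the literal constant.
            stack.append(parts[1])
        elif op in BINARY_OPS:
            # The right operand was pushed last, so it is popped first.
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(f"({lhs} {BINARY_OPS[op]} {rhs})")
        elif op == "local.set":
            # Consumes the top of the stack; the value is stored in a local.
            stack.pop()
        else:
            raise ValueError(f"unsupported instruction in this sketch: {ins}")
        trace.append((ins, list(stack)))
    return trace


def annotate(instrs: List[str]) -> List[str]:
    """Render each instruction with a comment showing the virtual stack."""
    return [f"{ins}  ;; stack: [{', '.join(st)}]" for ins, st in track_stack(instrs)]


example = [
    "local.get 0",
    "local.get 1",
    "i32.add",
    "i32.const 2",
    "i32.mul",
    "local.set 2",
]
annotated = annotate(example)
```

In this sketch, the stack just before `local.set 2` holds the single expression `((local0 + local1) * 2)`, which already resembles a source-level assignment. Real WebAssembly also has control flow, memory operations, and calls, all of which this sketch deliberately omits.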

Taxonomy

Topics: Software Engineering Research