StackSight: Unveiling WebAssembly through Large Language Models and Neurosymbolic Chain-of-Thought Decompilation
Weike Fang, Zhejian Zhou, Junzhou He, Weihang Wang

TL;DR
StackSight is a neurosymbolic method that combines large language models and program analysis to improve the decompilation of WebAssembly into readable C++ code, aiding developers' understanding.
Contribution
It introduces a novel approach that integrates static analysis with LLMs for WebAssembly decompilation, enhancing readability and semantic comprehension.
Findings
Significantly improves WebAssembly decompilation accuracy
Generated code snippets have higher success rates in understanding tasks
User study shows better semantic grasp with StackSight outputs
Abstract
WebAssembly enables near-native execution in web applications and is increasingly adopted for tasks that demand high performance and robust security. However, its assembly-like syntax, implicit stack machine, and low-level data types make it extremely difficult for human developers to understand, spurring the need for effective WebAssembly reverse engineering techniques. In this paper, we propose StackSight, a novel neurosymbolic approach that combines Large Language Models (LLMs) with advanced program analysis to decompile complex WebAssembly code into readable C++ snippets. StackSight visualizes and tracks virtual stack alterations via a static analysis algorithm and then applies chain-of-thought prompting to harness the LLM's complex reasoning capabilities. Evaluation results show that StackSight significantly improves WebAssembly decompilation. Our user study also demonstrates that code…
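To make the idea of tracking virtual stack alterations concrete, here is a minimal Python sketch that symbolically executes a straight-line WebAssembly snippet and records the stack after each instruction. It is an illustration only, not StackSight's actual algorithm: the function names `track_stack` and `format_annotated`, the `local_N` naming scheme, and the tiny instruction subset are all assumptions made for this example, and control flow, memory, and calls are ignored.

```python
from typing import List, Tuple

# Hypothetical subset of binary i32 instructions mapped to C-like operators.
BINARY_OPS = {"i32.add": "+", "i32.sub": "-", "i32.mul": "*"}


def track_stack(instructions: List[str]) -> List[Tuple[str, List[str]]]:
    """Symbolically run straight-line code; return (instruction, stack after it)."""
    stack: List[str] = []
    trace: List[Tuple[str, List[str]]] = []
    for instr in instructions:
        parts = instr.split()
        op = parts[0]
        if op == "local.get":
            # Push a symbolic name for the local instead of a concrete value.
            stack.append(f"local_{parts[1]}")
        elif op == "i32.const":
            stack.append(parts[1])
        elif op in BINARY_OPS:
            # The right operand is on top of the stack, the left one below it.
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(f"({lhs} {BINARY_OPS[op]} {rhs})")
        elif op == "local.set":
            # Consumes the top of the stack; the value is stored into a local.
            stack.pop()
        else:
            raise ValueError(f"unsupported instruction in this sketch: {instr}")
        trace.append((instr, list(stack)))  # snapshot, not a reference
    return trace


def format_annotated(trace: List[Tuple[str, List[str]]]) -> str:
    """Render each instruction with the stack state as a trailing comment."""
    return "\n".join(
        f"{instr:<14} ;; stack: [{', '.join(stack)}]" for instr, stack in trace
    )


if __name__ == "__main__":
    snippet = [
        "local.get 0",
        "i32.const 2",
        "i32.mul",
        "local.get 1",
        "i32.add",
        "local.set 2",
    ]
    print(format_annotated(track_stack(snippet)))
```

An annotated listing of this kind makes the implicit stack explicit, for example showing that the value stored by `local.set 2` corresponds to the expression `((local_0 * 2) + local_1)`. Text in this form could then be placed in a prompt so that a model reasons over stack states step by step; how StackSight actually formats its annotations and prompts is not specified in the text above.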
Taxonomy
Topics: Software Engineering Research
