WasmWalker: Path-based Code Representations for Improved WebAssembly Program Analysis
Mohammad Robati Shirzad, Patrick Lam

TL;DR
This paper introduces path-based code representations for WebAssembly binaries. By capturing structural information from the program's abstract syntax tree (AST), these representations improve program analysis, yielding better accuracy in method name prediction and return type recovery.
Contribution
The paper proposes two novel WebAssembly code representations that generate fixed-size embeddings and enhance sequence-to-sequence models for program analysis.
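To make "fixed-size embeddings" concrete, here is one common way to turn a variable number of path contexts into a single vector: a code2vec-style attention aggregation (Alon et al.). This is a minimal sketch under stated assumptions. The listing does not say that WasmWalker uses exactly this architecture. The helper names (`combine`, `code_vector`), the toy embeddings, the weight matrix `W`, and the attention vector `a` are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical code2vec-style aggregation; not taken from the WasmWalker paper.

def combine(token_vec_s, path_vec, token_vec_t, W):
    """tanh(W . [x_s; p; x_t]): fuse one (start token, path, end token) context."""
    x = token_vec_s + path_vec + token_vec_t  # list concatenation
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def code_vector(contexts, tok_emb, path_emb, W, a):
    """Attention-weighted sum of combined context vectors.

    The result always has len(a) entries, however many contexts there are.
    """
    combined = [combine(tok_emb[s], path_emb[p], tok_emb[t], W) for s, p, t in contexts]
    weights = softmax([sum(ci * ai for ci, ai in zip(c, a)) for c in combined])
    return [sum(w * c[j] for w, c in zip(weights, combined)) for j in range(len(a))]

# Toy, hand-picked embeddings (hypothetical): 2-d tokens and paths, 3-d output.
tok_emb = {"$a": [1.0, 0.0], "$b": [0.0, 1.0]}
path_emb = {"local.get^i32.add_local.get": [0.5, -0.5]}
W = [
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
    [0.2, -0.1, 0.0, 0.3, 0.0, 0.1],
    [-0.2, 0.1, 0.1, 0.0, 0.2, -0.1],
]
a = [1.0, 0.5, -0.5]
contexts = [
    ("$a", "local.get^i32.add_local.get", "$b"),
    ("$b", "local.get^i32.add_local.get", "$a"),
]
v = code_vector(contexts, tok_emb, path_emb, W, a)
print(len(v))  # 3
```

The attention weights form a convex combination, so each output entry stays within the range of the `tanh`-bounded context vectors, and the vector length depends only on `len(a)`, not on the method's size.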
Findings
Achieved 5.36% top-1 accuracy improvement in method name prediction
Improved return type recovery accuracy by 8.02%
Discovered only 3,352 unique AST paths across a large dataset
Abstract
WebAssembly, or Wasm, is a low-level binary language that enables execution of near-native-performance code in web browsers. Wasm has proven to be useful in applications including gaming, audio and video processing, and cloud computing, providing a high-performance, low-overhead alternative to JavaScript in web development. The fast and widespread adoption of WebAssembly by all major browsers has created an opportunity for analysis tools that support this new technology. Deep learning program analysis models can greatly benefit from the program structure information included in Abstract Syntax Tree (AST)-aware code representations. To obtain such code representations, we performed an empirical analysis on the AST paths in the WebAssembly Text format of a large dataset of WebAssembly binary files compiled from source packages in the Ubuntu 18.04 repositories. After refining the collected…
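The abstract describes collecting AST paths from the WebAssembly Text format. As a rough illustration, the sketch below parses a folded-form WAT function as an S-expression and extracts leaf-to-leaf path contexts in the style of code2vec. The helper names, the `^` (step up) / `_` (step down) path notation, and the simplifications (no string literals, no `;;` comments, identifiers kept verbatim) are assumptions for illustration, not the paper's extraction pipeline.

```python
from itertools import combinations

# Hypothetical leaf-to-leaf AST path extraction for WAT; not the authors' tool.

def tokenize(src):
    """Split a WAT S-expression into tokens (assumes no strings or comments)."""
    return src.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Parse one S-expression into a (label, children) node.

    The first atom after "(" becomes the node label; every other atom
    becomes a leaf node (label, []).
    """
    tok = tokens.pop(0)
    if tok != "(":
        return (tok, [])
    label = tokens.pop(0)
    children = []
    while tokens[0] != ")":
        children.append(parse(tokens))
    tokens.pop(0)  # consume the closing ")"
    return (label, children)

def root_to_leaf(node, prefix=()):
    """Yield, for every leaf, the tuple of nodes from the root down to it."""
    path = prefix + (node,)
    if not node[1]:
        yield path
    for child in node[1]:
        yield from root_to_leaf(child, path)

def path_context(p, q):
    """Return (start token, AST path, end token) for two root-to-leaf paths."""
    k = 0
    while p[k] is q[k]:  # shared prefix; p[k - 1] is the lowest common ancestor
        k += 1
    up = [n[0] for n in reversed(p[k - 1:-1])]  # leaf p's parent up to the LCA
    down = [n[0] for n in q[k:-1]]  # below the LCA down to leaf q's parent
    path = "^".join(up) + "".join("_" + label for label in down)
    return (p[-1][0], path, q[-1][0])

def extract_contexts(wat_src):
    tree = parse(tokenize(wat_src))
    leaves = list(root_to_leaf(tree))
    return [path_context(p, q) for p, q in combinations(leaves, 2)]

wat = """(func $add (param $a i32) (param $b i32) (result i32)
  (i32.add (local.get $a) (local.get $b)))"""
contexts = extract_contexts(wat)
print(len({path for _, path, _ in contexts}))  # 9 distinct paths in this toy function
```

Even this small function yields 28 path contexts but only 9 distinct path strings, because many leaf pairs share the same structural route through the tree. Repetition of this kind is one reason the set of unique paths stays small across a large corpus.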
