Forklift: An Extensible Neural Lifter
Jordi Armengol-Estapé, Rodrigo C. O. Rocha, Jackson Woodruff, Pasquale Minervini, Michael F. P. O'Boyle

TL;DR
Forklift is a neural network-based assembly-to-IR translator that learns to support multiple ISAs efficiently, outperforming traditional hand-crafted lifters and enabling easier extension to new architectures.
Contribution
We introduce Forklift, the first neural lifter that uses a Transformer model to translate assembly to IR, with incremental support for new ISAs through fine-tuning.
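A Transformer consumes a sequence of tokens, so the assembly text has to be split into tokens before it reaches the encoder. The following is a minimal sketch of one way to do that. It assumes a simple regex tokenizer for AT&T-syntax x86 assembly. The tokenize_asm name, the regex, and the <nl> instruction separator are illustrative assumptions, not Forklift's actual tokenizer.

```python
import re

# Hypothetical token pattern (an assumption, not Forklift's tokenizer):
#   $imm / -imm / imm      e.g. $-4, -8, 16
#   $sym / %reg / label    e.g. $.LC0, %rbp, .L2, addl
#   punctuation            ( ) , :
TOKEN_RE = re.compile(r"\$?-?\d+|[$%]?[A-Za-z_.][\w.$]*|[(),:]")


def tokenize_asm(asm: str) -> list[str]:
    """Split assembly into a flat token list, marking each instruction end with <nl>."""
    tokens: list[str] = []
    for line in asm.strip().splitlines():
        line = line.split("#", 1)[0].strip()  # drop AT&T-style '#' comments
        if not line:
            continue
        tokens.extend(TOKEN_RE.findall(line))
        tokens.append("<nl>")  # hypothetical end-of-instruction marker
    return tokens
```

For example, tokenize_asm("movl $-4, -8(%rbp)") yields ['movl', '$-4', ',', '-8', '(', '%rbp', ')', '<nl>']. Keeping immediates such as $-4 as single tokens stops the model from having to reassemble a sign and a digit.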
Findings
Translates 2.5x more x86 programs than a state-of-the-art hand-written lifter.
Translates 4.4x more x86 programs than GPT-4.
Supports multiple ISAs with incremental fine-tuning.
Abstract
The escalating demand to migrate legacy software across different Instruction Set Architectures (ISAs) has driven the development of assembly-to-assembly translators that map between their respective assembly languages. However, developing these tools requires substantial engineering effort. State-of-the-art approaches use lifting, a technique in which source assembly code is translated to an architecture-independent intermediate representation (IR), such as LLVM IR, and a pre-existing compiler then recompiles the IR to the target ISA. However, the hand-written rules these lifters employ are sensitive to the particular compiler and optimization level used to generate the code, and they require significant engineering effort to support each new ISA. We propose Forklift, the first neural lifter that learns how to translate assembly to LLVM IR using a token-level encoder-decoder…
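The following minimal sketch shows why rule-based lifting is brittle. It assumes a toy rule set. The lift function, the LiftError exception, and both regex rules are hypothetical and far simpler than any real lifter. The sketch lifts one encoding of int f(int a, int b) { return a + b; } to LLVM IR. It rejects an equivalent encoding that uses lea, the idiom GCC and Clang typically emit at -O2, because no hand-written rule covers that instruction.

```python
import re


class LiftError(Exception):
    """Raised when no hand-written rule matches an instruction."""


# Toy hand-written rules (hypothetical): one regex per supported instruction form.
MOV = re.compile(r"movl (%\w+), (%\w+)$")  # movl %src, %dst  (dst = src)
ADD = re.compile(r"addl (%\w+), (%\w+)$")  # addl %src, %dst  (dst += src)


def lift(asm: str) -> str:
    """Lift a toy i32 f(i32 a, i32 b) written in AT&T x86-64 to LLVM IR text."""
    # System V x86-64 ABI: the first two 32-bit integer arguments arrive in %edi, %esi.
    values = {"%edi": "%a", "%esi": "%b"}  # register -> SSA value it currently holds
    body: list[str] = []

    def read(reg: str) -> str:
        if reg not in values:
            raise LiftError(f"register {reg} read before being written")
        return values[reg]

    for n, line in enumerate(asm.strip().splitlines()):
        insn = line.strip()
        if m := MOV.match(insn):
            src, dst = m.groups()
            values[dst] = read(src)  # a register copy needs no IR instruction
        elif m := ADD.match(insn):
            src, dst = m.groups()
            tmp = f"%t{n}"
            body.append(f"  {tmp} = add i32 {read(dst)}, {read(src)}")
            values[dst] = tmp
        elif insn == "ret":
            body.append(f"  ret i32 {read('%eax')}")  # i32 return value lives in %eax
        else:
            raise LiftError(f"no rule for: {insn}")
    return "define i32 @f(i32 %a, i32 %b) {\n" + "\n".join(body) + "\n}"


# Two functionally equivalent encodings of a + b.
plain = "movl %edi, %eax\naddl %esi, %eax\nret"
lea = "leal (%rdi,%rsi), %eax\nret"  # typical -O2 output from GCC and Clang

print(lift(plain))
# lift(lea) raises LiftError("no rule for: leal (%rdi,%rsi), %eax")
```

The first listing lifts to a define containing %t1 = add i32 %a, %b followed by ret i32 %t1. The second listing fails until someone writes another rule. A learned lifter aims to pick up such idioms from training data instead.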
