Learning to Find Usages of Library Functions in Optimized Binaries

Toufique Ahmed; Premkumar Devanbu; Anand Ashok Sawant

arXiv:2103.05221·cs.SE·September 20, 2021


TL;DR

This paper presents a supervised learning method for recovering function calls in optimized binaries, which improves decompilation accuracy, especially at high optimization levels.

Contribution

The paper introduces a novel dataset creation and augmentation approach for training models to identify function calls in binaries, and integrates the approach with Ghidra for better decompilation results.

Findings

01 Significant improvement in function call recovery accuracy.

02 Enhanced decompilation quality at high optimization levels.

03 Effective use of data augmentation and pre-training techniques.

Abstract

Much software, whether beneficent or malevolent, is distributed only as binaries, sans source code. Absent source code, understanding binaries' behavior can be quite challenging, especially when compiled under higher levels of compiler optimization. These optimizations can transform comprehensible, "natural" source constructions into something entirely unrecognizable. Reverse engineering binaries, especially those suspected of being malevolent or guilty of intellectual property theft, is an important and time-consuming task. There is a great deal of interest in tools to "decompile" binaries back into more natural source code to aid reverse engineering. Decompilation involves several desirable steps, including recreating source-language constructions, variable names, and perhaps even comments. One central step in creating binaries is optimizing function calls, using steps such as…
