Binary static analysis has seen a recent surge in interest, due to a rise in analysis targets for which no other method is appropriate, such as, embedded firmware. This has led to the proposal of a number of binary static analysis tools and techniques, handling various kinds of programs, and answering different research questions. While static analysis tools that focus on binaries inherit the undecidability of static analysis, they bring with them other challenges, particularly in dealing with the aliasing of code and data pointers. These tools may tackle these challenges in different ways, but unfortunately, there is currently no concrete means of comparing their effectiveness at solving these central, problem-independent aspects of static analysis.

In this paper, we propose a new method for creating a dataset of real-world programs, paired with the ground truth for static analysis. Our approach involves the injection of synthetic “facts” into a set of open-source programs, consisting of new variables and their possible values. The analyses’ goal is then to evaluate the possible values of these facts at certain program points. As the facts are injected randomly within an arbitrarily-large set of programs, the kinds of data flows that can be measured are widely-varied in size and complexity. We implemented this idea as a prototype system, AUTOFACTS, and used it to create a ground truth dataset of 29 programs, with various types and number of facts, resulting in a total of 2,088 binaries (with 72 versions for each program). To our knowledge, this is the first dataset aimed at the problem-independent evaluation of static analysis tools, and we contribute all code and the dataset itself to the community as open-source.

View More Papers

Investigating Graph Embedding Neural Networks with Unsupervised Features Extraction...

Luca Massarelli (Sapienza University of Rome), Giuseppe A. Di Luna (CINI - National Laboratory of Cybersecurity), Fabio Petroni (Independent Researcher),...

Read More

Enhancing Symbolic Execution by Machine Learning Based Solver Selection

Sheng-Han Wen (National Taiwan University), Wei-Loon Mow (National Taiwan University), Wei-Ning Chen (National Taiwan University), Chien-Yuan Wang (National Taiwan University),...

Read More

A Cross-Architecture Instruction Embedding Model for Natural Language Processing-Inspired...

Kimberly Redmond (University of South Carolina), Lannan Luo (University of South Carolina), Qiang Zeng (University of South Carolina)

Read More