Aravind Machiry (UC Santa Barbara), Nilo Redini (UC Santa Barbara), Eric Gustafson (UC Santa Barbara), Hojjat Aghakhani (UC Santa Barbara), Christopher Kruegel (UC Santa Barbara), Giovanni Vigna (UC Santa Barbara)

Binary static analysis has seen a recent surge in interest, due to a rise in analysis targets for which no other method is appropriate, such as, embedded firmware. This has led to the proposal of a number of binary static analysis tools and techniques, handling various kinds of programs, and answering different research questions. While static analysis tools that focus on binaries inherit the undecidability of static analysis, they bring with them other challenges, particularly in dealing with the aliasing of code and data pointers. These tools may tackle these challenges in different ways, but unfortunately, there is currently no concrete means of comparing their effectiveness at solving these central, problem-independent aspects of static analysis.

In this paper, we propose a new method for creating a dataset of real-world programs, paired with the ground truth for static analysis. Our approach involves the injection of synthetic “facts” into a set of open-source programs, consisting of new variables and their possible values. The analyses’ goal is then to evaluate the possible values of these facts at certain program points. As the facts are injected randomly within an arbitrarily-large set of programs, the kinds of data flows that can be measured are widely-varied in size and complexity. We implemented this idea as a prototype system, AUTOFACTS, and used it to create a ground truth dataset of 29 programs, with various types and number of facts, resulting in a total of 2,088 binaries (with 72 versions for each program). To our knowledge, this is the first dataset aimed at the problem-independent evaluation of static analysis tools, and we contribute all code and the dataset itself to the community as open-source.

View More Papers

PISE: Protocol Inference using Symbolic Execution and Automata Learning

Ron Marcovich, Orna Grumberg, Gabi Nakibly (Technion, Israel Institute of Technology)

Read More

GTrans: Graph Transformer-Based Obfuscation-resilient Binary Code Similarity Detection

Yun Zhang (Hunan University), Yuling Liu (Hunan University), Ge Cheng (Xiangtan University), Bo Ou (Hunan University)

Read More

A Cross-Architecture Instruction Embedding Model for Natural Language Processing-Inspired...

Kimberly Redmond (University of South Carolina), Lannan Luo (University of South Carolina), Qiang Zeng (University of South Carolina)

Read More

FCGAT: Interpretable Malware Classification Method using Function Call Graph...

Minami Someya (Institute of Information Security), Yuhei Otsubo (National Police Academy), Akira Otsuka (Institute of Information Security)

Read More