Luca Massarelli (Sapienza University of Rome), Giuseppe A. Di Luna (CINI - National Laboratory of Cybersecurity), Fabio Petroni (Independent Researcher), Leonardo Querzoni (Sapienza University of Rome), Roberto Baldoni (Italian Presidency of Ministry Council)

In this paper we investigate the use of graph embedding networks, with unsupervised features learning, as neural architecture to learn over binary functions.

We propose several ways of automatically extract features from the control flow graph (CFG) and we use the structure2vec graph embedding techniques to translate a CFG to a vectors of real numbers. We train and test our proposed architectures on two different binary analysis tasks: binary similarity, and, compiler provenance. We show that the unsupervised extraction of features improves the accuracy on the above tasks, when compared with embedding vectors obtained from a CFG annotated with manually engineered features (i.e., ACFG proposed in [39]).

We additionally compare the results of graph embedding networks based techniques with a recent architecture that do not make use of the structural information given by the CFG, and we observe similar performances. We formulate a possible explanation of this phenomenon and we conclude identifying important open challenges.

View More Papers

Rethink Custom Transformers for Binary Analysis

Heng Yin, Professor, Department of Computer Science and Engineering, University of California, Riverside

Read More

Similarity Metric Method for Binary Basic Blocks of Cross-Instruction...

Xiaochuan Zhang (Artificial Intelligence Research Center, National Innovation Institute of Defense Technology), Wenjie Sun (State Key Laboratory of Mathematical Engineering and Advanced Computing), Jianmin Pang (State Key Laboratory of Mathematical Engineering and Advanced Computing), Fudong Liu (State Key Laboratory of Mathematical Engineering and Advanced Computing), Zhen Ma (State Key Laboratory of Mathematical Engineering and Advanced…

Read More

Does Representation Matter? Evaluating IRs for LLM-based Binary Decompilation

Tomás Pelayo-Benedet (Universidad de Zaragoza), Kevin Borgolte (Ruhr University Bochum), Ricardo J. Rodríguez (Universidad de Zaragoza)

Read More