Derrick McKee (Purdue University), Nathan Burow (MIT Lincoln Laboratory), Mathias Payer (EPFL)

Reverse engineering unknown binaries is a difficult, resource-intensive process because of information loss and compiler optimizations that introduce significant binary diversity. Existing binary similarity approaches either do not scale or are inaccurate. In this paper, we introduce IOVec Function Identification (IOVFI), which assesses similarity based on program state transformations. Compilers largely preserve these transformations, even across compilation environments and architectures. IOVFI executes functions from predetermined initial program states and measures the resulting changes in program state. It then uses the sets of input and output state vectors as unique semantic fingerprints. Because IOVFI relies on state vectors rather than code measurements, it withstands broad changes in the compilers and optimizations used to generate a binary.
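The fingerprinting idea can be illustrated with a toy sketch. It is not the authors' implementation, which operates on compiled binaries. This sketch simplifies heavily and models "program state" as nothing more than a function's integer arguments (input) and its return value or raised exception (output). The input-state list `INPUT_STATES` and the helpers `fingerprint` and `identify` are hypothetical names chosen for illustration. Two functions count as semantically equivalent when they map every predetermined input state to the same output, so their IOVec sets are equal.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Simplifying assumption: a "program state" is just a tuple of integer arguments.
State = Tuple[int, ...]
# An IOVec pairs an input state with the observed output.
IOVec = Tuple[State, object]

# Hypothetical set of predetermined initial states used for every function.
INPUT_STATES: List[State] = [(0, 0), (1, 2), (-3, 7), (100, 5)]


def fingerprint(func: Callable[..., object],
                inputs: List[State] = INPUT_STATES) -> FrozenSet[IOVec]:
    """Run func on each predetermined input and collect (input, output) pairs."""
    vecs = set()
    for state in inputs:
        try:
            out: object = func(*state)
        except Exception as exc:
            # Assumption: a crash is recorded as part of the observed behavior.
            out = ("crash", type(exc).__name__)
        vecs.add((state, out))
    return frozenset(vecs)


def identify(unknown: Callable[..., object],
             reference: Dict[str, Callable[..., object]]) -> List[str]:
    """Return the names of reference functions whose IOVec sets match unknown's."""
    target = fingerprint(unknown)
    return [name for name, f in reference.items() if fingerprint(f) == target]


if __name__ == "__main__":
    reference = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b, "max": max}
    # A syntactically different but semantically identical add is still matched.
    print(identify(lambda x, y: y + x, reference))
```

Because matching depends only on observed input/output behavior, rewriting `a + b` as `y + x` does not change the fingerprint. The abstract attributes the same property to IOVFI across compilers and optimization levels. A real tool must also handle memory, pointers, and side effects, which this sketch ignores.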

When used as a semantic function identifier for coreutils-8.32, our IOVFI implementation achieves a high average F-score of 0.773, indicating both high precision and high recall. When identifying functions generated in differing compilation environments, IOVFI improves accuracy by 100% over BinDiff 6 and outperforms asm2vec in cross-compilation-environment accuracy. Compared with the dynamic frameworks BLEX and IMF-SIM, IOVFI is 25%–53% more accurate.
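For reference, the formula below gives the standard balanced F-score, F1, which combines precision P and recall R. The abstract does not specify which variant is used, so the formula assumes the balanced F1. F1 is the harmonic mean of the two values, so it can be high only when both precision and recall are reasonably high:

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \, P \, R}{P + R}
```

Here TP, FP, and FN denote true positives, false positives, and false negatives among the identified functions.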
