Linxi Jiang (The Ohio State University), Xin Jin (The Ohio State University), Zhiqiang Lin (The Ohio State University)

Function name inference in stripped binaries is an important yet challenging task for many security applications, such as malware analysis and vulnerability discovery, due to the need to grasp binary code semantics amidst diverse instruction sets, architectures, compiler optimizations, and obfuscations. While machine learning has made significant progress in this field, existing methods often struggle with unseen data, constrained by their reliance on a limited vocabulary-based classification approach. In this paper, we present SymGen, a novel framework employing an autoregressive generation paradigm powered by domain-adapted generative large language models (LLMs) for enhanced binary code interpretation. We have evaluated SymGen on a dataset comprising 2,237,915 binary functions across four architectures (x86-64, x86-32, ARM, MIPS) with four levels of optimizations (O0-O3) where it surpasses the state-of-the-art with up to 409.3%, 553.5%, and 489.4% advancement in precision, recall, and F1 score, respectively, showing superior effectiveness and generalizability. Our ablation and case studies also demonstrate the significant performance boosts achieved by our design, e.g., the domain adaptation approach, alongside showcasing SymGen’s practicality in analyzing real-world binaries, e.g., obfuscated binaries and malware executables.

View More Papers

Detecting IMSI-Catchers by Characterizing Identity Exposing Messages in Cellular...

Tyler Tucker (University of Florida), Nathaniel Bennett (University of Florida), Martin Kotuliak (ETH Zurich), Simon Erni (ETH Zurich), Srdjan Capkun (ETH Zuerich), Kevin Butler (University of Florida), Patrick Traynor (University of Florida)

Read More

TWINFUZZ: Differential Testing of Video Hardware Acceleration Stacks

Matteo Leonelli (CISPA Helmholtz Center for Information Security), Addison Crump (CISPA Helmholtz Center for Information Security), Meng Wang (CISPA Helmholtz Center for Information Security), Florian Bauckholt (CISPA Helmholtz Center for Information Security), Keno Hassler (CISPA Helmholtz Center for Information Security), Ali Abbasi (CISPA Helmholtz Center for Information Security), Thorsten Holz (CISPA Helmholtz Center for Information…

Read More

SafeSplit: A Novel Defense Against Client-Side Backdoor Attacks in...

Phillip Rieger (Technical University of Darmstadt), Alessandro Pegoraro (Technical University of Darmstadt), Kavita Kumari (Technical University of Darmstadt), Tigist Abera (Technical University of Darmstadt), Jonathan Knauer (Technical University of Darmstadt), Ahmad-Reza Sadeghi (Technical University of Darmstadt)

Read More

Evaluating the Strength and Availability of Multilingual Passphrase Authentication

Chi-en Amy Tai (University of Waterloo), Urs Hengartner (University of Waterloo), Alexander Wong (University of Waterloo)

Read More