Linxi Jiang (The Ohio State University), Xin Jin (The Ohio State University), Zhiqiang Lin (The Ohio State University)

Function name inference in stripped binaries is an important yet challenging task for many security applications, such as malware analysis and vulnerability discovery, due to the need to grasp binary code semantics amidst diverse instruction sets, architectures, compiler optimizations, and obfuscations. While machine learning has made significant progress in this field, existing methods often struggle with unseen data, constrained by their reliance on a limited vocabulary-based classification approach. In this paper, we present SymGen, a novel framework employing an autoregressive generation paradigm powered by domain-adapted generative large language models (LLMs) for enhanced binary code interpretation. We have evaluated SymGen on a dataset comprising 2,237,915 binary functions across four architectures (x86-64, x86-32, ARM, MIPS) with four levels of optimizations (O0-O3) where it surpasses the state-of-the-art with up to 409.3%, 553.5%, and 489.4% advancement in precision, recall, and F1 score, respectively, showing superior effectiveness and generalizability. Our ablation and case studies also demonstrate the significant performance boosts achieved by our design, e.g., the domain adaptation approach, alongside showcasing SymGen’s practicality in analyzing real-world binaries, e.g., obfuscated binaries and malware executables.

View More Papers

DiStefano: Decentralized Infrastructure for Sharing Trusted Encrypted Facts and...

Sofia Celi (Brave Software), Alex Davidson (NOVA LINCS & Universidade NOVA de Lisboa), Hamed Haddadi (Imperial College London & Brave Software), Gonçalo Pestana (Hashmatter), Joe Rowell (Information Security Group, Royal Holloway, University of London)

Read More

Blindfold: Confidential Memory Management by Untrusted Operating System

Caihua Li (Yale University), Seung-seob Lee (Yale University), Lin Zhong (Yale University)

Read More

Evaluating Machine Learning-Based IoT Device Identification Models for Security...

Eman Maali (Imperial College London), Omar Alrawi (Georgia Institute of Technology), Julie McCann (Imperial College London)

Read More

Tweezers: A Framework for Security Event Detection via Event...

Jian Cui (Indiana University), Hanna Kim (KAIST), Eugene Jang (S2W Inc.), Dayeon Yim (S2W Inc.), Kicheol Kim (S2W Inc.), Yongjae Lee (S2W Inc.), Jin-Woo Chung (S2W Inc.), Seungwon Shin (KAIST), Xiaojing Liao (Indiana University)

Read More