Linxi Jiang (The Ohio State University), Xin Jin (The Ohio State University), Zhiqiang Lin (The Ohio State University)

Function name inference in stripped binaries is an important yet challenging task for many security applications, such as malware analysis and vulnerability discovery, due to the need to grasp binary code semantics amidst diverse instruction sets, architectures, compiler optimizations, and obfuscations. While machine learning has made significant progress in this field, existing methods often struggle with unseen data, constrained by their reliance on a limited vocabulary-based classification approach. In this paper, we present SymGen, a novel framework employing an autoregressive generation paradigm powered by domain-adapted generative large language models (LLMs) for enhanced binary code interpretation. We have evaluated SymGen on a dataset comprising 2,237,915 binary functions across four architectures (x86-64, x86-32, ARM, MIPS) with four levels of optimizations (O0-O3) where it surpasses the state-of-the-art with up to 409.3%, 553.5%, and 489.4% advancement in precision, recall, and F1 score, respectively, showing superior effectiveness and generalizability. Our ablation and case studies also demonstrate the significant performance boosts achieved by our design, e.g., the domain adaptation approach, alongside showcasing SymGen’s practicality in analyzing real-world binaries, e.g., obfuscated binaries and malware executables.

View More Papers

Cellular Metasploit

Dr. Yongdae Kim, Director, KAIST Chair Professor, Electrical Engineering and GSIS, KAIST

Read More

Non-intrusive and Unconstrained Keystroke Inference in VR Platforms via...

Tao Ni (City University of Hong Kong), Yuefeng Du (City University of Hong Kong), Qingchuan Zhao (City University of Hong Kong), Cong Wang (City University of Hong Kong)

Read More

Onion Franking: Abuse Reports for Mix-Based Private Messaging

Matthew Gregoire (University of North Carolina at Chapel Hill), Margaret Pierce (University of North Carolina at Chapel Hill), Saba Eskandarian (University of North Carolina at Chapel Hill)

Read More

Modeling End-User Affective Discomfort With Mobile App Permissions Across...

Yuxi Wu (Georgia Institute of Technology and Northeastern University), Jacob Logas (Georgia Institute of Technology), Devansh Ponda (Georgia Institute of Technology), Julia Haines (Google), Jiaming Li (Google), Jeffrey Nichols (Apple), W. Keith Edwards (Georgia Institute of Technology), Sauvik Das (Carnegie Mellon University)

Read More