Xiangzhe Xu (Purdue University), Zhuo Zhang (Purdue University), Zian Su (Purdue University), Ziyang Huang (Purdue University), Shiwei Feng (Purdue University), Yapeng Ye (Purdue University), Nan Jiang (Purdue University), Danning Xie (Purdue University), Siyuan Cheng (Purdue University), Lin Tan (Purdue University), Xiangyu Zhang (Purdue University)

Decompilation aims to recover the source code form of a binary executable. It has many security applications, such as malware analysis, vulnerability detection, and code hardening. A prominent challenge in decompilation is to recover variable names. We propose a novel technique that leverages the strengths of generative models while mitigating model biases. We build a prototype, GenNm, from pre-trained generative models CodeGemma-2B, CodeLlama-7B, and CodeLlama-34B. We finetune GenNm on decompiled functions and teach models to leverage contextual information. GenNm includes names from callers and callees while querying a function, providing rich contextual information within the model's input token limitation. We mitigate model biases by aligning the output distribution of models with symbol preferences of developers. Our results show that GenNm improves the state-of-the-art name recovery precision by 5.6-11.4 percentage points on two commonly used datasets and improves the state-of-the-art by 32% (from 17.3% to 22.8%) in the most challenging setup where ground-truth variable names are not seen in the training dataset.

View More Papers

MOBIDOJO: A Virtual Security Combat Platform for 5G Cellular...

Hyunwoo Lee (Ohio State University), Haohuang Wen (Ohio State University), Phillip Porras (SRI), Vinod Yegneswaran (SRI), Ashish Gehani (SRI), Prakhar Sharma (SRI), Zhiqiang Lin (Ohio State University)

Read More

Trim My View: An LLM-Based Code Query System for...

Sima Arasteh (University of Southern California), Pegah Jandaghi, Nicolaas Weideman (University of Southern California/Information Sciences Institute), Dennis Perepech, Mukund Raghothaman (University of Southern California), Christophe Hauser (Dartmouth College), Luis Garcia (University of Utah Kahlert School of Computing)

Read More

Towards LLM-Assisted Vulnerability Detection and Repair for Open-Source 5G...

Rupam Patir (University at Buffalo), Qiqing Huang (University at Buffalo), Keyan Guo (University at Buffalo), Wanda Guo (Texas A&M University), Guofei Gu (Texas A&M University), Haipeng Cai (University at Buffalo), Hongxin Hu (University at Buffalo)

Read More

Privacy-Preserving Data Deduplication for Enhancing Federated Learning of Language...

Aydin Abadi (Newcastle University), Vishnu Asutosh Dasu (Pennsylvania State University), Sumanta Sarkar (University of Warwick)

Read More