Yongpan Wang (Institute of Information Engineering Chinese Academy of Sciences & University of Chinese Academy of Sciences, China), Hong Li (Institute of Information Engineering Chinese Academy of Sciences & University of Chinese Academy of Sciences, China), Xiaojie Zhu (King Abdullah University of Science and Technology, Thuwal, Saudi Arabia), Siyuan Li (Institute of Information Engineering Chinese Academy of Sciences & University of Chinese Academy of Sciences, China), Chaopeng Dong (Institute of Information Engineering Chinese Academy of Sciences & University of Chinese Academy of Sciences, China), Shouguo Yang (Zhongguancun Laboratory, Beijing, China), Kangyuan Qin (Institute of Information Engineering Chinese Academy of Sciences & University of Chinese Academy of Sciences, China)

Binary code search plays a crucial role in applications like software reuse detection, and vulnerability identification. Currently, existing models are typically based on either internal code semantics or a combination of function call graphs (CG) and internal code semantics. However, these models have limitations. Internal code semantic models only consider the semantics within the function, ignoring the inter-function semantics, making it difficult to handle situations such as function inlining. The combination of CG and internal code semantics is insufficient for addressing complex real-world scenarios. To address these limitations, we propose BINENHANCE, a novel framework designed to leverage the inter-function semantics to enhance the expression of internal code semantics for binary code search. Specifically, BINENHANCE constructs an External Environment Semantic Graph (EESG), which establishes a stable and analogous external environment for homologous functions by using different inter-function semantic relation (textit{e.g.}, textit{call}, textit{location}, textit{data-co-use}). After the construction of EESG, we utilize the embeddings generated by existing internal code semantic models to initialize EESG nodes. Finally, we design a Semantic Enhancement Model (SEM) that uses Relational Graph Convolutional Networks (RGCNs) and a residual block to learn valuable external semantics on the EESG for generating the enhanced semantics embedding. In addition, BinEnhance utilizes data feature similarity to refine the cosine similarity of semantic embeddings. We conduct experiments under six different tasks (textit{e.g.}, under textit{function inlining} scenario) and the results illustrate the performance and robustness of BINENHANCE. The application of BinEnhance to HermesSim, Asm2vec, TREX, Gemini, and Asteria on two public datasets results in an improvement of Mean Average Precision (MAP) from 53.6% to 69.7%. Moreover, the efficiency increases fourfold.

View More Papers

RAIFLE: Reconstruction Attacks on Interaction-based Federated Learning with Adversarial...

Dzung Pham (University of Massachusetts Amherst), Shreyas Kulkarni (University of Massachusetts Amherst), Amir Houmansadr (University of Massachusetts Amherst)

Read More

CENSOR: Defense Against Gradient Inversion via Orthogonal Subspace Bayesian...

Kaiyuan Zhang (Purdue University), Siyuan Cheng (Purdue University), Guangyu Shen (Purdue University), Bruno Ribeiro (Purdue University), Shengwei An (Purdue University), Pin-Yu Chen (IBM Research AI), Xiangyu Zhang (Purdue University), Ninghui Li (Purdue University)

Read More

ReDAN: An Empirical Study on Remote DoS Attacks against...

Xuewei Feng (Tsinghua University), Yuxiang Yang (Tsinghua University), Qi Li (Tsinghua University), Xingxiang Zhan (Zhongguancun Lab), Kun Sun (George Mason University), Ziqiang Wang (Southeast University), Ao Wang (Southeast University), Ganqiu Du (China Software Testing Center), Ke Xu (Tsinghua University)

Read More