Yongpan Wang (Institute of Information Engineering Chinese Academy of Sciences & University of Chinese Academy of Sciences, China), Hong Li (Institute of Information Engineering Chinese Academy of Sciences & University of Chinese Academy of Sciences, China), Xiaojie Zhu (King Abdullah University of Science and Technology, Thuwal, Saudi Arabia), Siyuan Li (Institute of Information Engineering Chinese Academy of Sciences & University of Chinese Academy of Sciences, China), Chaopeng Dong (Institute of Information Engineering Chinese Academy of Sciences & University of Chinese Academy of Sciences, China), Shouguo Yang (Zhongguancun Laboratory, Beijing, China), Kangyuan Qin (Institute of Information Engineering Chinese Academy of Sciences & University of Chinese Academy of Sciences, China)

Binary code search plays a crucial role in applications like software reuse detection, and vulnerability identification. Currently, existing models are typically based on either internal code semantics or a combination of function call graphs (CG) and internal code semantics. However, these models have limitations. Internal code semantic models only consider the semantics within the function, ignoring the inter-function semantics, making it difficult to handle situations such as function inlining. The combination of CG and internal code semantics is insufficient for addressing complex real-world scenarios. To address these limitations, we propose BINENHANCE, a novel framework designed to leverage the inter-function semantics to enhance the expression of internal code semantics for binary code search. Specifically, BINENHANCE constructs an External Environment Semantic Graph (EESG), which establishes a stable and analogous external environment for homologous functions by using different inter-function semantic relation (textit{e.g.}, textit{call}, textit{location}, textit{data-co-use}). After the construction of EESG, we utilize the embeddings generated by existing internal code semantic models to initialize EESG nodes. Finally, we design a Semantic Enhancement Model (SEM) that uses Relational Graph Convolutional Networks (RGCNs) and a residual block to learn valuable external semantics on the EESG for generating the enhanced semantics embedding. In addition, BinEnhance utilizes data feature similarity to refine the cosine similarity of semantic embeddings. We conduct experiments under six different tasks (textit{e.g.}, under textit{function inlining} scenario) and the results illustrate the performance and robustness of BINENHANCE. The application of BinEnhance to HermesSim, Asm2vec, TREX, Gemini, and Asteria on two public datasets results in an improvement of Mean Average Precision (MAP) from 53.6% to 69.7%. Moreover, the efficiency increases fourfold.

View More Papers

Rethinking Trust in Forge-Based Git Security

Aditya Sirish A Yelgundhalli (New York University), Patrick Zielinski (New York University), Reza Curtmola (New Jersey Institute of Technology), Justin Cappos (New York University)

Read More

Work-in-Progress: Uncovering Dark Patterns: A Longitudinal Study of Cookie...

Zihan Qu (Johns Hopkins University), Xinyi Qu (University College London), Xin Shen, Zhen Liang, and Jianjia Yu (Johns Hopkins University)

Read More

PQConnect: Automated Post-Quantum End-to-End Tunnels

Daniel J. Bernstein (University of Illinois at Chicago and Academia Sinica), Tanja Lange (Eindhoven University of Technology amd Academia Sinica), Jonathan Levin (Academia Sinica and Eindhoven University of Technology), Bo-Yin Yang (Academia Sinica)

Read More

Privacy-Preserving Data Deduplication for Enhancing Federated Learning of Language...

Aydin Abadi (Newcastle University), Vishnu Asutosh Dasu (Pennsylvania State University), Sumanta Sarkar (University of Warwick)

Read More