Yongpan Wang (Institute of Information Engineering Chinese Academy of Sciences & University of Chinese Academy of Sciences, China), Hong Li (Institute of Information Engineering Chinese Academy of Sciences & University of Chinese Academy of Sciences, China), Xiaojie Zhu (King Abdullah University of Science and Technology, Thuwal, Saudi Arabia), Siyuan Li (Institute of Information Engineering Chinese Academy of Sciences & University of Chinese Academy of Sciences, China), Chaopeng Dong (Institute of Information Engineering Chinese Academy of Sciences & University of Chinese Academy of Sciences, China), Shouguo Yang (Zhongguancun Laboratory, Beijing, China), Kangyuan Qin (Institute of Information Engineering Chinese Academy of Sciences & University of Chinese Academy of Sciences, China)

Binary code search plays a crucial role in applications like software reuse detection, and vulnerability identification. Currently, existing models are typically based on either internal code semantics or a combination of function call graphs (CG) and internal code semantics. However, these models have limitations. Internal code semantic models only consider the semantics within the function, ignoring the inter-function semantics, making it difficult to handle situations such as function inlining. The combination of CG and internal code semantics is insufficient for addressing complex real-world scenarios. To address these limitations, we propose BINENHANCE, a novel framework designed to leverage the inter-function semantics to enhance the expression of internal code semantics for binary code search. Specifically, BINENHANCE constructs an External Environment Semantic Graph (EESG), which establishes a stable and analogous external environment for homologous functions by using different inter-function semantic relation (textit{e.g.}, textit{call}, textit{location}, textit{data-co-use}). After the construction of EESG, we utilize the embeddings generated by existing internal code semantic models to initialize EESG nodes. Finally, we design a Semantic Enhancement Model (SEM) that uses Relational Graph Convolutional Networks (RGCNs) and a residual block to learn valuable external semantics on the EESG for generating the enhanced semantics embedding. In addition, BinEnhance utilizes data feature similarity to refine the cosine similarity of semantic embeddings. We conduct experiments under six different tasks (textit{e.g.}, under textit{function inlining} scenario) and the results illustrate the performance and robustness of BINENHANCE. The application of BinEnhance to HermesSim, Asm2vec, TREX, Gemini, and Asteria on two public datasets results in an improvement of Mean Average Precision (MAP) from 53.6% to 69.7%. Moreover, the efficiency increases fourfold.

View More Papers

WAVEN: WebAssembly Memory Virtualization for Enclaves

Weili Wang (Southern University of Science and Technology), Honghan Ji (ByteDance Inc.), Peixuan He (ByteDance Inc.), Yao Zhang (ByteDance Inc.), Ye Wu (ByteDance Inc.), Yinqian Zhang (Southern University of Science and Technology)

Read More

Securing BGP ASAP: ASPA and other Post-ROV Defenses

Justin Furuness (University of Connecticut), Cameron Morris (University of Connecticut), Reynaldo Morillo (University of Connecticut), Arvind Kasiliya (University of Connecticut), Bing Wang (University of Connecticut), Amir Herzberg (University of Connecticut)

Read More

Query Privacy in Data Spaces

Shuwen Liu (School of Data Science, The Chinese University of Hong Kong, Shenzhen, China), George C. Polyzos (School of Data Science, The Chinese University of Hong Kong, Shenzhen, China and ExcID P.C., Athens, Greece)

Read More