Lian Gao (University of California, Riverside), Yu Qu (University of California, Riverside), Sheng Yu (University of California, Riverside & Deepbits Technology Inc.), Yue Duan (Singapore Management University), Heng Yin (University of California, Riverside & Deepbits Technology Inc.)

Pseudocode diffing precisely locates similar parts and captures the differences between the decompiled pseudocode of two given binaries. It is particularly useful in many security scenarios, such as code plagiarism detection, lineage analysis, patch analysis, and vulnerability analysis. However, existing pseudocode diffing and binary diffing tools suffer from low accuracy and poor scalability, since they rely either on manually designed heuristics (e.g., Diaphora) or on heavy computations such as matrix factorization (e.g., DeepBinDiff). To address these limitations, we propose a semantics-aware, deep neural network-based model called SigmaDiff. SigmaDiff first constructs IR (Intermediate Representation) level interprocedural program dependency graphs (IPDGs). It then uses a lightweight symbolic analysis to extract initial node features and to locate training nodes for the neural network model. SigmaDiff then leverages a state-of-the-art graph matching model, Deep Graph Matching Consensus (DGMC), to match the nodes in the IPDGs. SigmaDiff also introduces several important updates to the design of DGMC, such as a pre-training and fine-tuning scheme. Experimental results show that SigmaDiff significantly outperforms state-of-the-art heuristic-based and deep learning-based techniques in terms of both accuracy and efficiency. It is able to precisely pinpoint eight vulnerabilities in a widely used video conferencing application.
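
To make the pipeline described above concrete, the following hypothetical Python sketch builds toy interprocedural program dependency graphs and matches their nodes. It is an assumption-laden illustration, not SigmaDiff's implementation: the helper names (build_ipdg, match_nodes), the toy feature vectors, and the greedy cosine-similarity matcher that stands in for the DGMC neural model are all invented for this example; the real system derives node features from decompiler IR via lightweight symbolic analysis.

# Hypothetical sketch of a SigmaDiff-style matching pipeline (illustration only).
# A simple cosine-similarity matcher stands in for the DGMC graph matching model.
import networkx as nx
import numpy as np

def build_ipdg(nodes, dep_edges, features):
    """Build a toy interprocedural program dependency graph (IPDG).

    nodes: iterable of node ids (e.g., IR statements)
    dep_edges: iterable of (src, dst) data/control dependency edges
    features: dict mapping node id -> initial feature vector
    """
    g = nx.DiGraph()
    for n in nodes:
        g.add_node(n, x=np.asarray(features[n], dtype=float))
    g.add_edges_from(dep_edges)
    return g

def match_nodes(g_src, g_tgt):
    """Greedily match each source node to its most similar target node."""
    def unit(v):
        norm = np.linalg.norm(v)
        return v / norm if norm else v
    src = {n: unit(d["x"]) for n, d in g_src.nodes(data=True)}
    tgt = {n: unit(d["x"]) for n, d in g_tgt.nodes(data=True)}
    return {s: max(tgt, key=lambda t: float(vs @ tgt[t])) for s, vs in src.items()}

# Example: two tiny IPDGs whose nodes carry similar initial features.
g1 = build_ipdg([0, 1], [(0, 1)], {0: [1, 0], 1: [0, 1]})
g2 = build_ipdg(["a", "b"], [("a", "b")], {"a": [0.9, 0.1], "b": [0.1, 0.9]})
print(match_nodes(g1, g2))  # {0: 'a', 1: 'b'}

In the actual approach, this matching step is performed by DGMC over the full IPDGs, with the symbolically derived features as initial node embeddings and a pre-training and fine-tuning scheme applied to the model.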

