Chengfeng Ye (The Hong Kong University of Science and Technology), Anshunkang Zhou (The Hong Kong University of Science and Technology), Charles Zhang (The Hong Kong University of Science and Technology)
Binary diffing, which detects differences between two pieces of binary code, is a fundamental technique in various security analysis tasks.
Existing work shows that a sufficient number of fine-grained alignments as anchor points can significantly improve the overall accuracy of binary diffing. However, existing methods still suffer from numerous limitations that hinder accurate and efficient anchor point identification. Syntax-based techniques are known to be vulnerable to aggressive compiler optimizations, while semantic-based methods are limited by high computation cost or low code coverage.
In this paper, we revisit dynamic analysis in search of new insights that address the limitations of existing approaches. Our main insight is that not all dynamic semantics are necessary or equally effective for identifying valid instruction alignments. Therefore, we can prioritize dynamic execution resources to partially reveal the runtime values that effectively derive instruction alignments. Based on this insight, we propose Barracuda, a high-confidence instruction alignment technique based on partial instruction semantics extracted from forced execution. We have implemented Barracuda and conducted extensive experiments to evaluate its effectiveness. The results demonstrate that Barracuda can detect 24.0% more instruction alignments as anchor points with a high precision of 92.1%. The anchor points detected by Barracuda can enhance state-of-the-art binary diffing tools, DeepBinDiff and SigmaDiff, with F1 score increases of 12.3 to 42.7 and 2.2 to 4.1 percentage points, respectively, across various binary diffing scenarios.
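To make the idea of deriving instruction alignments from partially revealed runtime values concrete, the following is a minimal illustrative sketch and not Barracuda's actual algorithm. It assumes each binary has already been executed and that, for some subset of instructions, a tuple of observed runtime values is available. The function name `align_by_runtime_values`, the trace format, and the one-to-one matching rule are all hypothetical choices made for this illustration.

```python
from collections import defaultdict


def align_by_runtime_values(trace_a, trace_b):
    """Pair instructions whose observed runtime values match uniquely.

    trace_a, trace_b: dict mapping an instruction address (int) to a tuple
    of runtime values observed for that instruction (hypothetical format).
    Returns a sorted list of (address_in_a, address_in_b) anchor pairs.
    """
    # Group instruction addresses by their value signature in each binary.
    by_sig_a = defaultdict(list)
    by_sig_b = defaultdict(list)
    for addr, values in trace_a.items():
        by_sig_a[tuple(values)].append(addr)
    for addr, values in trace_b.items():
        by_sig_b[tuple(values)].append(addr)

    anchors = []
    for sig, addrs_a in by_sig_a.items():
        addrs_b = by_sig_b.get(sig, [])
        # Keep only unambiguous matches: a signature seen exactly once on
        # each side. Ambiguous or unmatched signatures are skipped, which
        # trades coverage for precision.
        if len(addrs_a) == 1 and len(addrs_b) == 1:
            anchors.append((addrs_a[0], addrs_b[0]))
    return sorted(anchors)


# Toy traces: addresses and values are made up for illustration.
trace_a = {0x1000: (7, 42), 0x1004: (0,), 0x1008: (0,), 0x100C: (99,)}
trace_b = {0x2000: (0,), 0x2010: (7, 42), 0x2014: (5,)}
anchors = align_by_runtime_values(trace_a, trace_b)
```

In this toy setting, only the instructions with the distinctive value signature `(7, 42)` are paired. The common value `0` appears at two addresses in the first trace and is discarded as ambiguous, which reflects the intuition that some runtime values are far more useful than others for establishing anchor points.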