Huaijin Wang (The Ohio State University), Zhiqiang Lin (The Ohio State University)
Binary Code Similarity Analysis (BCSA) plays a vital role in many security tasks, including malware analysis, vulnerability detection, and software supply chain security. While numerous BCSA techniques have been proposed over the past decade, few leverage the semantics of register and memory values for comparison, despite promising initial results. Existing value-based approaches often focus narrowly on values that remain invariant across compilation settings, thereby overlooking a broader spectrum of semantically rich information. In this paper, we identify three core challenges limiting the effectiveness of value-based BCSA: unscalable value extraction, lack of noise filtering, and inefficient value comparison. These shortcomings hinder both semantic coverage and scalability. To unlock the full potential of value-based BCSA, we propose vSim, a novel framework that systematically captures values from all register and memory operations, filters out semantically irrelevant values (e.g., global addresses), and normalizes and propagates the remaining values to enable robust and scalable similarity analysis. Extensive evaluation shows that vSim consistently outperforms state-of-the-art BCSA systems in accuracy, robustness, and scalability. It generalizes well across architectures and toolchains, producing reliable results on diverse datasets.