Tomás Pelayo-Benedet (Universidad de Zaragoza), Kevin Borgolte (Ruhr University Bochum), Ricardo J. Rodríguez (Universidad de Zaragoza)

Binary decompilation remains an open challenge in reverse engineering. While recent approaches have begun to leverage the capabilities of large language models (LLMs), most continue to focus exclusively on disassembly as input, ignoring the intermediate representations (IRs) employed by static binary analysis tools and traditional decompilers.

In this paper, we present the first systematic evaluation of LLM-based decompilation using hierarchical IRs. In particular, we investigate how different levels of abstraction in IRs affect binary decompilation quality across five commercial LLMs. Our findings show that the choice of IR significantly influences performance: smaller models benefit markedly from high-level structured IRs, while larger models show stable performance across IR levels. Our evaluation also reveals a significant trade-off between recompilation success and functional correctness. Code decompiled from disassembly tends to recompile more reliably, but it is less often functionally correct. In contrast, code decompiled from high-level IRs more often retains the original functionality, albeit with slightly lower recompilation success rates. Furthermore, we find that cognitive complexity metrics, such as Halstead measures, are strong predictors of decompilation difficulty, while traditional structural metrics, such as cyclomatic complexity, offer limited insight. Finally, we highlight the main lines of research for improving binary decompilation by combining the advantages of static binary analysis techniques with the capabilities of modern LLMs.
