André Pacteau, Antonino Vitale, Davide Balzarotti, Simone Aonzo (EURECOM)
Cryptographic function detection in binaries is a crucial task in software reverse engineering (SRE), with significant implications for secure communications, regulatory compliance, and malware analysis. While traditional approaches based on cryptographic signatures are common, they are challenging to maintain and often prone to false negatives in the case of custom implementations or false positives when short signatures are used. Alternatively, techniques based on statistical analysis of mnemonics in disassembled code have emerged, positing that cryptographic functions tend to involve a high frequency of arithmetic and logic operations. However, these methods have predominantly been formulated as heuristics, with thresholds that may not always be optimal or universally applicable.
In this paper, we present Mnemocrypt, a machine learningbased tool for detecting cryptographic functions in x86 executables, which we release as an IDA Pro plugin. Using a random forest classifier, Mnemocrypt leverages both structural and content-related metrics of functions at varying levels of granularity to make its predictions. The primary design goal of Mnemocrypt is to minimize false positives, as misleading results could lead analysts down incorrect investigative paths, undermining the efficacy of reverse engineering efforts. Trained on a diverse dataset of cryptographic libraries compiled with different optimization levels, Mnemocrypt achieves robust detection capabilities without relying on predefined signatures or computationally expensive data flow graph analysis, ensuring high efficiency.
Our evaluation, conducted on 231 Portable Executable x86 Windows malware samples from different families, demonstrates that Mnemocrypt, when configured with a high confidence threshold, significantly outperforms existing solutions in terms of false positives. The few false positives detected by Mnemocrypt were only related to compression functions or complex data processing routines, further emphasizing the tool’s precision in distinguishing algorithms that use instructions similar to cryptographic processes. Finally, with a median execution time of six seconds, Mnemocrypt provides the reverse engineering community with a practical and efficient solution for identifying cryptographic functions, paving the way for further studies to improve this type of model.