Yu Feng, Osbert Bastani, Ruben Martins, Isil Dillig, Saswat Anand

Download: Paper (PDF)

Date: 27 Feb 2017

Document Type: Reports

Additional Documents: Slides Video

Associated Event: NDSS Symposium 2017


This paper proposes a technique for automatically learning semantic malware signatures for Android from very few samples of a malware family. The key idea underlying our technique is to look for a maximally suspicious common subgraph (MSCS) that is shared between all known instances of a malware family. An MSCS describes the shared functionality between multiple Android applications in terms of inter-component call relations and their semantic metadata (e.g., data-flow properties). Our approach identifies such maximally suspicious common subgraphs by reducing the problem to maximum satisfiability. Once a semantic signature is learned, our approach uses a combination of static analysis and a new approximate signature matching algorithm to determine whether an Android application matches the semantic signature characterizing a given malware family.

We have implemented our approach in a tool called ASTROID and show that it has a number of advantages over state-of-theart malware detection techniques. First, we compare the semantic malware signatures automatically synthesized by ASTROID with manually-written signatures used in previous work and show that the signatures learned by ASTROID perform better in terms of accuracy as well as precision. Second, we compare ASTROID against two state-of-the-art malware detection tools and demonstrate its advantages in terms of interpretability and accuracy. Finally, we demonstrate that ASTROID   s approximate signature matching algorithm is resistant to behavioral obfuscation and that it can be used to detect zero-day malware. In particular, we were able to find 22 instances of zero-day malware in Google Play that are not reported as malware by existing tools.