What the Fork? Finding and Analyzing Malware in GitHub Forks

Alan Cao (New York University) and Brendan Dolan-Gavitt (New York University)

On GitHub, open-source developers use the fork feature to create server-side clones and implement code changes separately before creating pull requests. However, such fork repositories can be abused to store and distribute malware, particularly malware that stealthily mines cryptocurrencies.

In this paper, we present an analysis of this emerging attack vector and a system for catching malware in GitHub fork repositories with minimal human effort called Fork Integrity Analysis, implemented through a detection infrastructure called Fork Sentry. By automatically detecting and reverse engineering interesting artifacts extracted from a given repository’s forks, we can generate alerts for suspicious artifacts, and provide a means for takedown by GitHub Trust & Safety. We demonstrate the efficacy of our techniques by scanning 68,879 forks of 35 popular cryptocurrency repositories, leading to the discovery of 26 forked repositories that were hosting malware, and report them to GitHub with seven successful takedowns so far. Our detection infrastructure allows not only for the triaging and alerting of suspicious forks, but also provides continuous monitoring for later potential malicious forks. The code and collected data from Fork Sentry will be released as an open-source project.