Junjie Liang (The Pennsylvania State University), Wenbo Guo (The Pennsylvania State University), Tongbo Luo (Robinhood), Vasant Honavar (The Pennsylvania State University), Gang Wang (University of Illinois at Urbana-Champaign), Xinyu Xing (The Pennsylvania State University)

Supervised machine learning classifiers have been widely used for attack detection, but their training requires abundant high-quality labels. Unfortunately, high-quality labels are difficult to obtain in practice due to the high cost of data labeling and the constant evolution of attackers. Without such labels, it is challenging to train and deploy targeted countermeasures.

In this paper, we propose FARE, a clustering method to enable fine-grained attack categorization under low-quality labels. We focus on two common issues in data labels: 1) missing labels for certain attack classes or families; and 2) only having coarse-grained labels available for different attack types. The core idea of FARE is to take full advantage of the limited labels while using the underlying data distribution to consolidate the low-quality labels. We design an ensemble model to fuse the results of multiple unsupervised learning algorithms with the given labels to mitigate the negative impact of missing classes and coarse-grained labels. We then train an input transformation network to map the input data into a low-dimensional latent space for fine-grained clustering. Using two security datasets (Android malware and network intrusion traces), we show that FARE significantly outperforms the state-of-the-art (semi-)supervised learning methods in clustering quality/correctness. Further, we perform an initial deployment of FARE by working with a large e-commerce service to detect fraudulent accounts. With real-world A/B tests and manual investigation, we demonstrate the effectiveness of FARE to catch previously-unseen frauds.

View More Papers

Detecting Tor Bridge from Sampled Traffic in Backbone Networks

Hua Wu (School of Cyber Science & Engineering and Key Laboratory of Computer Network and Information Integration Southeast University, Ministry of Education, Jiangsu Nanjing, Purple Mountain Laboratories for Network and Communication Security (Nanjing, Jiangsu)), Shuyi Guo, Guang Cheng, Xiaoyan Hu (School of Cyber Science & Engineering and Key Laboratory of Computer Network and Information Integration…

Read More

Is Your Firmware Real or Re-Hosted? A case study...

Abraham A. Clements, Logan Carpenter, William A. Moeglein (Sandia National Laboratories), Christopher Wright (Purdue University)

Read More

PHOENIX: Device-Centric Cellular Network Protocol Monitoring using Runtime Verification

Mitziu Echeverria (The University of Iowa), Zeeshan Ahmed (The University of Iowa), Bincheng Wang (The University of Iowa), M. Fareed Arif (The University of Iowa), Syed Rafiul Hussain (Pennsylvania State University), Omar Chowdhury (The University of Iowa)

Read More

Time-Based CAN Intrusion Detection Benchmark

Deborah Blevins (University of Kentucky), Pablo Moriano, Robert Bridges, Miki Verma, Michael Iannacone, and Samuel Hollifield (Oak Ridge National Laboratory)

Read More