Building Next-Generation Datasets for Provenance-Based Intrusion Detection

Qizhi Cai (Zhejiang University), Lingzhi Wang (Northwestern University), Yao Zhu (Zhejiang University), Zhipeng Chen (Zhejiang University), Xiangmin Shen (Hofstra University), Zhenyuan Li (Zhejiang University)

In recent years, provenance-based intrusion detection and forensic systems have attracted significant attention, and related research has grown rapidly. However, progress in this area has been hindered by a long-standing lack of up-to-date datasets and benchmarks. Existing datasets suffer from several critical limitations, including outdated attack techniques, short temporal scales, and incomplete or fragmented attack chains. As a result, they fail to capture the characteristics of recent real-world Advanced Persistent Threat (APT) attacks. Moreover, the attack procedures underlying existing datasets are unclear and coarse-grained, which makes accurate labeling and reliable evaluation difficult. To address this bottleneck, we present our efforts in building a large-scale, diverse, and well-annotated dataset for provenance-based intrusion analysis. Our dataset is generated using an automated attack emulation framework that incorporates recent attack techniques and supports fine-grained ground-truth labeling. Using this dataset, we conduct a comprehensive evaluation of state-of-the-art provenance-based intrusion detection systems, revealing weaknesses that cannot be effectively benchmarked with existing datasets. Our results demonstrate the dataset’s value in enabling clearer, more informative evaluations and highlight its potential to advance future research in provenance-based intrusion detection and graph-based security analysis.

