Qizhi Cai (Zhejiang University), Lingzhi Wang (Northwestern University), Yao Zhu (Zhejiang University), Zhipeng Chen (Zhejiang University), Xiangmin Shen (Hofstra University), Zhenyuan Li (Zhejiang University)

In recent years, provenance-based intrusion detection and forensic systems have attracted significant attention, and research on them has grown rapidly. However, progress in this area has been hindered by a long-standing lack of up-to-date datasets and benchmarks. Existing datasets suffer from several critical limitations, including outdated attack techniques, short temporal scales, and incomplete or fragmented attack chains. As a result, they fail to capture the characteristics of recent real-world Advanced Persistent Threat (APT) attacks. Moreover, the attack procedures underlying existing datasets are unclear and coarse-grained, which makes accurate labeling and reliable evaluation difficult. Consequently, the absence of a comprehensive, up-to-date dataset has become a major bottleneck for the field. To address this bottleneck, we present our efforts in building a large-scale, diverse, and well-annotated dataset for provenance-based intrusion analysis. Our dataset is generated using an automated attack emulation framework that incorporates recent attack techniques and supports fine-grained ground-truth labeling. Using this dataset, we conduct a comprehensive evaluation of state-of-the-art provenance-based intrusion detection systems, revealing weaknesses that cannot be effectively benchmarked with existing datasets. Our results demonstrate the dataset’s value in enabling clearer, more informative evaluations and highlight its potential to advance future research in provenance-based intrusion detection and graph-based security analysis.
