Qizhi Cai (Zhejiang University), Lingzhi Wang (Northwestern University), Yao Zhu (Zhejiang University), Zhipeng Chen (Zhejiang University), Xiangmin Shen (Hofstra University), Zhenyuan Li (Zhejiang University)

In recent years, provenance-based intrusion detection and forensic systems have attracted significant attention, leading to a rapid growth of related research efforts. However, progress in this area has been hindered by the long-standing lack of updated datasets and benchmarks. Existing datasets suffer from several critical limitations, including outdated attack techniques, short temporal scales, and incomplete or fragmented attack chains. As a result, they fail to capture the characteristics of the latest, real-world Advanced Persistent Threat (APT) attacks. Moreover, the unclear, coarse-grained attack procedures underlying existing datasets make accurate labeling and reliable evaluation difficult. Consequently, the absence of a comprehensive, up-to-date dataset has become a major bottleneck for the progress of this area. To address this, we present our efforts in building a large-scale, diverse, and well-annotated dataset for provenance-based intrusion analysis. Our dataset is generated using an automated attack emulation framework that incorporates recent attack techniques and supports fine-grained ground-truth labeling. Using this dataset, we conduct a comprehensive evaluation of state-of-the-art provenance-based intrusion detection systems, revealing weaknesses that cannot be effectively benchmarked with existing datasets. Our results demonstrate the dataset’s value in enabling clearer, more informative evaluations and highlight its potential to advance future research in provenance-based intrusion detection and graph-based security analysis.

View More Papers

Kangaroo: A Private and Amortized Inference Framework over WAN...

Wei Xu (Xidian University), Hui Zhu (Xidian University), Yandong Zheng (Xidian University), Song Bian (Beihang University), Ning Sun (Xidian University), Hao Yuan (Xidian University), Dengguo Feng (School of Cyber Science and Technology), Hui Li (Xidian University)

Read More

E-FuzzEdge: Efficient In-Place Firmware Fuzzing via Parallel Scheduling (Short...

Davide Rusconi (University of Milan), Osama Yousef (University of Milan), Mirco Picca (University of Milan), Danilo Bruschi (University of Milan), Flavio Toffalini (Ruhr-Universitat Bochum),  Andrea Lanzi (University of Milan)

Read More

HyperMirage: Direct State Manipulation in Hybrid Virtual CPU Fuzzing

Manuel Andreas (Technical University of Munich), Fabian Specht (Technical University of Munich), Marius Momeu (Technical University of Munich)

Read More