Jiongchi Yu (Singapore Management University), Xiaofei Xie (Singapore Management University), Qiang Hu (Tianjin University), Yuhan Ma (Tianjin University), Ziming Zhao (Zhejiang University)

Insider threats represent a significant and persistent security risk, yet remain difficult to detect in complex enterprise environments, where malicious activities are often concealed within subtle user behaviors. While machine-learning–based insider threat detection (ITD) techniques have shown promising results, their effectiveness is fundamentally constrained by the lack of high-quality and realistic training data. This challenge stems from the highly sensitive nature of enterprise internal data that is rarely accessible and from the limitations of existing datasets, where public datasets are typically small in scale, and synthetic datasets often lack sufficient generalization, rich semantic context, and realistic behavioral patterns.

To address this challenge, we propose Chimera, a large language model (LLM)-based multi-agent framework that automatically simulates both benign and malicious insider activities and monitors comprehensive system logs across diverse enterprise environments. Chimera models each agent as an individual employee with fine-grained roles and incorporates group meetings, pairwise interactions, and self-organized scheduling to capture realistic organizational dynamics. Based on 15 insider attack types abstracted from real-world incidents, we deploy Chimera in three representative data-sensitive organizational scenarios and construct a new dataset, ChimeraLog, for supporting the development and evaluation of ITD methods.

We evaluate ChimeraLog through comprehensive human studies and quantitative analyses, demonstrating its diversity and realism. Experiments with existing ITD methods show that detection performance on ChimeraLog is substantially lower than on existing ITD datasets, indicating a more challenging and realistic benchmark. Despite distribution shifts, ITD models trained on ChimeraLog exhibit strong generalization capability, highlighting the practical value of LLM-based multi-agent simulation for advancing ITD.
