Jiongchi Yu (Singapore Management University), Xiaofei Xie (Singapore Management University), Qiang Hu (Tianjin University), Yuhan Ma (Tianjin University), Ziming Zhao (Zhejiang University)

Insider threats represent a significant and persistent security risk, yet remain difficult to detect in complex enterprise environments, where malicious activities are often concealed within subtle user behaviors. While machine-learning–based insider threat detection (ITD) techniques have shown promising results, their effectiveness is fundamentally constrained by the lack of high-quality and realistic training data. This challenge has two sources. First, enterprise internal data is highly sensitive and rarely accessible. Second, existing datasets are limited: public datasets are typically small in scale, and synthetic datasets often lack sufficient generalization, rich semantic context, and realistic behavioral patterns.

To address this challenge, we propose Chimera, a large language model (LLM)-based multi-agent framework that automatically simulates both benign and malicious insider activities and monitors comprehensive system logs across diverse enterprise environments. Chimera models each agent as an individual employee with fine-grained roles and incorporates group meetings, pairwise interactions, and self-organized scheduling to capture realistic organizational dynamics. Based on 15 insider attack types abstracted from real-world incidents, we deploy Chimera in three representative data-sensitive organizational scenarios and construct a new dataset, ChimeraLog, for supporting the development and evaluation of ITD methods.

We evaluate ChimeraLog through comprehensive human studies and quantitative analyses, demonstrating its diversity and realism. Experiments with existing ITD methods show that detection performance on ChimeraLog is substantially lower than on existing ITD datasets, indicating a more challenging and realistic benchmark. Despite distribution shifts, ITD models trained on ChimeraLog exhibit strong generalization capability, highlighting the practical value of LLM-based multi-agent simulation for advancing ITD.
