Jiongchi Yu (Singapore Management University), Xiaofei Xie (Singapore Management University), Qiang Hu (Tianjin University), Yuhan Ma (Tianjin University), Ziming Zhao (Zhejiang University)
Insider threats can lead to unacceptable losses and are a widespread, significant security concern, which makes their detection essential. Recently, machine learning-based insider threat detection (ITD) methods have been proposed with promising results. Despite this success, one major challenge, the lack of sufficient data, limits the further development of these ITD methods. The difficulty is that internal enterprise data is highly sensitive and typically inaccessible, while public datasets are either limited in real-world coverage or, in the case of synthetic data, lack rich semantic information and realistic behavioral patterns. As a result, there is a crucial need to construct realistic insider threat datasets.
To address this challenge, we propose Chimera, the first large language model (LLM)-based multi-agent framework that automatically simulates both benign and malicious insider activities and collects logs across diverse enterprise environments. Based on an analysis of an organization's composition and structural characteristics, Chimera customizes each LLM agent to represent an individual employee through detailed role modeling, and couples the agents with modules for group meetings, pairwise interactions, and self-organized scheduling. In this way, Chimera can accurately reflect the complexities of real-world enterprise operations. The current version of Chimera covers 15 distinct types of manually abstracted insider attacks, such as intellectual property theft and system sabotage. Using Chimera, we simulate benign and attack activities across three typical data-sensitive organizational scenarios (a technology company, a finance corporation, and a medical institution) and generate a new dataset named ChimeraLog to facilitate the development of machine learning-based ITD methods.
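As a rough illustration of the structure described above (role-modeled employee agents combined with group meetings, pairwise interactions, and self-organized schedules that together yield a log), consider the minimal sketch below. It is not Chimera's implementation: the names `EmployeeAgent`, `plan_day`, and `simulate_day`, the record fields, and the fixed task lists are all hypothetical, and no LLM is called.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class EmployeeAgent:
    """Hypothetical stand-in for one LLM-backed employee agent."""
    name: str
    role: str  # role description that would seed the agent's prompt
    tasks: list[str] = field(default_factory=list)  # self-organized schedule

    def plan_day(self) -> list[dict]:
        # A real agent would ask an LLM to plan its day; here the plan is fixed.
        return [{"actor": self.name, "kind": "task", "detail": t} for t in self.tasks]


def simulate_day(agents: list[EmployeeAgent]) -> list[dict]:
    """Return one day's log records: a group meeting, pairwise messages, then tasks."""
    log = [{"actor": "all", "kind": "meeting", "detail": "daily stand-up"}]
    for a, b in combinations(agents, 2):  # every pair interacts once
        log.append({"actor": a.name, "kind": "message", "detail": f"to {b.name}"})
    for agent in agents:
        log.extend(agent.plan_day())
    return log


# Hypothetical two-person team, for illustration only.
team = [
    EmployeeAgent("alice", "software engineer", ["commit code"]),
    EmployeeAgent("bob", "system administrator", ["rotate credentials", "review access logs"]),
]
records = simulate_day(team)
```

In this toy version, a malicious scenario would simply be another agent whose task list contains attack steps; the point is only that a single simulation loop produces benign and attack records in the same log format.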
To evaluate the quality and authenticity of ChimeraLog, we conduct comprehensive human studies and quantitative analyses. The results demonstrate both the diversity and realism of the dataset. Further expert analysis highlights the presence of realistic threat patterns as well as explainable activity traces. In addition, we evaluate the effectiveness of existing insider threat detection methods on ChimeraLog. The average F1-score achieved is 0.83, which is notably lower than the score of 0.99 observed on the baseline dataset CERT, thereby illustrating the greater difficulty posed by ChimeraLog for threat detection tasks.
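For reference, the F1-score reported above is the harmonic mean of precision and recall. The sketch below computes it from confusion counts for the malicious class; the function name `f1_score` and the example counts are illustrative assumptions and are not taken from the evaluation.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive (malicious) class."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, not results from the paper.
score = f1_score(tp=80, fp=20, fn=20)
```

Because F1 ignores true negatives, it stays informative when malicious events are rare compared with benign ones, which is the usual situation in insider threat logs.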