Xue Tan (Institute of Big Data, Fudan University, Shanghai, China and College of Computer Science and Artificial Intelligence, Fudan University, Shanghai, China), Hao Luan (Institute of Big Data, Fudan University, Shanghai, China and College of Computer Science and Artificial Intelligence, Fudan University, Shanghai, China), Mingyu Luo (Institute of Big Data, Fudan University, Shanghai, China and College of Computer Science and Artificial Intelligence, Fudan University, Shanghai, China), Zhuyang Yu (Institute of Big Data, Fudan University, Shanghai, China and College of Computer Science and Artificial Intelligence, Fudan University, Shanghai, China), Jun Dai (Department of Computer Science, Worcester Polytechnic Institute, MA, USA), Xiaoyan Sun (Department of Computer Science, Worcester Polytechnic Institute, MA, USA), Ping Chen (Institute of Big Data, Fudan University, Shanghai, China and Purple Mountain Laboratories, Nanjing, China)

With the rapid development of Large Language Models (LLMs), their applications have expanded across various aspects of daily life. Open-source LLMs, in particular, have gained popularity due to their accessibility, resulting in widespread downloading and redistribution. The impressive capabilities of LLMs result from training on massive and often undisclosed datasets. This raises the question of whether sensitive content, such as copyrighted or personal data, is included in the training set, which is known as the membership inference problem. Existing methods mainly rely on model outputs and overlook the rich internal representations of the model. This limited use of internal data leads to suboptimal results, revealing a research gap for membership inference in open-source white-box LLMs.

In this paper, we address the challenge of detecting the training data of open-source LLMs. To support this investigation, we introduce three dynamic benchmarks: WikiTection, NewsTection, and ArXivTection. We then propose a white-box approach for training data detection that analyzes the neural activations of LLMs. Our key insight is that the neuron activations across all layers of an LLM reflect the model's internal representation of knowledge related to the input data, and can therefore effectively distinguish between training data and non-training data of that LLM. Extensive experiments on these benchmarks demonstrate the strong effectiveness of our approach. For instance, on the WikiTection benchmark, our method achieves an AUC of around 0.98 across five LLMs: GPT2-xl, LLaMA2-7B, LLaMA3-8B, Mistral-7B, and LLaMA2-13B. Additionally, we conduct an in-depth analysis of factors such as model size, input length, and text paraphrasing, further validating the robustness and adaptability of our method.
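To make the evaluation setup concrete, the following is a minimal sketch, not the paper's actual method, of how per-input activation features might be turned into a membership score and evaluated with AUC. Everything here is an assumption for illustration: the activation vectors are synthetic Gaussian data rather than real LLM activations, the detector is a simple difference-of-centroids linear probe, and the names `fit_direction`, `score`, `auc`, and `synthetic_activations` are hypothetical.

```python
import random


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fit_direction(member_feats, nonmember_feats):
    """Hypothetical probe: direction from the non-member centroid to the member centroid."""
    mu_in = centroid(member_feats)
    mu_out = centroid(nonmember_feats)
    return [a - b for a, b in zip(mu_in, mu_out)]


def score(direction, feats):
    """Membership score: projection of one feature vector onto the probe direction."""
    return sum(w * x for w, x in zip(direction, feats))


def auc(pos_scores, neg_scores):
    """AUC as the probability that a member outscores a non-member (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def synthetic_activations(n, dim, shift, rng):
    """Stand-in for real activations: Gaussian vectors with a per-class mean shift (assumption)."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
dim = 16
# "Members" are simulated with a small mean shift in every feature dimension.
train_in = synthetic_activations(200, dim, 0.5, rng)
train_out = synthetic_activations(200, dim, 0.0, rng)
test_in = synthetic_activations(200, dim, 0.5, rng)
test_out = synthetic_activations(200, dim, 0.0, rng)

w = fit_direction(train_in, train_out)
test_auc = auc([score(w, x) for x in test_in], [score(w, x) for x in test_out])
print(f"held-out AUC on synthetic data: {test_auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level separation and 1.0 to perfect separation, which is the scale on which the reported value of around 0.98 should be read. The number printed by this sketch reflects only the synthetic data and says nothing about real LLM activations.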
