Workshop on Artificial Intelligence System with Confidential Computing (AISCC) 2024 Program
Monday, 26 February
Jian Cui (Indiana University Bloomington)
Twitter has been recognized as a highly valuable source for security practitioners, offering timely updates on breaking events and threat analyses. Current methods for automating event detection on Twitter rely on standard text embedding techniques to cluster tweets. However, these methods are not effective as standard text embeddings are not specifically designed for clustering security-related tweets. To tackle this, our paper introduces a novel method for creating custom embeddings that improve the accuracy and comprehensiveness of security event detection on Twitter. This method integrates patterns of security-related entity sharing between tweets into the embedding process, resulting in higher-quality embeddings that significantly enhance precision and coverage in identifying security events.
Recent work in ICML’22 established a connection between dataset condensation (DC) and differential privacy (DP), which is unfortunately problematic. To correctly connect DC and DP, we propose two differentially private dataset condensation (DPDC) algorithms—LDPDC and NDPDC. LDPDC is a linear DC algorithm that can be executed on a low-end Central Processing Unit (CPU), while NDPDC is a nonlinear DC algorithm that leverages neural networks to extract and match the latent representations between real and synthetic data. Through extensive evaluations, we demonstrate that LDPDC has comparable performance to recent DP generative methods despite its simplicity. NDPDC provides acceptable DP guarantees with a mild utility loss, compared to distribution matching (DM). Additionally, NDPDC allows a flexible trade-off between the synthetic data utility and DP budget.
Heterogeneous Graph Pre-training Based Model for Secure and Efficient Prediction of Default Risk Propagation among Bond IssuersXurui Li (Fudan University), Xin Shan (Bank of Shanghai), Wenhao Yin (Shanghai Saic Finance Co., Ltd)
Efficient prediction of default risk for bond-issuing enterprises is pivotal for maintaining stability and fostering growth in the bond market. Conventional methods usually rely solely on an enterprise’s internal data for risk assessment. In contrast, graph-based techniques leverage interconnected corporate information to enhance default risk identification for targeted bond issuers. Traditional graph techniques such as label propagation algorithm or deepwalk fail to effectively integrate a enterprise’s inherent attribute information with its topological network data. Additionally, due to data scarcity and security privacy concerns between enterprises, end-to-end graph neural network (GNN) algorithms may struggle in delivering satisfactory performance for target tasks. To address these challenges, we present a novel two-stage model. In the first stage, we employ an innovative Masked Autoencoders for Heterogeneous Graph (HGMAE) to pre-train on a vast enterprise knowledge graph. Subsequently, in the second stage, a specialized classifier model is trained to predict default risk propagation probabilities. The classifier leverages concatenated feature vectors derived from the pre-trained encoder with the enterprise’s task-specific feature vectors. Through the two-stage training approach, our model not only boosts the importance of unique bond characteristics for specific default prediction tasks, but also securely and efficiently leverage the global information pre-trained from other enterprises. Experimental results demonstrate that our proposed model outperforms existing approaches in predicting default risk for bond issuers.
Dennis Jacob, Chong Xiang, Prateek Mittal (Princeton University)
The advent of deep learning has brought about vast improvements to computer vision systems and facilitated the development of self-driving vehicles. Nevertheless, these models have been found to be susceptible to adversarial attacks. Of particular importance to the research community are patch attacks, which have been found to be realizable in the physical world. While certifiable defenses against patch attacks have been developed for tasks such as single-label classification, there does not exist a defense for multi-label classification. In this work, we propose such a defense called Multi-Label PatchCleanser, an extension of the current state-of-the-art (SOTA) method for single-label classification. We find that our approach can achieve non-trivial robustness on the MSCOCO 2014 validation dataset while maintaining high clean performance. Additionally, we leverage a key constraint between patch and object locations to develop a novel procedure and improve upon baseline robust performance.
Linkang Du (Zhejiang University), Zheng Zhu (Zhejiang University), Min Chen (CISPA Helmholtz Center for Information Security), Shouling Ji (Zhejiang University), Peng Cheng (Zhejiang University), Jiming Chen (Zhejiang University), Zhikun Zhang (Stanford University)
The text-to-image models based on diffusion processes, capable of transforming text descriptions into detailed images, have widespread applications in art, design, and beyond, such as DALL-E, Stable Diffusion, and Midjourney. However, they enable users without artistic training to create artwork comparable to professional quality, leading to concerns about copyright infringement. To tackle these issues, previous works have proposed strategies such as adversarial perturbation-based and watermarking-based methods. The former involves introducing subtle changes to disrupt the image generation process, while the latter involves embedding detectable marks in the artwork. The existing methods face limitations such as requiring modifications of the original image, being vulnerable to image pre-processing, and facing difficulties in applying them to the published artwork.
To this end, we propose a new paradigm, called StyleAuditor, for artistic style auditing. StyleAuditor identifies if a suspect model has been fine-tuned using a specific artist’s artwork by analyzing style-related features. Specifically, StyleAuditor employs a style extractor to obtain the multi-granularity style representations and treats artwork as samples of an artist’s style. Then, StyleAuditor queries a trained discriminator to gain the auditing decisions. The results of the experiment on the artwork of thirty artists demonstrate the high accuracy of StyleAuditor, with an auditing accuracy of over 90% and a false positive rate of less than 1.3%.
Zhuo Chen, Jiawei Liu, Haotan Liu (Wuhan University)
Neural network models have been widely applied in the field of information retrieval, but their vulnerability has always been a significant concern. In retrieval of public topics, the problems posed by the vulnerability are not only returning inaccurate or irrelevant content, but also returning manipulated opinions. One can distort the original ranking order based on the stance of the retrieved opinions, potentially influencing the searcher’s perception of the topic, weakening the reliability of retrieval results and damaging the fairness of opinion ranking. Based on the aforementioned challenges, we combine stance detection methods with existing text ranking manipulation methods to experimentally demonstrate the feasibility and threat of opinion manipulation. Then we design a user experiment in which each participant independently rated the credibility of the target topic based on the unmanipulated or manipulated retrieval results. The experimental result indicates that opinion manipulation can effectively influence people’s perceptions of the target topic. Furthermore, we preliminarily propose countermeasures to address the issue of opinion manipulation and build more reliable and fairer retrieval ranking systems.
Tianyue Chu, Devriş İşler (IMDEA Networks Institute & Universidad Carlos III de Madrid), Nikolaos Laoutaris (IMDEA Networks Institute)
Federated Learning (FL) has evolved into a pivotal paradigm for collaborative machine learning, enabling a centralised server to compute a global model by aggregating the local models trained by clients. However, the distributed nature of FL renders it susceptible to poisoning attacks that exploit its linear aggregation rule called FEDAVG. To address this vulnerability, FEDQV has been recently introduced as a superior alternative to FEDAVG, specifically designed to mitigate poisoning attacks by taxing more than linearly deviating clients. Nevertheless, FEDQV remains exposed to privacy attacks that aim to infer private information from clients’ local models. To counteract such privacy threats, a well-known approach is to use a Secure Aggregation (SA) protocol to ensure that the server is unable to inspect individual trained models as it aggregates them. In this work, we show how to implement SA on top of FEDQV in order to address both poisoning and privacy attacks. We mount several privacy attacks against FEDQV and demonstrate the effectiveness of SA in countering them.
Angelo Ruocco, Chris Porter, Claudio Carvalho, Daniele Buono, Derren Dunn, Hubertus Franke, James Bottomley, Marcio Silva, Mengmei Ye, Niteesh Dubey, Tobin Feldman-Fitzthum (IBM Research)
Developers leverage machine learning (ML) platforms to handle a range of their ML tasks in the cloud, but these use cases have not been deeply considered in the context of confidential computing. Confidential computing’s threat model treats the cloud provider as untrusted, so the user’s data in use (and certainly at rest) must be encrypted and integrity-protected. This host-guest barrier presents new challenges and opportunities in the ML platform space. In particular, we take a glancing look at ML platforms’ pipeline tools, how they currently align with the Confidential Containers project, and what may be needed to bridge several gaps.
Weiheng Bai (University of Minnesota), Qiushi Wu (IBM Research), Kefu Wu, Kangjie Lu (University of Minnesota)
In recent years, large language models (LLMs) have been widely used in security-related tasks, such as security bug identification and patch analysis. The effectiveness of LLMs in these tasks is often influenced by the construction of appropriate prompts. Some state-of-the-art research has proposed multiple factors to improve the effectiveness of building prompts. However, the influence of prompt content on the accuracy and efficacy of LLMs in executing security tasks remains underexplored. Addressing this gap, our study conducts a comprehensive experiment, assessing various prompt methodologies in the context of security-related tasks. We employ diverse prompt structures and contents and evaluate their impact on the performance of LLMs in security-related tasks. Our findings suggest that appropriately modifying prompt structures and content can significantly enhance the performance of LLMs in specific security tasks. Conversely, improper prompt methods can markedly reduce LLM effectiveness. This research not only contributes to the understanding of prompt influence on LLMs but also serves as a valuable guide for future studies on prompt optimization for security tasks. Our code and dataset is available at Wayne-Bai/Prompt-Affection.
Isra Elsharef, Zhen Zeng (University of Wisconsin-Milwaukee), Zhongshu Gu (IBM Research)
In recent years, security engineers in product teams have faced new challenges in threat modeling due to the increasing complexity of product design and the evolving nature of threats. First, security engineers must possess a thorough understanding of how to translate the abstract categories of threat modeling methodology into specific security threats relevant to a particular aspect of product design. Without such indepth knowledge, applying threat modeling in practice becomes a difficult task. Second, security engineers must be aware of current vulnerabilities and be able to quickly recall those that may be relevant to the product design. Therefore, for effective threat modeling, a deep understanding of a product’s design and the background knowledge to connect real-time security events with specific design principles embedded in large volumes of technical specifications is required. This can result in a lot of human effort invested in parsing, searching, and understanding what is being built through design documents and what threats are relevant based on that information. We observe that the recent emergence of large language models (LLMs) may significantly change the landscape of threat modeling by automating and accelerating the process with their language understanding and logical reasoning capabilities. In this paper, we have developed a novel LLM-based threat modeling system by leveraging NLP techniques and an open-source LLM to decrease the required human effort above in the threat modeling process. In the evaluation, two major questions of threat modeling (MQ1 and MQ2) are considered in the proposed workflow of Task 1 and Task 2, where the NLP techniques assist in parsing and understanding design documents and threats, and the LLM analyzes and synthesizes volumes of documentation to generate responses to related threat modeling questions. Our initial findings reveal that over 75% of the responses meet the expectations of human evaluation. The Retrieval Augmented Generation (RAG)-enhanced LLM outperforms the base LLM in both tasks by responding more concisely and containing more meaningful information. This study explores a novel approach to threat modeling and demonstrates the practical applicability of LLM-assisted threat modeling.
The robustness of deep learning models against adversarial attacks remains a pivotal concern. This study presents, for the first time, an exhaustive review of the transferability aspect of adversarial attacks. It systematically categorizes and critically evaluates various methodologies developed to augment the transferability of adversarial attacks. This study encompasses a spectrum of techniques, including Generative Structure, Semantic Similarity, Gradient Editing, Target Modification, and Ensemble Approach. Concurrently, this paper introduces a benchmark framework TAA-Bench, integrating ten leading methodologies for adversarial attack transferability, thereby providing a standardized and systematic platform for comparative analysis across diverse model architectures. Through comprehensive scrutiny, we delineate the efficacy and constraints of each method, shedding light on their underlying operational principles and practical utility. This review endeavors to be a quintessential resource for both scholars and practitioners in the field, charting the complex terrain of adversarial transferability and setting a foundation for future explorations in this vital sector. The associated codebase is accessible at: https://github.com/KxPlaug/TAA-Bench
Gelei Deng, Yi Liu (Nanyang Technological University), Yuekang Li (The University of New South Wales), Wang Kailong(Huazhong University of Science and Technology), Tianwei Zhang, Yang Liu (Nanyang Technological University)
Large Language Models (LLMs) have gained immense popularity and are being increasingly applied in various domains. Consequently, ensuring the security of these models is of paramount importance. Jailbreak attacks, which manipulate LLMs to generate malicious content, are recognized as a significant vulnerability. While existing research has predominantly focused on direct jailbreak attacks on LLMs, there has been limited exploration of indirect methods. The integration of various plugins into LLMs, notably Retrieval Augmented Generation (RAG), which enables LLMs to incorporate external knowledge bases into their response generation such as GPTs, introduces new avenues for indirect jailbreak attacks.
To fill this gap, we investigate indirect jailbreak attacks on LLMs, particularly GPTs, introducing a novel attack vector named Retrieval Augmented Generation Poisoning. This method, PANDORA, exploits the synergy between LLMs and RAG through prompt manipulation to generate unexpected responses. PANDORA uses maliciously crafted content to influence the RAG process, effectively initiating jailbreak attacks. Our preliminary tests show that PANDORA successfully conducts jailbreak attacks in four different scenarios, achieving higher success rates than direct attacks, with 64.3% for GPT-3.5 and 34.8% for GPT-4.