Chenxu Wang (Southern University of Science and Technology (SUSTech) and The Hong Kong Polytechnic University), Fengwei Zhang (Southern University of Science and Technology (SUSTech)), Yunjie Deng (Southern University of Science and Technology (SUSTech)), Kevin Leach (Vanderbilt University), Jiannong Cao (The Hong Kong Polytechnic University), Zhenyu Ning (Hunan University), Shoumeng Yan (Ant Group), Zhengyu He (Ant…

Confidential computing is an emerging technique that provides users and third-party developers with an isolated and transparent execution environment. To support this technique, Arm introduced the Confidential Computing Architecture (CCA), which creates multiple isolated address spaces, known as realms, to ensure data confidentiality and integrity in security-sensitive tasks. Arm recently proposed the concept of confidential computing on GPU hardware, which is widely used in general-purpose, high-performance, and artificial intelligence computing scenarios. However, hardware and firmware supporting confidential GPU workloads remain unavailable. Existing studies leverage Trusted Execution Environments (TEEs) to secure GPU computing on Arm- or Intel-based platforms, but they are not suitable for CCA's realm-style architecture, such as using incompatible hardware or introducing a large trusted computing base (TCB). Therefore, there is a need to complement existing Arm CCA capabilities with GPU acceleration.

To address this challenge, we present CAGE to support confidential GPU computing for Arm CCA. By leveraging the existing security features in Arm CCA, CAGE ensures data security during confidential computing on unified-memory GPUs, the mainstream accelerators in Arm devices. To adapt the GPU workflow to CCA's realm-style architecture, CAGE proposes a novel shadow task mechanism to manage confidential GPU applications flexibly. Additionally, CAGE leverages the memory isolation mechanism in Arm CCA to protect data confidentiality and integrity from the strong adversary. Based on this, CAGE also optimizes security operations in memory isolation to mitigate performance overhead. Without hardware changes, our approach uses the generic hardware security primitives in Arm CCA to defend against a privileged adversary. We present two prototypes to verify CAGE's functionality and evaluate performance, respectively. Results show that CAGE effectively provides GPU support for Arm CCA with an average of 2.45% performance overhead.

View More Papers

Facilitating Non-Intrusive In-Vivo Firmware Testing with Stateless Instrumentation

Jiameng Shi (University of Georgia), Wenqiang Li (Independent Researcher), Wenwen Wang (University of Georgia), Le Guan (University of Georgia)

Read More

ORL-AUDITOR: Dataset Auditing in Offline Deep Reinforcement Learning

Linkang Du (Zhejiang University), Min Chen (CISPA Helmholtz Center for Information Security), Mingyang Sun (Zhejiang University), Shouling Ji (Zhejiang University), Peng Cheng (Zhejiang University), Jiming Chen (Zhejiang University), Zhikun Zhang (CISPA Helmholtz Center for Information Security and Stanford University)

Read More

Separation is Good: A Faster Order-Fairness Byzantine Consensus

Ke Mu (Southern University of Science and Technology, China), Bo Yin (Changsha University of Science and Technology, China), Alia Asheralieva (Loughborough University, UK), Xuetao Wei (Southern University of Science and Technology, China & Guangdong Provincial Key Laboratory of Brain-inspired Intelligent Computation, SUSTech, China)

Read More

Connecting the Dots in the Sky: Website Fingerprinting in...

Prabhjot Singh (University of Waterloo), Diogo Barradas (University of Waterloo), Tariq Elahi (University of Edinburgh), Noura Limam (University of Waterloo)

Read More