Andes Y. L. Kei (Chinese University of Hong Kong), Sherman S. M. Chow (Chinese University of Hong Kong)

Adoption of transformer-based machine learning models is growing, raising concerns about sensitive data exposure. Nonetheless, current secure inference solutions incur substantial overhead due to their extensive reliance on non-linear protocols, such as softmax and Gaussian error linear unit (GELU). Driven by numerical stability needs, softmax approximations (e.g., NeurIPS 2021) typically extract the maximum element of an input vector, incurring logarithmic rounds (in the input length). Existing GELU protocols (e.g., S&P 2024) use piecewise approximations with high-degree polynomials that rely heavily on secure multiplications and comparisons, which are expensive. Such complexities also hinder model owners who are not familiar with cryptography from easily deploying their custom models.
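To illustrate why extracting the maximum is both useful and costly, here is a minimal plaintext sketch in Python. It is not taken from any of the cited protocols. The function names `stable_softmax` and `tournament_max` are hypothetical, and the round counter only models sequential layers of pairwise comparisons, assuming each layer must finish before the next begins.

```python
import math

def stable_softmax(xs):
    """Softmax with the maximum subtracted first, so every exponent is <= 0."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def tournament_max(xs):
    """Pairwise-tournament maximum; returns (maximum, number of sequential layers)."""
    layer = list(xs)
    rounds = 0
    while len(layer) > 1:
        # Compare neighbours pairwise; all comparisons in one layer are independent.
        nxt = [max(layer[i], layer[i + 1]) for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2:
            nxt.append(layer[-1])  # odd element advances without a comparison
        layer = nxt
        rounds += 1
    return layer[0], rounds
```

Without the subtraction, `math.exp(1000.0)` overflows in plain Python, whereas `stable_softmax([1000.0, 999.0])` is well defined. The layer count returned by `tournament_max` grows as the base-2 logarithm of the input length, which is the logarithmic round cost mentioned above.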

SHAFT, our proposed system, provides a secure, handy, accurate, and fast transformer inference framework for deployment. Highlights of our contributions include 1) the first constant-round softmax protocol for transformers, uniquely combining the benefits of input clipping and characteristics of ordinary differential equations, and 2) a highly accurate GELU protocol on a novel characterization designed for Fourier series approximation. Extending to broader contexts, our new protocols also apply to general neural networks using softmax as the final layer and to transformer architectures with different activation functions. Remarkably, SHAFT outperforms state-of-the-art SIGMA (PETS 2024), based on secret sharing, and BumbleBee (NDSS 2025), which additionally uses RLWE-based homomorphic encryption. More specifically, SHAFT reduces communication by 25-41% and matches SIGMA's running time, while surpassing BumbleBee in running time by 4.6-5.3× on LANs and 2.9-4.4× on WANs. Alongside these improvements, SHAFT attains accuracy comparable to plaintext, confirming its numerical stability. Finally, SHAFT provides an accessible open-source framework for secure and handy deployment by smoothly integrating with the Hugging Face library (EMNLP Demos 2020).
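The link between softmax and ordinary differential equations can be made concrete with a standard identity: the function f(t) = softmax(t·x) satisfies f'(t) = (x − ⟨x, f(t)⟩) ⊙ f(t) with f(0) equal to the uniform vector, so f(1) = softmax(x). The plaintext Python sketch below integrates this ODE with Euler's method. It is only an illustration of that identity, not SHAFT's actual protocol, which the abstract does not specify. The name `softmax_via_euler` and the default step count are assumptions.

```python
import math

def softmax_via_euler(xs, steps=1000):
    """Approximate softmax(xs) by Euler integration of f' = (x - <x, f>) * f on [0, 1]."""
    n = len(xs)
    y = [1.0 / n] * n          # f(0) is the uniform distribution
    h = 1.0 / steps            # step size, fixed in advance (hypothetical choice)
    for _ in range(steps):
        dot = sum(x * yi for x, yi in zip(xs, y))
        # One Euler step: only additions and multiplications, no max and no exp.
        y = [yi + h * (x - dot) * yi for x, yi in zip(xs, y)]
    return y

def softmax_reference(xs):
    """Ordinary max-subtracted softmax, used here only to check the approximation."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

In this sketch the number of Euler steps is chosen independently of the vector length, and each step needs no maximum extraction. Keeping the inputs in a bounded range, for example by clipping, limits how fast the solution can change and therefore how many steps a given accuracy requires.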
