Mu Yuan (The Chinese University of Hong Kong), Lan Zhang (University of Science and Technology of China), Yihang Cheng (University of Science and Technology of China), Miao-Hui Song (University of Science and Technology of China), Guoliang Xing (The Chinese University of Hong Kong), Xiang-Yang Li (University of Science and Technology of China)

The privacy of model parameters and user data is crucial for Transformer-based cloud services, such as online chatbots. While recent advances in secure multi-party computation and homomorphic encryption provide strong cryptographic guarantees, their computational overhead makes them infeasible for real-time inference with large-scale Transformer models.

In this work, we propose a practical alternative that balances privacy and efficiency in real-world deployments.
We introduce a three-party threat model involving a model developer, a cloud model server, and a data owner, capturing the trust assumptions and deployment conditions of practical AI services.
Within this framework, we design a semi-symmetric permutation-based protection mechanism and present STIP, the first three-party privacy-preserving inference system for large Transformers deployable on commodity hardware.
STIP formally bounds privacy leakage while preserving lossless inference accuracy.
To further safeguard model parameters, STIP integrates trusted execution environments to resist model extraction and fine-tuning attacks.
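To give intuition for how permutation-based protection can preserve exact inference results, the following sketch shows the core algebraic trick for a single linear layer: if the data owner permutes the feature dimension of its inputs and the model developer applies the inverse permutation to the corresponding weights, the cloud's computation on protected values equals plain inference. This is an illustrative simplification, not the paper's full semi-symmetric construction, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, n = 8, 4, 3
x = rng.standard_normal((n, d_in))      # data owner's input features
W = rng.standard_normal((d_in, d_out))  # model developer's layer weights

# Shared secret between data owner and developer: a random
# permutation of the feature dimension, kept from the cloud.
perm = rng.permutation(d_in)
P = np.eye(d_in)[:, perm]               # permutation matrix (orthogonal)

x_perm = x @ P          # data owner uploads permuted features
W_perm = P.T @ W        # developer ships inversely permuted weights

# The cloud server sees only permuted data and weights, yet
# x P (P^T W) = x W, so the result matches plain inference.
y_cloud = x_perm @ W_perm
y_plain = x @ W
assert np.allclose(y_cloud, y_plain)
```

Because the permutation cancels exactly, accuracy is lossless; the cloud never observes the raw features or the original weight layout.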

We evaluate STIP on six representative Transformer model families, including models with up to 70 billion parameters, under three deployment settings.
STIP's efficiency is comparable to unprotected full-cloud inference; for example, STIP achieves 31.7 ms latency on the LLaMA2-7B model.
STIP also shows effective resistance to various attacks against user data and model parameters.
STIP has been deployed in a production environment on our proprietary 70B model.
In a three-month online test, STIP added only 12% latency, and no privacy incidents were reported, demonstrating its practicality and robustness for production-scale AI systems.
