Select-Then-Compute: Encrypted Label Selection and Analytics over Distributed Datasets using FHE

Nirajan Koirala (University of Notre Dame), Seunghun Paik (Hanyang University), Sam Martin (University of Notre Dame), Helena Berens (University of Notre Dame), Tasha Januszewicz (University of Notre Dame), Jonathan Takeshita (Old Dominion University), Jae Hong Seo (Hanyang University), Taeho Jung (University of Notre Dame)

Private Set Intersection (PSI) protocols allow a querier to determine whether an item exists in a dataset without revealing the query or exposing non-matching records. They have many applications in fraud detection, compliance monitoring, healthcare analytics, and secure collaboration across distributed data sources. In these settings, the results obtained through PSI can be sensitive and may require downstream computation on the associated data before the outcome is revealed to the querier. Such computation may involve floating-point arithmetic, for example the inference of a machine learning model. Although many PSI protocols have been proposed, and some of them even enable secure queries over distributed encrypted sets, they do not address these real-world complexities.

In this work, we present the first encrypted label selection and analytics protocol, which allows the querier to securely retrieve not just the results of intersections among identifiers but also the outcomes of downstream functions on the data/labels associated with the intersected identifiers. To achieve this, we construct a novel protocol based on the approximate fully homomorphic encryption scheme CKKS that supports efficient label retrieval and downstream computations over real-valued data. In addition, we introduce several techniques to handle identifiers in large domains, e.g., 64 or 128 bits, while ensuring high precision for accurate downstream computations.

Finally, we implement and benchmark our protocol, compare it against state-of-the-art methods, and evaluate it on real-world fraud datasets, demonstrating its scalability and efficiency in large-scale use cases. Our results show speedups of 1.4× to 6.8× over prior approaches, and our protocol selects and analyzes encrypted labels over real-world datasets in under 65 seconds, making it practical for real-world deployments.
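
To make the select-then-compute idea concrete, the following is a minimal plaintext sketch in Python. It is not the paper's construction and uses no encryption: the function names (`equality_indicator`, `select_label`, `downstream_score`), the toy identifiers, the labels, and the logistic scoring function are all hypothetical choices made for illustration. The sketch only shows the data flow that an FHE-based protocol would have to evaluate homomorphically: an equality indicator per record masks the labels, the masked labels are summed to obtain the selected label, and a downstream function is applied before anything is returned to the querier.

```python
import math

def equality_indicator(a: int, b: int) -> float:
    """Return 1.0 if the identifiers match, else 0.0.

    Assumption: under an approximate FHE scheme this comparison would be
    evaluated arithmetically on ciphertexts rather than with a branch.
    """
    return 1.0 if a == b else 0.0

def select_label(query_id, ids, labels):
    """Select the label of the matching record via masking and summation."""
    dim = len(labels[0])
    selected = [0.0] * dim
    for record_id, label in zip(ids, labels):
        mask = equality_indicator(query_id, record_id)
        for j in range(dim):
            # Non-matching records contribute zero to the sum.
            selected[j] += mask * label[j]
    return selected

def downstream_score(features, weights, bias):
    """Hypothetical downstream analytics: a logistic-regression score."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Toy dataset: identifiers with real-valued labels (illustrative values only).
ids = [1001, 1002, 1003]
labels = [[0.2, 1.5], [0.9, 0.1], [0.4, 0.7]]

selected = select_label(1002, ids, labels)
score = downstream_score(selected, weights=[1.0, -2.0], bias=0.0)
```

In this sketch a query for an identifier that is absent from `ids` yields an all-zero selected label, so the querier learns only the output of the downstream function and never the non-matching records themselves.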
