Nirajan Koirala (University of Notre Dame), Seunghun Paik (Hanyang University), Sam Martin (University of Notre Dame), Helena Berens (University of Notre Dame), Tasha Januszewicz (University of Notre Dame), Jonathan Takeshita (Old Dominion University), Jae Hong Seo (Hanyang University), Taeho Jung (University of Notre Dame)
Private Set Intersection (PSI) protocols allow a querier to determine whether an item exists in a dataset without revealing the query or exposing non-matching records.
It has many applications in fraud detection, compliance monitoring, healthcare analytics, and secure collaboration across distributed data sources.
In these cases, the results obtained through PSI can be sensitive and even require some kind of downstream computation on the associated data before the outcome is revealed to the querier, computation that may involve floating-point arithmetic, such as the inference of a machine learning model.
Although many such protocols have been proposed, and some of them even enable secure queries over distributed encrypted sets, they fail to address the aforementioned real-world complexities.
In this work, we present the first textit{encrypted label selection and analytics} protocol construction, which allows the querier to securely retrieve not just the results of intersections among identifiers but also the outcomes of downstream functions on the data/label associated with the intersected identifiers.
To achieve this, we construct a novel protocol based on an approximate CKKS fully homomorphic encryption that supports efficient label retrieval and downstream computations over real-valued data.
In addition, we introduce several techniques to handle identifiers in large domains, e.g., 64 or 128 bits, while ensuring high precision for accurate downstream computations.
Finally, we implement and benchmark our protocol, compare it against state-of-the-art methods, and perform evaluation over real-world fraud datasets, demonstrating its scalability and efficiency in large-scale use case scenarios.
Our results show up to 1.4$times$ to 6.8$times$ speedup over prior approaches and select and analyze encrypted labels over real-world datasets in under 65 sec., making our protocol practical for real-world deployments.