On Your Social Network De-anonymizablity: Quantification and Large Scale Evaluation with Seed Knowledge

Author(s): Shouling Ji, Weiqing Li, Neil Zhenqiang Gong, Prateek Mittal, Raheem Beyah

Download: Paper (PDF)

Date: 7 Feb 2015

Document Type: Briefing Papers

Additional Documents: Slides

Associated Event: NDSS Symposium 2015

Abstract:

In this paper, we conduct the first comprehensive quantification on the perfect de-anonymizability and partial de-anonymizability of real world social networks with seed information in general scenarios, where a social network can follow an arbitrary distribution model. This quantification provides the theoretical foundation for existing structure based de-anonymization attacks (e.g., \cite{bacdwowww07}\cite{narshmsp09}\cite{srihicccs12}) and closes the gap between de-anonymization practice and theory. Besides that, our quantification can serve as a testing-stone for the effectiveness of anonymization techniques, i.e., researchers can employ our quantified structural conditions to evaluate the potential de-anonymizability of the anonymized social networks. Based on our quantification, we conduct a large scale evaluation on the de-anonymizability of 24 various real world social networks by quantitatively showing: 1) the conditions for perfect and (1-\epsilon) de-anonymization of a social network, where \epsilon specifies the tolerated de-anonymization error, and 2) how many users of a social network can be successfully de-anonymized. Furthermore, we show that, both theoretically and experimentally, the overall structural information based de-anonymization attack is much more powerful than the seed knowledge-only based de-anonymization attack, and even without any seed information, a social network can be perfectly or partially de-anonymized. Finally, we discuss the implications of this work. Our findings are expected to shed light on the future research of the structural data anonymization and de-anonymization area, and to help data owners evaluate their structural data vulnerability before data sharing and publishing.