When a domain is registered, information about the registrants and other related personnel is recorded by WHOIS databases owned by registrars or registries (called WHOIS providers jointly), which are open to public inquiries. However, due to the enforcement of the European Union’s General Data Protection Regulation (GDPR), certain WHOIS data (i.e., the records about EEA, or the European Economic Area, registrants) needs to be redacted before being released to the public. Anecdotally, it was reported that actions have been taken by some WHOIS providers. Yet, so far there is no systematic study to quantify the changes made by the WHOIS providers in response to the GDPR, their strategies for data redaction and impact on other applications relying on WHOIS data.
In this study, we report the first large-scale measurement study to answer these questions, in hopes of guiding the enforcement of the GDPR and identifying pitfalls during compliance. This study is made possible by analyzing a collection of 1.2 billion WHOIS records spanning two years. To automate the analysis tasks, we build a new system GCChecker based on unsupervised learning, which assigns a compliance score to a provider. Our findings of WHOIS GDPR compliance are multi-fold. To highlight a few, we discover that the GDPR has a profound impact on WHOIS, with over 85% surveyed large WHOIS providers redacting EEA records at scale. Surprisingly, over 60% large WHOIS data providers also redact non-EEA records. A variety of compliance flaws like incomplete redaction are also identified. The impact on security applications is prominent and redesign might be needed. We believe different communities (security, domain and legal) should work together to solve the issues for better WHOIS privacy and utility.