Measurements, Attacks, and Defenses for the Web (MADWeb) Workshop 2021
Thursday, 25 February
Over the past decade, HTTPS adoption has risen dramatically. The Web PKI has shifted seismically, with browsers imposing new requirements on CAs and server operators. These shifts bring security and privacy improvements for end users, but they have often been driven by incompatible browser changes that break websites, causing frustration for end users as well as server operators. Security-positive breaking changes involve a plethora of choices. Should browsers roll out a change gradually, or rip the band-aid off and deploy it all at once? How do we advertise the change and motivate different players in the ecosystem to update configurations before they break? How do different types and amounts of breakage affect the user experience? And the meta-question: how do we approach such quandaries scientifically? Drawing from several case studies in the HTTPS ecosystem, I'll talk about the science of nudging an ecosystem: methods that the web browser community has developed, and lessons we've learned, for measuring how best to get millions of websites to improve security while minimizing the frustrations of incompatibility.
Christoph Kerschbaumer, Julian Gaibler, Arthur Edelstein (Mozilla Corporation), Thyla van der Merwe (ETH Zurich)
The number of websites that support encrypted and secure https connections has increased rapidly in recent years. Despite major gains in the proportion of websites supporting https, the web contains millions of legacy http links that point to insecure versions of websites. Worse, numerous websites often use http connections by default, even though they already support https. Establishing a connection using http rather than https has the downside that http transfers data in cleartext, granting an attacker the ability to eavesdrop, or even tamper with the transmitted data. To date, however, no web browser has attempted to remedy this problem by favouring secure connections by default.
We present HTTPS-Only, an approach which first tries to establish a secure connection to a website using https and only allows a fallback to http if a secure connection cannot be established. Our approach also silently upgrades all insecure http subresource requests (image, stylesheet, script) within a secure website to use the secure https protocol instead. Our measurements indicate that our approach can upgrade the majority of connections to https and therefore suggest that browser vendors have an opportunity to evolve their current connection model.
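The upgrade-then-fallback behaviour described in this abstract can be illustrated with a minimal sketch. This is not the authors' browser implementation: the function names `upgrade_to_https` and `load_with_https_first`, and the `fetch` callable that raises `ConnectionError` when no connection can be established, are hypothetical and exist only for illustration.

```python
from urllib.parse import urlsplit


def upgrade_to_https(url: str) -> str:
    """Return the URL with its scheme switched from http to https.

    URLs that are not plain http are returned unchanged. Explicit ports
    are kept as-is, which is a simplification made for this sketch.
    """
    parts = urlsplit(url)
    if parts.scheme != "http":
        return url
    return parts._replace(scheme="https").geturl()


def load_with_https_first(url: str, fetch):
    """Try the https version first; fall back to the original URL on failure.

    `fetch` is a hypothetical callable taking a URL and returning a response,
    raising ConnectionError if the connection cannot be established.
    Returns a (url_used, response) pair.
    """
    secure_url = upgrade_to_https(url)
    try:
        return secure_url, fetch(secure_url)
    except ConnectionError:
        if secure_url == url:
            # The request was already https (or not http at all): no fallback.
            raise
        return url, fetch(url)
```

In this sketch the fallback happens automatically; a real client could instead ask the user before loading the insecure version.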
Shubham Agarwal (Saarland University), Ben Stock (CISPA Helmholtz Center for Information Security)
[NOTE: The authors of this paper found critical errors in their methodology after it was presented and published at the workshop and asked to withdraw the paper from the proceedings. As such, in the current version, we mark the paper as incorrect to help future research avoid repeating the same mistakes. We hope the authors will repeat their measurements with a fixed approach in the future.]
The adoption of WebAssembly is increasing rapidly, as it provides a fast and safe model for program execution in the browser. However, WebAssembly is not exempt from vulnerabilities that can be exploited by malicious observers. Code diversification can mitigate some of these attacks. In this paper, we present the first fully automated workflow for the diversification of WebAssembly binaries. We present CROW, an open-source tool implementing this workflow through enumerative synthesis of diverse code snippets expressed in the LLVM intermediate representation. We evaluate CROW’s capabilities on 303 C programs and study its use on a real-life security-sensitive program: libsodium, a modern cryptographic library. Overall, CROW is able to generate diverse variants for 239 out of 303 (79%) small programs. Furthermore, our experiments show that our approach and tool are able to successfully diversify off-the-shelf cryptographic software (libsodium).
Ali Sadeghi Jahromi, AbdelRahman Abdou (Carleton University)
The Internet’s Public Key Infrastructure (PKI) has been used to provide security to HTTPS and other protocols over the Internet. This infrastructure is now increasingly relied upon for DNS security as well. DNS-over-TLS (DoT) is one recent and prominent example, whereby DNS traffic between stub and recursive resolver is transmitted over a TLS-secured session. The security research community has studied and improved security shortcomings in the web certificate ecosystem. DoT’s certificates, on the other hand, have not been investigated comprehensively. It is also unclear whether DoT client-side tools (e.g., stub resolvers) enforce security as properly as modern-day browsers and mail clients do for HTTPS and secure email. In this research, we compare the DoT and HTTPS certificate ecosystems. Preliminary results are so far promising, as they show that DoT appears to have benefited from the PKI security advancements that were mostly tailored to HTTPS.
Engines that scan Internet-connected devices allow for fast retrieval of useful information regarding said devices and their running services. Examples of such engines include Censys and Shodan. We present a snapshot of our in-progress effort towards the characterization and systematic evaluation of such engines, herein focusing on results obtained from an empirical study that sheds light on several aspects. These include: the freshness of a result obtained from querying Censys and Shodan, the resources they consume from the scanned devices, and several interesting operational differences between engines observed from the network edge. Preliminary results confirm that the information retrieved from both engines can reflect updates within 24 hours, which aligns with implicit usage expectations in recent literature. The results also suggest that the consumed resources appear insignificant for common Internet applications, e.g., one full application-layer connection (banner grab) per port, per day. Results so far highlight the value of such engines to the research community.
Hua Wu (School of Cyber Science & Engineering and Key Laboratory of Computer Network and Information Integration Southeast University, Ministry of Education, Jiangsu Nanjing, Purple Mountain Laboratories for Network and Communication Security (Nanjing, Jiangsu)), Shuyi Guo, Guang Cheng, Xiaoyan Hu (School of Cyber Science & Engineering and Key Laboratory of Computer Network and Information Integration Southeast University, Ministry of Education, Jiangsu Nanjing)
Because the dark web offers concealment, many criminal activities are conducted on it. The use of Tor bridges further obfuscates the traffic and enhances the concealment. Current research on Tor bridge detection has used a small amount of complete traffic, which makes the resulting methods impractical in backbone networks. In this paper, we propose a method for the detection of obfs4 bridges in backbone networks. To address these limitations, we sample traffic to reduce the amount of data and put forward the Nested Count Bloom Filter structure to process the sampled network traffic. In addition, we extract features that can still be used for bridge detection after traffic sampling. The experiment uses real backbone network traffic mixed with Tor traffic for verification. The experimental results show that when Tor traffic accounts for only 0.15% of the traffic and the sampling ratio is 64:1, the F1 score of the detection result is maintained at about 0.9.
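As a rough illustration of the two ingredients mentioned in this abstract, traffic sampling and a counter-based Bloom filter, the sketch below combines systematic 1-in-N sampling with a plain counting Bloom filter. It is not the authors' Nested Count Bloom Filter, whose structure is not described here; the class name, the sizes, the SHA-256 based hashing, and the use of a string flow key are all assumptions made for the example.

```python
import hashlib


class CountingBloomFilter:
    """A plain counting Bloom filter (NOT the paper's nested variant)."""

    def __init__(self, size: int = 1024, num_hashes: int = 3):
        self.size = size
        self.num_hashes = num_hashes
        self.counters = [0] * size

    def _indexes(self, key: str):
        # Derive num_hashes counter positions by salting the key with i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for idx in self._indexes(key):
            self.counters[idx] += 1

    def estimate(self, key: str) -> int:
        # The minimum counter never undercounts; collisions can inflate it.
        return min(self.counters[idx] for idx in self._indexes(key))


def sample_every_nth(packets, n: int):
    """Systematic sampling: keep one packet out of every n."""
    return packets[::n]


# Usage: count sampled packets per (hypothetical) flow key.
packets = ["10.0.0.1->192.0.2.7:443"] * 640
cbf = CountingBloomFilter()
for flow_key in sample_every_nth(packets, 64):
    cbf.add(flow_key)
```

With a 64:1 sampling ratio, the 640 packets above contribute 10 sampled packets to the filter, so per-flow features have to be computed from far fewer observations than in the unsampled traffic.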
Since the dawn of the web, miscreants have used this new communication medium to defraud unsuspecting users. The most common of these attacks is phishing: creating a fake login form to steal usernames and passwords for high-value targets such as email, social networking, or financial services. This seemingly low-skill attack is, to this day, responsible for vast amounts of fraud and harm.
In this talk, I will cover the history of the cat-and-mouse game of phishing, touching on why, after more than a decade of research, phishing attacks are still the most common ways that end-users are directly victimized and attacked. We will discuss the advanced nature of server-side cloaking employed by phishers, as well as the PhishFarm framework which allows us to empirically measure the effect of cloaking techniques on browser-based blocking. Then, we will discuss the first end-to-end measurement of a phishing timeline: from a phishing website being deployed to credentials being used fraudulently. Finally, we'll discuss how phishers have adapted to the COVID-19 pandemic and the next generation of sophisticated phishing attacks.
Tongwei Ren (Worcester Polytechnic Institute), Alexander Wittman (University of Kansas), Lorenzo De Carli (Worcester Polytechnic Institute), Drew Davidson (University of Kansas)
DNS CNAME redirections, which can “steer” browser requests towards a domain different than the one in the request’s URI, are a simple and oftentimes effective means to obscure the source of a web object behind an alias. These redirections can be used to make third-party content appear as first-party content. The practice of evading browser security mechanisms through misuse of CNAMEs, referred to as CNAME cloaking, has been recently growing in popularity among advertisers/trackers to bypass blocklists and privacy policies.
While CNAME cloaking has been reported in past measurement studies, its impact on browser cookie policies has not been analyzed. We close this gap by presenting an in-depth characterization of how CNAME redirections affect cookie propagation. Our analysis uses two distinct data collection samples (June and December 2020). Beyond confirming that CNAME cloaking continues to be popular, our analysis identifies a number of websites transmitting sensitive cookies to cloaked third-parties, thus breaking browser cookie policies. Manual review of such cases identifies exfiltration of authentication cookies to advertising/tracking domains, which raises serious security concerns.
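The core check behind CNAME cloaking, a first-party-looking hostname that resolves through a CNAME to a different site, can be sketched as follows. This is an illustration only, not the authors' measurement pipeline: the CNAME chain is passed in as data rather than resolved, the hostnames are made up, and `naive_site` approximates the registrable domain by the last two labels, which is wrong for suffixes such as `co.uk`. A real analysis would use the Public Suffix List.

```python
def naive_site(host: str) -> str:
    """Approximate the registrable domain as the last two DNS labels.

    Assumption for this sketch only; it mishandles multi-label public
    suffixes such as co.uk.
    """
    labels = host.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def is_cloaked(request_host: str, cname_chain: list) -> bool:
    """Flag a request whose CNAME chain ends on a different site.

    `cname_chain` is the ordered list of CNAME targets observed for
    `request_host` (empty if the name has no CNAME record).
    """
    if not cname_chain:
        return False
    return naive_site(cname_chain[-1]) != naive_site(request_host)


# Hypothetical examples: a subdomain aliased to a third party, and a
# subdomain aliased to another host of the same site.
cloaked = is_cloaked("metrics.shop.example", ["collector.tracker.test."])
same_site = is_cloaked("www.shop.example", ["cdn.shop.example"])
```

Because the browser sees only `metrics.shop.example`, cookies scoped to `shop.example` may be sent with the request even though the response is served by the third party behind the alias.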
Sayak Saha Roy, Unique Karanjit, Shirin Nilizadeh (The University of Texas at Arlington)
Twitter maintains a blackbox approach for detecting malicious URLs shared on its platform. In this study, we evaluate the efficiency of its detection mechanism against newer phishing and drive-by download threats posted on the website over three different time periods of the year. Our findings indicate that several threats remained undetected by Twitter, with the majority of them originating from nine different free website hosting services. These URLs targeted 19 popular organizations and also distributed malicious files from 9 different threat categories. Moreover, the malicious websites hosted under these services were also less likely to be detected by URL scanning tools than other similar threats hosted elsewhere, and were accessible on their respective domains for a much longer duration. We believe that the aforementioned features, combined with the ease of access (drag-and-drop website creation interface, up-to-date SSL certification, reputed domain, etc.), provide attackers with a fast and convenient way to launch attacks using these services. On the other hand, we also observed that the majority of the URLs which were actually detected by Twitter remained active on the platform throughout our study, allowing them to be easily distributed across the platform. Also, several benign websites in our dataset were detected by Twitter as being malicious. We hypothesize that this is caused by a blocklisting procedure used by Twitter, which detects all URLs originating from certain domains, irrespective of their content. Thus, our results identify a family of potent threats which are distributed freely on Twitter and are not detected by the majority of URL scanning tools, or even by the services which host them, making the need for a more thorough URL blocking approach on Twitter’s end more apparent.
YouTube has become the second most popular website according to Alexa, and it represents an enticing platform for scammers to attract victims. Because of the computational difficulty of classifying multimedia, identifying scams on YouTube is more difficult than on text-based media. As a consequence, the research community has to date provided little insight into the prevalence, lifetime, and operational patterns of scammers on YouTube. In this short paper, we present a preliminary exploration of scam videos on YouTube. We begin by identifying 74 search queries likely to lead to scam videos based on the authors’ experience seeing scams during routine browsing. We then manually review and characterize the results to identify 668 scams in 3,700 videos. In a detailed analysis of our classifications and metadata, we find that these scam videos have a median lifetime of nearly nine months, and many rely on external websites for monetization. We also explore the potential of detecting scams from metadata alone, finding that metadata does not have enough predictive power to distinguish scams from legitimate videos. Our work demonstrates that scams are a real problem for YouTube users, motivating future work on this topic.