Workshop on Measurements, Attacks, and Defenses for the Web (MADWeb) 2026 Program
Friday, 27 February
-
Oleksii (Alex) Starov (Palo Alto Networks)
Phishing and scams continue to dominate the Web threat landscape. As attackers adopt AI to automate their operations, we are seeing an increasingly diverse range of lures and evasion techniques on phishing web pages. To counter this, security solutions have to deploy AI-ready defenses designed to detect social engineering content and overcome advanced cloaking.
Drawing on nearly a decade of industry experience, this keynote explores the AI-driven evolution of phishing. We will investigate novel attacks developed in our research that demonstrate how Generative AI can obfuscate malicious code and how LLMs can assemble phishing pages in real time. Because these "runtime assembly" methods can evade traditional network filters, the browser serves as a critical vantage point for detection. We will conclude by discussing a twofold defense strategy: building robust AI-powered detectors and leveraging the browser's unique position to protect against patient-zero phishing threats.
Speaker's Biography: Oleksii Starov, Ph.D., is a Security Scientist and the Senior Research Manager for Web Security at Palo Alto Networks. He focuses on protecting users against evolving online threats by developing proactive, data-driven network and browser security solutions. Prior to joining Palo Alto Networks in 2018, Oleksii was a member of the PragSec Lab at Stony Brook University, conducting research in web security and privacy. An active contributor to the academic community, he has authored or co-authored over 20 papers and serves on the program committees of top-tier security conferences. Since 2020, Oleksii has supported the MADWeb workshop and currently serves on its steering committee. Oleksii frequently shares his research through the Palo Alto Networks Unit 42 blogs and co-hosts the No Name Podcast, a leading cybersecurity podcast in Ukraine.
-
Antoine Boutet (Inria), Tao Beaufils (Inria)
Content personalization is ubiquitous on the web and in mobile applications. However, the mechanisms through which the different parties in the targeted advertising ecosystem control this personalization in practice remain unclear, raising serious questions about possible manipulation of users to encourage them to take certain actions (e.g., consent to cookies, purchase a product). Because personalization is user-centric, it is technically difficult to collect it for large-scale analysis. In this paper, we present STETOSCOPE (underSTand targETing and manipulatiOnS via COllaborative Private data collEction), a participative mobile application for analyzing content personalization. Instead of relying on bots for data collection (which are subject to detection by platforms and may bias the content they receive), STETOSCOPE engages individuals through data collection campaigns linked to legitimate questions posed by citizens (e.g., is there price discrimination on this platform? Is this incentive message trustworthy?). A campaign guides the user to specific web pages or mobile applications, where the participant triggers a screenshot to capture the targeted information. These screenshots are then analyzed on a backend server to draw conclusions. This participatory approach lets users engage with different forms of personalization on mobile, such as dark patterns, price or search discrimination, the sharing of personal information with third parties, trust in incentive messages, or information bubbles. To assess the prospects and limitations of STETOSCOPE, we conducted preliminary data collection campaigns on e-commerce, bus and hotel booking, and recruitment platforms. Our preliminary results show evidence of search discrimination on most platforms, of price discrimination on AliExpress, and of fake discounts on Temu during Black Friday and on many e-commerce platforms before and after Christmas.
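As an illustration of the kind of backend analysis such campaigns enable, the sketch below compares prices collected by different participants for the same product to surface possible price discrimination. All field names and values are hypothetical and not taken from the actual STETOSCOPE implementation.

```python
# Hypothetical backend step for a STETOSCOPE-style price-discrimination
# campaign: group per-participant price observations (e.g., extracted from
# screenshots via OCR) and flag products whose price differs across users.
from collections import defaultdict
from statistics import mean

observations = [  # illustrative records, one per participant screenshot
    {"user": "u1", "platform": "shopX", "product": "p42", "price_eur": 19.99},
    {"user": "u2", "platform": "shopX", "product": "p42", "price_eur": 24.99},
    {"user": "u3", "platform": "shopX", "product": "p42", "price_eur": 19.99},
]

by_product = defaultdict(list)
for obs in observations:
    by_product[(obs["platform"], obs["product"])].append(obs["price_eur"])

for (platform, product), prices in by_product.items():
    spread = max(prices) - min(prices)
    if spread > 0:  # different participants saw different prices
        print(f"{platform}/{product}: prices {sorted(prices)}, "
              f"spread {spread:.2f} EUR ({spread / mean(prices):.0%} of mean) "
              "-- candidate for manual review")
```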
-
Shaoqi Jiang (Concordia University), Mohammad Mannan (Concordia University)
The availability of modern proxy tools has enabled more detailed analysis of application-layer traffic. Existing research has shown that open-source tools like mitmproxy are effective at observing the inner content of on-the-fly traffic, especially HTTP and HTTPS requests. However, as HTTP/3 is increasingly adopted in both apps and web services, new challenges arise: QUIC, the foundational protocol for HTTP/3, lacks full support in widely used open-source MITM proxies, which significantly hinders comprehensive research on HTTP/3 traffic within mobile applications. To address this limitation, we develop QuicMitm, a specialized QUIC man-in-the-middle proxy that can observe plaintext HTTP/3 over QUIC and handle HTTP requests from Android mobile apps. Using QuicMitm, we tested 3,452 popular apps to observe their HTTP/3 traffic and compared the privacy-related information leakage of HTTP/3 and HTTP/2 in these apps. Our observations provide a glimpse into the real-world prevalence of QUIC usage across mobile applications. We hope that our tool can assist researchers in conducting large-scale, dedicated measurements and analyses of QUIC-transmitted content.
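As a side note unrelated to QuicMitm's internals: a quick way to check whether a given endpoint even advertises HTTP/3 is the Alt-Svc response header, as in this minimal sketch.

```python
# Servers announce HTTP/3 support via the Alt-Svc header on an HTTP/1.1 or
# HTTP/2 response, e.g. 'h3=":443"; ma=86400'. This is a coarse check only;
# it does not prove that a QUIC handshake would actually succeed.
import requests

def advertises_http3(url: str) -> bool:
    resp = requests.get(url, timeout=10)
    return "h3" in resp.headers.get("Alt-Svc", "")

print(advertises_http3("https://www.cloudflare.com/"))  # typically True
```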
-
Daan Vansteenhuyse (DistriNet, KU Leuven), Hadji Musaev (DistriNet, KU Leuven), Lieven Desmet (DistriNet, KU Leuven)
Cybercriminals increasingly exploit the web, targeting millions of users and causing substantial financial losses. To combat these online scams, industry and academia have created databases of malicious websites. By analyzing their properties, various detection mechanisms have been proposed to automatically identify fraudulent activity on the web. Although proven useful, these databases are curated, focus on the global perspective, and lack insights into benign websites perceived as malicious by users. In this paper, we analyze user-reported scams from an anti-scam initiative deployed in a European country, using topic modeling to uncover regional trends and user perceptions. Our findings inform the design of localized anti-cybercrime datasets and detection strategies.
Based on an initial manual analysis, we find that most reported malicious activity takes the form of dating scams, while a substantial portion of the dataset contains benign newsletters, indicating the varying accuracy of user reports. Using BERTopic to extend the manual analysis, we show how topic modeling can be used to study the evolution of campaigns over time. We distill our insights into advice for anti-cybercrime organizations that want to set up similar datasets, and describe how tools such as topic modeling can further aid both industry partners, to harden their anti-phishing defenses, and research institutions, to better study the regional and psychological aspects of online fraud.
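For readers unfamiliar with BERTopic, the sketch below shows the shape of such an analysis; the report texts and timestamps are placeholders for the non-public user-report dataset.

```python
# Minimal BERTopic sketch: cluster free-text scam reports into topics and
# track how topic frequencies evolve over time. In practice this needs a
# corpus of at least hundreds of distinct reports; the two entries here
# are placeholders only.
from bertopic import BERTopic

reports = ["I met someone on a dating app who asked for money...",
           "The unsubscribe link in this newsletter does not work..."]
timestamps = ["2024-01-03", "2024-01-05"]  # submission date per report

topic_model = BERTopic(language="multilingual")  # reports may not be in English
topics, probs = topic_model.fit_transform(reports)

# Inspect discovered topics (e.g., dating scams vs. benign newsletters) ...
print(topic_model.get_topic_info())
# ... and study campaign evolution over time.
topics_over_time = topic_model.topics_over_time(reports, timestamps)
```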
-
Ron Amsalem (Ariel University), Harel Berger (Ariel University)
Phishing attacks remain a widespread and persistent security threat, increasingly targeting academic institutions and university researchers. Because researchers often publish their contact information online, their email addresses become easy targets for automated harvesting systems. To reduce this risk, many university researchers employ basic obfuscation techniques such as replacing symbols with words (e.g., “name at domain dot com”) to prevent automated tools from identifying their addresses. This study examines whether modern large language models can infer or reconstruct researchers’ true email addresses despite such obfuscation. In particular, we evaluate three widely used models, ChatGPT, Gemini, and Claude, on their ability to extract contact information from webpages of security researchers publishing in leading venues. Our results show that the models differ substantially in their ability to recover obfuscated emails, exhibiting inconsistencies and blind spots. Our evaluation further shows that Gemini performs best (74% correct), followed by ChatGPT (60%) and Claude (40%). We additionally analyze the specific error patterns and points of disagreement across the models.
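To see why such obfuscation offers limited protection even without LLMs, consider the illustrative (non-paper) deobfuscator below: two regex rewrite rules already reverse the most common patterns, and the paper shows that LLMs go further still.

```python
# Illustrative only: mechanically reversing "name at domain dot com"-style
# obfuscation with two rewrite rules. Real pages use messier variants,
# which is exactly where LLM-based extraction outperforms regexes.
import re

RULES = [
    (r"\s*[\[(]\s*at\s*[)\]]\s*|\s+at\s+", "@"),    # "[at]", "(at)", " at "
    (r"\s*[\[(]\s*dot\s*[)\]]\s*|\s+dot\s+", "."),  # "[dot]", "(dot)", " dot "
]

def deobfuscate(text: str) -> str:
    for pattern, repl in RULES:
        text = re.sub(pattern, repl, text, flags=re.IGNORECASE)
    return text

print(deobfuscate("j.doe [at] cs [dot] example [dot] edu"))  # j.doe@cs.example.edu
```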
-
Lucas Stephens (Oregon State University), Jacob Porter (Oregon State University), Zane Ma (Oregon State University)
-
Tanya Prasad (University of British Columbia), Rut Vora (University of British Columbia), Soo Yee Lim (University of British Columbia), Nguyen Phong Hoang (University of British Columbia), Thomas Pasquier (University of British Columbia)
Third-party advertising and tracking (A&T) are pervasive across the web, yet user exposure varies significantly with browser choice, browsing location, and hosting jurisdiction. We systematically study how these three factors shape tracking by conducting synchronized crawls of 743 popular websites from 8 geographic vantage points using 4 browsers and 2 consent states. Our analysis reveals that browser choice, user location, and hosting jurisdiction each shape tracking exposure in distinct ways. Privacy-focused browsers block more third-party trackers, reducing observed A&T domains by up to 30% in permissive regulatory environments, but offer smaller relative gains in stricter regions. User location influences the tracking volume, the prevalence of consent banners, and the extent of cross-border tracking: GDPR-regulated locations exhibit about 80% fewer third-party A&T domains before consent and keep 89–91% of A&T requests within the EEA or adequacy countries. Hosting jurisdiction plays a smaller role; tracking exposure varies most strongly with inferred user location rather than where sites are hosted. These findings underscore both the power and limitations of user agency, informing the design of privacy tools, regulatory enforcement strategies, and future measurement methodologies.
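One low-level building block of such a crawl is deciding, per request, whether a third-party tracker was contacted. The sketch below does this against a toy tracker list; the actual pipeline and filter lists used in the paper may differ.

```python
# Count third-party advertising & tracking (A&T) domains in a page visit's
# request log by comparing registered domains (eTLD+1) against a tracker
# list. TRACKER_DOMAINS is a stand-in for a real list such as EasyPrivacy.
import tldextract

TRACKER_DOMAINS = {"doubleclick.net", "google-analytics.com", "criteo.com"}

def third_party_trackers(first_party_url: str, request_urls: list[str]) -> set[str]:
    site = tldextract.extract(first_party_url).registered_domain
    return {
        d for url in request_urls
        if (d := tldextract.extract(url).registered_domain)
        and d != site and d in TRACKER_DOMAINS
    }

requests_seen = [
    "https://www.example.com/app.js",
    "https://stats.g.doubleclick.net/collect",
    "https://static.criteo.com/js/ld.js",
]
print(third_party_trackers("https://www.example.com/", requests_seen))
# {'doubleclick.net', 'criteo.com'}
```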
-
Scott Seidenberger (University of Oklahoma), Marc Beret (University of Oklahoma), Raveen Wijewickrama (University of Texas at San Antonio), Murtuza Jadliwala (University of Texas at San Antonio), Anindya Maiti (University of Oklahoma)
We introduce NinjaDoH, a novel DNS over HTTPS (DoH) protocol that leverages the InterPlanetary Name System (IPNS), along with public cloud infrastructure, to create a censorship-resistant moving target DoH service. NinjaDoH is specifically designed to evade traditional censorship methods that block DoH servers by IP address or domain: by continually altering the server's network identifiers, it significantly increases the complexity of censoring NinjaDoH traffic without disrupting other web traffic. We also present an analysis that quantifies the DNS query latency and financial costs of running our implementation of this protocol as a service. Further tests assess the ability of NinjaDoH to elude detection mechanisms, including both commercial firewall products and advanced machine learning-based detection systems. The results broadly support NinjaDoH's efficacy as a robust, moving target DNS solution that can ensure continuous and secure internet access in environments with heavy DNS-based censorship.
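The sketch below illustrates the two moving parts such a design combines: re-resolving the service's current location through a name layer (a placeholder here, standing in for IPNS resolution) and issuing a standard RFC 8484 DoH query against whatever endpoint that lookup returns. All endpoint names are hypothetical; this is not NinjaDoH's implementation.

```python
# Minimal RFC 8484 DoH GET query with a pluggable endpoint lookup.
import base64
import dns.message
import requests

def current_doh_endpoint() -> str:
    # Placeholder: a NinjaDoH-style client would re-resolve the service's
    # latest network identifier via IPNS; we return a fixed URL instead.
    return "https://doh.example.net/dns-query"

def doh_query(name: str, rtype: str = "A") -> dns.message.Message:
    query = dns.message.make_query(name, rtype)
    dns_param = base64.urlsafe_b64encode(query.to_wire()).rstrip(b"=")
    resp = requests.get(
        current_doh_endpoint(),
        params={"dns": dns_param.decode()},
        headers={"Accept": "application/dns-message"},
        timeout=10,
    )
    return dns.message.from_wire(resp.content)

print(doh_query("example.com").answer)
```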
-
Tobias Länge (SECUSO, Karlsruhe Institute of Technology, Karlsruhe, Germany), Fabian Lucas Ballreich (SECUSO, Karlsruhe Institute of Technology, Karlsruhe, Germany), Anne Hennig (SECUSO, Karlsruhe Institute of Technology, Karlsruhe, Germany), Peter Mayer (SECUSO, Karlsruhe Institute of Technology, Karlsruhe, Germany), Melanie Volkamer (SECUSO, Karlsruhe Institute of Technology, Karlsruhe, Germany)
Email spoofing, the practice of sending illegitimate messages that appear to come from a legitimate sender, is a phishing technique frequently employed by attackers. In an effort to prevent such phishing, anti-spoofing mechanisms like DMARC were introduced and have been examined in the research community with respect to adoption rates, policies used, and potential problems. However, prior research has not yet taken into account all aspects of DMARC when evaluating how effectively configurations prevent spoofing attacks. To address this research gap, we developed a utility-oriented configuration matrix – focusing on the anti-spoofing effectiveness of different DMARC configurations – and provide clear recommendations for selecting the appropriate configuration. We then collected data from the Tranco Top-100k list daily for eight months and applied our classification to the collected data. Our analyses reveal how configurations evolve over time and provide insights into the actual deployment of DMARC in practice. This allows us to identify potential issues that hinder the adoption of more secure configurations and the most common errors in invalid DMARC records found in the wild, which could serve as a basis for enhancing the DMARC standard. Our results show that domains are moving towards configurations that are more effective against email spoofing; however, operators still exhibit a lack of knowledge with respect to the different policy settings.
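A minimal sketch of the kind of judgement such a configuration matrix encodes is shown below; the three-level ranking is a deliberate simplification of the paper's full matrix.

```python
# Fetch a domain's DMARC record and give a coarse anti-spoofing verdict.
import dns.resolver

def dmarc_strength(domain: str) -> str:
    try:
        answers = dns.resolver.resolve(f"_dmarc.{domain}", "TXT")
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return "no DMARC record"
    record = next(
        (b"".join(r.strings).decode() for r in answers
         if b"".join(r.strings).startswith(b"v=DMARC1")),
        None,
    )
    if record is None:
        return "no valid DMARC record"
    tags = dict(t.strip().split("=", 1) for t in record.split(";") if "=" in t)
    policy, pct = tags.get("p", "none"), int(tags.get("pct", "100"))
    if policy == "reject" and pct == 100:
        return "strong (spoofed mail is rejected)"
    if policy in ("reject", "quarantine"):
        return f"partial (p={policy}, pct={pct})"
    return "monitoring only (p=none)"

print(dmarc_strength("example.com"))
```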
-
Janos Szurdi (Palo Alto Networks), Reethika Ramesh (Palo Alto Networks), Ram Sundara Raman (University of California Santa Cruz), Daiping Liu (Palo Alto Networks)
Over the past decade, ICANN’s New gTLD Program has dramatically expanded the DNS namespace, raising persistent concerns about its security implications as another round of applications approaches in 2026. In this paper, we present a large-scale, longitudinal study of both malicious and benign domain usage across four generations of gTLDs—legacy, first-wave, second-wave, and third-wave—alongside country-code TLDs. Using four years of longitudinal data from 2021 to 2025, collected from multiple sources including zone files, active DNS measurements, passive DNS feeds, and domain categorizations from a leading global cybersecurity vendor, we develop three reputation metrics to capture utilization trends: the malicious ratio, the malicious-to-benign ratio, and the non-benign ratio.
Our analysis shows that newer gTLD generations are substantially more malicious and significantly less utilized for benign purposes than legacy TLDs. Compared to legacy gTLDs, newer generations exhibit malicious-to-benign ratios that are 3.1–9.2× worse, with these ratios worsening rapidly over time: up to 50× growth in malicious-to-benign ratios within four years for the newest gTLDs. We examine contributing factors to show that lower pricing, higher popularity, and certain TLD categories are strongly associated with worse reputation, while defensive registrations account for only a negligible fraction of domain registrations. Finally, we identify a small number of sponsoring organizations that disproportionately operate gTLDs with severe abuse. Our results underscore the need for continued scrutiny and rigorous evaluation of new gTLDs.
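The abstract names the three metrics without spelling out their formulas; the sketch below shows one plausible reading, evaluated on made-up counts for a hypothetical TLD.

```python
# Assumed (not paper-confirmed) definitions of the three reputation
# metrics, computed from categorized domain counts for one TLD.
malicious, benign, total = 1_200, 9_000, 50_000

malicious_ratio = malicious / total            # share of all domains that are malicious
malicious_to_benign = malicious / benign       # malicious domains per benign domain
non_benign_ratio = (total - benign) / total    # everything not confirmed benign

print(f"{malicious_ratio:.3f}  {malicious_to_benign:.3f}  {non_benign_ratio:.3f}")
# 0.024  0.133  0.820
```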
-
Tino Hager (Mailtower.app), Ronald Petrlic (Nuremberg Institute of Technology)
The widespread enforcement of email authentication mechanisms such as SPF, DKIM, and DMARC by major email providers has become a cornerstone in the fight against email spoofing. However, now that these policies are rigorously checked in practice, a paradoxical problem has emerged: emails that are correctly authenticated and fully compliant with all policies are nevertheless rejected. In particular, temporary errors appear to occur arbitrarily and can account for substantial email delivery failures. To date, no systematic explanation for this phenomenon has been provided.
In this paper, we present the first comprehensive study that shows that these errors are not caused by the authentication mechanisms themselves, but by limitations and failures in the underlying DNS infrastructure. Our measurements reveal that the DNS zones of some—especially large—organizations are overcrowded with TXT records used for domain verification. We show that the resulting number and size of DNS records can directly interfere with SPF evaluation, leading to rejected emails. Furthermore, we identify issues in the DNS infrastructure of Amazon Web Services, where oversized DNS responses can trigger errors and, consequently, render emails undeliverable.
Beyond SPF, we show that DKIM configurations also contribute to delivery failures: RSA key lengths exceeding 2000 bits—despite being considered state of the art—can already result in non-delivery due to excessively large DNS responses. Finally, we are the first to uncover that Microsoft’s Exchange Online infrastructure exhibits shortcomings in handling long DNS responses, which explains a significant number of email delivery failures, particularly for large enterprises with extensive DNS configurations.
Overall, our findings provide a new perspective on the reliability of modern email authentication and demonstrate that DNS scalability and implementation limitations represent a critical, yet previously overlooked, root cause of authentication-related email delivery failures.
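In the spirit of these findings, a rough self-check of a domain's TXT response size and SPF lookup budget (RFC 7208 caps SPF at 10 DNS lookups) could look like the sketch below; the resolver choice is illustrative, and the paper's methodology is far more thorough.

```python
import dns.flags
import dns.message
import dns.query
import dns.resolver

def counts_as_lookup(term: str) -> bool:
    # SPF mechanisms/modifiers that each consume one of the 10 allowed lookups.
    name = term.lstrip("+-~?").split(":", 1)[0].split("/", 1)[0].split("=", 1)[0]
    return name in {"include", "a", "mx", "ptr", "exists", "redirect"}

def txt_health(domain: str) -> None:
    # Query TXT over UDP with a 1232-byte EDNS buffer (a common default)
    # and check whether the response was truncated.
    query = dns.message.make_query(domain, "TXT", use_edns=0, payload=1232)
    resp = dns.query.udp(query, "8.8.8.8", timeout=5)
    note = " -- truncated; resolver must retry over TCP" if resp.flags & dns.flags.TC else ""
    print(f"TXT response: {len(resp.to_wire())} bytes (UDP){note}")

    txts = [b"".join(r.strings).decode() for r in dns.resolver.resolve(domain, "TXT")]
    spf = next((t for t in txts if t.startswith("v=spf1")), None)
    if spf:
        lookups = sum(counts_as_lookup(t) for t in spf.split()[1:])
        print(f"SPF terms consuming DNS lookups: {lookups} (limit: 10)")

txt_health("example.com")
```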
-
Muhammad Muzammil (Stony Brook University), Zafir Ansari (Infoblox), Nick Nikiforakis (Stony Brook University), Darin Johnson (Infoblox)
The Domain Name System (DNS) is a critical component of the Internet, yet its foundational processes, such as domain registration and ownership changes, are generally opaque to end users. This lack of transparency enables adversaries to re-register expired domains and host malicious content that continues to receive traffic from users who trust and revisit the domain. In this paper, we introduce EchoLoc, a scalable system for detecting malicious re-registered domains across the entire TLD space that appear in live DNS resolution telemetry from Infoblox, a major DNS resolution and threat intelligence provider. We deploy EchoLoc for a one-month period, during which it analyzed 144.6M new domain registrations and identified 1.5M re-registrations, of which 66K were queried by customers. Using a machine learning-based website classification pipeline that combines structural features from web content with semantic signals derived from a large language model, we identify over 9K malicious re-registered domains. The classifier achieves 0.95 precision and recall for malicious domain detection, with an overall accuracy of 98.1%. Our analysis further shows that these domains exhibit user activity both prior to expiration and after re-registration.
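EchoLoc itself is built on proprietary telemetry; the sketch below only pins down the core notion of a re-registration, over placeholder feed snapshots.

```python
# A domain counts as re-registered if it shows up in today's new
# registrations, was registered at some point in the past, and was not
# active immediately before -- i.e., it expired or dropped and was then
# picked up again. Inputs stand in for zone-file / registration-feed data.
def find_reregistrations(new_today: set[str],
                         ever_registered: set[str],
                         active_yesterday: set[str]) -> set[str]:
    return {d for d in new_today
            if d in ever_registered and d not in active_yesterday}

new_today = {"old-shop.com", "brand-new-site.net"}
ever_registered = {"old-shop.com", "something-else.org"}
active_yesterday = {"something-else.org"}

print(find_reregistrations(new_today, ever_registered, active_yesterday))
# {'old-shop.com'} -> candidate for the malicious-content classifier
```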
-
Brian Grinstead (Mozilla Corporation), Christoph Kerschbaumer (Mozilla Corporation), Mariana Meireles (Independent), Cameron Allen (UC Berkeley)
-
Sivakanesan Dhanushkanda (Old Dominion University), Mustafa Ibrahim (Old Dominion University), Shuai Hao (Old Dominion University)
-
Deian Stefan (UC San Diego)
Modern browsers are massive, notoriously complex systems. We use them for everything. Unfortunately, they're also largely written in C and C++, and thus as useful to attackers as they are to us. Indeed, few systems are as widely exploited in the wild—to target everyone from ethnic groups to journalists and activists—as browsers. In this talk I'm going to give you an overview of our efforts using programming language techniques—from information flow type systems to WebAssembly-based sandboxing and automated verification—to shift the design and implementation of Firefox towards a more secure browser.
Speaker's Biography: Deian is an Associate Professor of Computer Science and Engineering at UC San Diego, where he co-leads the Security and Programming Systems groups. His research lies at the intersection of security and programming languages; he is particularly interested in building secure systems that are deployed in production. He is a co-founder of Cubist, a security and infrastructure digital assets platform, and a board director of the Bytecode Alliance. Previously he was a co-founder of Intrinsic, a runtime security startup acquired by VMware in 2019.
-
Alexandru Bara (University of Waterloo), Aswad Tariq (University of Waterloo), Urs Hengartner (University of Waterloo)
Behavioural biometrics have emerged as a transformative security mechanism for the web, leveraging user interaction patterns such as keystrokes and mouse movements for authentication. Detecting scripts that perform behavioural biometrics at scale remains challenging due to code obfuscation, dynamic execution, and overlap with analytics scripts. To understand how widely such scripts are deployed, we crawl more than 20K websites, including the Tranco Top 20K list, 500 bank websites, and more than 1K e-commerce websites. Our crawlers can locate checkout and login webpages where sensitive information is entered, making these pages more likely to deploy behavioural biometrics. We develop the first open-source crawler that navigates an e-commerce website to its checkout page, achieving 78% accuracy on Shopify-based websites. Our crawlers rely on a dynamic taint analysis-aware web browser to find websites whose scripts access keystroke or mouse information and send it to backend servers. We also build a ground-truth dataset of behavioural biometrics scripts and create a machine learning pipeline to automatically filter out scripts that show no behavioural biometrics characteristics. Our analysis reveals that behavioural biometrics scripts are deployed on at least 0.31% and potentially up to 0.50% of the Tranco Top 20K websites, with significantly higher adoption on bank login pages. We conclude with recommendations to balance security benefits with privacy risks, advocating for transparency, deobfuscation, and regulatory oversight.
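To contrast with the paper's dynamic approach: a naive static heuristic for flagging candidate scripts might look like the sketch below. Obfuscation and overlap with analytics code are precisely why such pattern matching falls short and taint tracking is needed.

```python
# Naive static heuristic: flag JavaScript that both listens to fine-grained
# input events and appears to serialize/transmit them. Trivially defeated
# by obfuscation; shown only to motivate the dynamic analysis above.
import re

INPUT_EVENTS = re.compile(r"""addEventListener\(\s*['"](keydown|keyup|keypress|mousemove|pointermove|touchmove)['"]""")
EXFIL_HINTS = re.compile(r"JSON\.stringify|sendBeacon|XMLHttpRequest|fetch\(")

def looks_like_biometrics(js_source: str) -> bool:
    return bool(INPUT_EVENTS.search(js_source)) and bool(EXFIL_HINTS.search(js_source))

sample = """
document.addEventListener('keydown', e => buf.push([e.key, e.timeStamp]));
setInterval(() => navigator.sendBeacon('/t', JSON.stringify(buf)), 5000);
"""
print(looks_like_biometrics(sample))  # True
```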
-
Abdullah Hassan Chaudhry (CISPA Helmholtz Center for Information Security), Valentino Dalla Valle (CISPA Helmholtz Center for Information Security), Aurore Fass (Inria Centre at Université Côte d’Azur)
Browser extension stores operate independently of each other and each have their own governance structure, creating a situation where threats identified on one platform can persist on others. We present the first cross-store analysis of security inconsistencies between the Chrome Web Store (CWS) and Edge Add-ons Store (EAS). We study extensions published on both stores, and discover 11 malicious extensions (affecting almost 134k users) that were present on the EAS, despite having already been removed from the CWS for containing malware. These extensions persisted on Edge for an average of 551 days (1.5 years) after their Chrome counterparts were removed for malware, with some even receiving updates during this period.
We additionally find that malicious extensions change their names and developer names more often than other extensions and that these changes are larger. We also examine extensions that have been reinstated after having been removed (e.g., for containing malware), revealing inconsistencies in extension store governance. These findings show that malicious actors can exploit the lack of coordination in an interconnected extension ecosystem.
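The core cross-store check can be pictured as follows, over hypothetical local snapshots (the study crawled the stores; no official cross-store API is assumed).

```python
# Flag extensions that the Chrome Web Store removed for malware but that
# are still listed on the Edge Add-ons Store. All IDs are made up.
cws_removed_for_malware = {"ext-aaa", "ext-bbb"}   # removed from CWS
eas_listed = {"ext-aaa", "ext-ccc"}                # currently live on EAS
same_extension = {"ext-aaa": "ext-aaa"}            # CWS id -> matching EAS id

still_on_edge = {
    eas_id for cws_id, eas_id in same_extension.items()
    if cws_id in cws_removed_for_malware and eas_id in eas_listed
}
print(still_on_edge)  # {'ext-aaa'}: flagged on CWS, still installable via EAS
```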
-
Dominic Troppmann (CISPA Helmholtz Center for Information Security), Cristian-Alexandru Staicu (Endor Labs), Aurore Fass (Inria Centre at Université Côte d’Azur)
-
Shun Kashiwa (UC San Diego), Michael Coblenz (UC San Diego), Deian Stefan (UC San Diego)