NDSS Symposium 2026 Program
Monday, 23 February
Tuesday, 24 February
Chair's Welcome and Opening Remarks
The Fast and the Curious Packets
-
Zixuan Liu (Tsinghua University), Yi Zhao (Beijing Institute of Technology), Zhuotao Liu (Tsinghua University), Qi Li (Tsinghua University), Chuanpu Fu (Tsinghua University), Guangmeng Zhou (Tsinghua University), Ke Xu (Tsinghua University)
Machine Learning (ML)-based malicious traffic detection is a promising security paradigm. It outperforms traditional rule-based detection by identifying various advanced attacks. However, the robustness of these ML models is largely unexplored, allowing attackers to craft adversarial traffic examples that evade detection. Existing evasion attacks typically rely on overly restrictive conditions (e.g., encrypted protocols, Tor, or specialized setups), or require detailed prior knowledge of the target (e.g., training data and model parameters), which is impractical in realistic black-box scenarios. The feasibility of a hard-label black-box evasion attack (i.e., one applicable across diverse tasks and protocols without insight into the target's internals) thus remains an open challenge.
To this end, we develop NetMasquerade, which leverages reinforcement learning (RL) to manipulate attack flows so that they mimic benign traffic and evade detection. Specifically, we establish a tailored pre-trained model called Traffic-BERT, utilizing a network-specialized tokenizer and an attention mechanism to extract diverse benign traffic patterns. Subsequently, we integrate Traffic-BERT into the RL framework, allowing NetMasquerade to effectively manipulate malicious packet sequences based on benign traffic patterns with minimal modifications. Experimental results demonstrate that NetMasquerade enables both brute-force and stealthy attacks to evade 6 existing detection methods under 80 attack scenarios, achieving an attack success rate above 96.65%. Notably, it can evade methods that are either empirically or certifiably robust against existing evasion attacks. Finally, NetMasquerade achieves low-latency adversarial traffic generation, demonstrating its practicality in real-world scenarios.
-
Xinhao Deng (Tsinghua University & Ant Group), Yixiang Zhang (Tsinghua University), Qi Li (Tsinghua University & Zhongguancun Laboratory), Zhuotao Liu (Tsinghua University & Zhongguancun Laboratory), Yabo Wang (Tsinghua University), Ke Xu (Tsinghua University & Zhongguancun Laboratory)
Anonymous communication systems, e.g., Tor, are vulnerable to various website fingerprinting (WF) attacks, which analyze network traffic patterns to compromise user privacy. In particular, sophisticated attacks employ deep learning (DL) models to identify distinctive traffic patterns associated with specific websites, allowing the adversary to determine which websites users have visited. However, these attacks are not designed to handle traffic drift, such as changes in website content and network conditions. Since traffic drift is common in practice, the effectiveness of these attacks diminishes significantly in real-world deployment. To address this limitation, we develop Proteus, the first adaptive WF attack framework that effectively mitigates the impact of traffic drift while maintaining robust performance in real-world scenarios. The key design rationale of Proteus is to continuously fine-tune the WF model using only drifted traffic collected during deployment, without ground-truth labels, enabling the model to adapt to complex traffic drift in near real time. Specifically, Proteus aligns the feature distributions of original and drifted traffic by minimizing the maximum mean discrepancy, and enhances model confidence by optimizing the entropy distribution of its predictions. Furthermore, it utilizes a Gaussian mixture model to obtain reliable pseudo labels, which are subsequently used in supervised fine-tuning to further enhance its robustness against drifted traffic. Notably, Proteus can be seamlessly integrated with existing DL-based WF attacks to enhance their resilience to traffic drift. We evaluate Proteus on large-scale datasets containing over 350,000 real-world Tor browsing traces across six traffic drift scenarios. The results demonstrate that Proteus achieves an average 94.24% relative improvement in F1-score over eight state-of-the-art WF attacks when identifying drifted traffic.
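To make the alignment step concrete, below is a minimal sketch (our illustration, not Proteus code) of the biased maximum mean discrepancy with a Gaussian kernel between feature batches of original and drifted traffic; adaptive fine-tuning would minimize this quantity. Feature dimensions, kernel bandwidth, and data are assumptions for illustration only.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel matrix between rows of a and b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    # Biased estimate of the squared maximum mean discrepancy.
    return (gaussian_kernel(x, x, sigma).mean()
            - 2 * gaussian_kernel(x, y, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean())

rng = np.random.default_rng(0)
original = rng.normal(0.0, 1.0, size=(256, 16))  # pre-drift trace features (toy)
drifted = rng.normal(0.3, 1.2, size=(256, 16))   # post-drift trace features (toy)
print(f"MMD^2 = {mmd2(original, drifted):.4f}")  # fine-tuning drives this toward 0
```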
-
Junchen Pan (Tsinghua University), Lei Zhang (Zhongguancun Laboratory), Xiaoyong Si (Tencent Technology (Shenzhen) Company Limited), Jie Zhang (Tsinghua University), Xinggong Zhang (Peking University), Yong Cui (Tsinghua University)
The carpet bombing attack, an increasingly prevalent variant of Distributed Denial of Service (DDoS), floods multiple servers in the victim network simultaneously, minimizing per-flow malicious throughput to evade detection. The aggregated malicious traffic overwhelms network access points (e.g., gateways), causing a denial of service. Moreover, advanced attackers employ application-layer attack methods to generate malicious traffic that is inconspicuous in both semantics and volume, defeating existing DDoS detection mechanisms. We propose NetRadar, a DDoS detector that achieves accurate and robust carpet bombing detection. Leveraging a server-gateway cooperation architecture, NetRadar aggregates both traffic and server-side features collected across the victim network and performs cross-server analysis to locate victim servers. To enable server-assisted carpet bombing detection, we introduce a general server-side feature set compatible with diverse services, alongside a robust model training method designed to handle runtime feature mismatch. Furthermore, we propose an efficient cross-server inbound traffic analysis method that effectively exploits the similarity of carpet bombing traffic while reducing computational overhead. Evaluations on real-world and simulated datasets demonstrate that NetRadar outperforms state-of-the-art solutions, achieving over 94% accuracy in all carpet bombing detection scenarios.
-
Ronghua Li (The Hong Kong Polytechnic University), Shinan Liu (The University of Hong Kong), Haibo Hu (The Hong Kong Polytechnic University, PolyU Research Centre for Privacy and Security Technologies in Future Smart Systems), Qingqing Ye (The Hong Kong Polytechnic University), Nick Feamster (University of Chicago)
IoT environments such as smart homes are susceptible to privacy inference attacks, where attackers analyze patterns of encrypted network traffic to infer the state of devices and even the activities of people. While most existing attacks exploit ML techniques to discover such traffic patterns, they underperform on wireless traffic, especially Wi-Fi, due to heavy noise and the packet loss inherent in wireless sniffing. In addition, these approaches commonly focus on distinguishing pre-chunked IoT event traffic samples, and they fail to effectively track multiple events simultaneously. In this work, we propose WiFinger, a fine-grained multi-IoT-event fingerprinting approach for noisy traffic. WiFinger recasts the traffic pattern classification task as a subsequence matching problem and introduces novel techniques to curb its high time complexity while maintaining high accuracy. In addition, its reduced reliance on large volumes of training samples lowers the effort of future fingerprint updates. Experiments demonstrate that WiFinger outperforms existing approaches under practical threat models, with an average recall of 89% (vs. 49% and 46%, respectively) and almost zero false positives for various IoT events.
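As a rough illustration of recasting detection as subsequence matching (the representation, distance, and threshold below are our simplifications, not WiFinger's algorithm): slide a short event fingerprint over a long, noisy trace and report window positions whose distance stays under a tolerance.

```python
import numpy as np

def match_fingerprint(trace, fingerprint, threshold):
    """Return start indices where `fingerprint` approximately occurs in `trace`.

    Both are 1-D arrays (e.g., per-interval packet counts); the mean absolute
    error over each sliding window is compared against `threshold`.
    """
    m = len(fingerprint)
    hits = []
    for i in range(len(trace) - m + 1):
        window = trace[i:i + m]
        if np.abs(window - fingerprint).mean() < threshold:
            hits.append(i)
    return hits

rng = np.random.default_rng(1)
fp = np.array([5.0, 9.0, 2.0, 7.0])          # hypothetical event fingerprint
trace = rng.poisson(1.0, 60).astype(float)   # noisy background traffic
trace[20:24] += fp                           # embed one (noisy) occurrence
print(match_fingerprint(trace, fp, threshold=2.0))  # expected hit near index 20
```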
The Model Strikes Back
-
Yunzhe Li (Shanghai Jiao Tong University), Jianan Wang (Shanghai Jiao Tong University), Hongzi Zhu (Shanghai Jiao Tong University), James Lin (Shanghai Jiao Tong University), Shan Chang (Donghua University), Minyi Guo (Shanghai Jiao Tong University)
Large Language Models (LLMs) have become foundational components in a wide range of applications, including natural language understanding and generation, embodied intelligence, and scientific discovery. As their computational requirements continue to grow, these models are increasingly deployed as cloud-based services, allowing users to access powerful LLMs via the Internet. However, this deployment model introduces a new class of threat: denial-of-service (DoS) attacks via unbounded reasoning, where adversaries craft specially designed inputs that cause the model to enter excessively long or even infinite generation loops. These attacks can exhaust backend compute resources, degrading or denying service to legitimate users. To mitigate such risks, many LLM providers adopt a closed-source, black-box setting to obscure model internals. In this paper, we propose ThinkTrap, a novel input-space optimization framework for DoS attacks against LLM services, even in black-box environments. The core idea of ThinkTrap is to first map discrete tokens into a continuous embedding space, then perform efficient black-box optimization in a low-dimensional subspace, exploiting input sparsity. The goal of this optimization is to identify adversarial prompts that induce extended or non-terminating generation across several state-of-the-art LLMs, achieving DoS with minimal token overhead. We evaluate the proposed attack across multiple commercial, closed-source LLM services. Our results demonstrate that, even while staying well under the restrictive request-rate limits commonly enforced by these platforms, typically ten requests per minute (10 RPM), the attack can degrade service throughput to as low as 1% of its original capacity and, in some cases, induce complete service failure.
-
Lichao Wu (Technical University of Darmstadt), Sasha Behrouzi (Technical University of Darmstadt), Mohamadreza Rostami (Technical University of Darmstadt), Maximilian Thang (Technical University of Darmstadt), Stjepan Picek (University of Zagreb & Radboud University), Ahmad-Reza Sadeghi (Technical University of Darmstadt)
Safety alignment is critical for the ethical deployment of large language models (LLMs), guiding them to avoid generating harmful or unethical content. Current alignment techniques, such as supervised fine-tuning and reinforcement learning from human feedback, remain fragile and can be bypassed by carefully crafted adversarial prompts. Unfortunately, such attacks rely on trial and error, lack generalizability across models, and suffer from limited scalability and reliability.
This paper presents NeuroStrike, a novel and generalizable attack framework that exploits a fundamental vulnerability introduced by alignment techniques: the reliance on sparse, specialized safety neurons responsible for detecting and suppressing harmful inputs. We apply NeuroStrike in both white-box and black-box settings: in the white-box setting, NeuroStrike identifies safety neurons through feedforward activation analysis and prunes them during inference to disable safety mechanisms. In the black-box setting, we propose the first LLM profiling attack, which leverages safety neuron transferability by training adversarial prompt generators on open-weight surrogate models and then deploying them against black-box and proprietary targets. We evaluate NeuroStrike on over 20 open-weight LLMs from major LLM developers. By removing less than 0.6% of neurons in targeted layers, NeuroStrike achieves an average attack success rate (ASR) of 76.9% using only vanilla malicious prompts. Moreover, NeuroStrike generalizes to four multimodal LLMs with 100% ASR on unsafe image inputs. Safety neurons transfer effectively across architectures, raising the ASR to 78.5% on 11 fine-tuned models and 77.7% on five distilled models. The black-box LLM profiling attack achieves an average ASR of 63.7% across five black-box models, including Google’s Gemini family.
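A minimal sketch of the white-box intuition, on synthetic activations rather than a real model (the scoring rule, layer, and data are our assumptions; only the less-than-0.6% pruning budget comes from the abstract): rank neurons by how much more they fire on harmful prompts than on benign ones, then zero out the top fraction.

```python
import numpy as np

def find_safety_neurons(acts_harmful, acts_benign, frac=0.006):
    # Score each neuron by its mean activation gap (harmful - benign).
    gap = acts_harmful.mean(axis=0) - acts_benign.mean(axis=0)
    k = max(1, int(frac * gap.size))
    return np.argsort(gap)[-k:]           # indices of the top-k "safety" neurons

def prune(weight_out, neuron_ids):
    # Disable the selected neurons by zeroing their outgoing weights.
    w = weight_out.copy()
    w[:, neuron_ids] = 0.0
    return w

rng = np.random.default_rng(2)
harmful = rng.normal(0, 1, (128, 4096))    # toy layer activations, harmful prompts
benign = rng.normal(0, 1, (64, 4096))      # toy layer activations, benign prompts
harmful[:, :24] += 3.0                     # pretend 24 neurons flag harmful inputs
ids = find_safety_neurons(harmful, benign)
W = prune(rng.normal(0, 0.02, (4096, 4096)), ids)
print(sorted(ids)[:5], bool((W[:, ids] == 0).all()))
```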
-
Zhexi Lu (Rensselaer Polytechnic Institute), Hongliang Chi (Rensselaer Polytechnic Institute), Nathalie Baracaldo (IBM Research - Almaden), Swanand Ravindra Kadhe (IBM Research - Almaden), Yuseok Jeon (Korea University), Lei Yu (Rensselaer Polytechnic Institute)
Membership inference attacks (MIAs) pose a critical privacy threat to fine-tuned large language models (LLMs), especially when models are adapted to domain-specific tasks using sensitive data. While prior black-box MIA techniques rely on confidence scores or token likelihoods, these signals are often entangled with a sample’s intrinsic properties, such as content difficulty or rarity, leading to poor generalization and low signal-to-noise ratios. In this paper, we propose ICP-MIA, a novel MIA framework grounded in the theory of training dynamics, particularly the phenomenon of diminishing returns during optimization. We introduce the Optimization Gap as a fundamental signal of membership: at convergence, member samples exhibit minimal remaining loss-reduction potential, while non-members retain significant potential for further optimization. To estimate this gap in a black-box setting, we propose In-Context Probing (ICP), a training-free method that simulates fine-tuning-like behavior via strategically constructed input contexts. We propose two probing strategies: reference-data-based (using semantically similar public samples) and self-perturbation (via masking or generation). Experiments on three tasks and multiple LLMs show that ICP-MIA significantly outperforms prior black-box MIAs, particularly at low false positive rates. We further analyze how reference data alignment, model type, PEFT configurations, and training schedules affect attack effectiveness. Our findings establish ICP-MIA as a practical and theoretically grounded framework for auditing privacy risks in deployed LLMs.
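To make the membership signal concrete: with only black-box loss access, compare a sample's loss with and without an in-context probe prepended; the smaller the remaining loss reduction, the less "optimization potential" is left, suggesting the sample was a training member. The loss values and threshold below are toy stand-ins, not the paper's implementation.

```python
def optimization_gap(loss_plain, loss_probed):
    # Remaining loss-reduction potential exposed by in-context probing.
    return loss_plain - loss_probed

def infer_membership(loss_plain, loss_probed, tau=0.15):
    # Members have little left to gain from probing: small gap => member.
    return optimization_gap(loss_plain, loss_probed) < tau

# Toy numbers standing in for a model's negative log-likelihoods:
# members were already fit during fine-tuning, so probing barely helps them.
samples = {
    "member": (1.10, 1.05),      # (loss without probe, loss with probe)
    "non-member": (2.40, 1.60),
}
for name, (lp, lq) in samples.items():
    print(name, "gap=%.2f" % optimization_gap(lp, lq),
          "member?", infer_membership(lp, lq))
```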
-
Anna Ablove (University of Michigan), Shreyas Chandrashekaran (University of Michigan), Xiao Qiang (University of California at Berkeley), Roya Ensafi (University of Michigan)
From network-level censorship by the Great Firewall to platform-specific mechanisms implemented by third-party services like TOM-Skype and WeChat, Internet censorship in China has continually evolved in response to new technologies. In the current era of AI, emerging tools like Large Language Models (LLMs) are no exception. Yet, ensuring compliance with China’s strict, legally mandated censorship standards presents a unique and complex challenge for service providers. While current research on content moderation in LLMs focuses primarily on alignment techniques, the limited reliability of these techniques prevents sufficient compliance with strictly enforced information controls.
In this work, we present the first study of overt blocking embedded in Chinese LLM services. We leverage information leaks in the communication between server and client during active chat sessions to pinpoint where blocking decisions are embedded within the LLM services' workflow. We observe a persistent reliance on traditional, dated blocking strategies in prominent services: Baidu-Chat, DeepSeek, Doubao, Kimi, and Qwen. We find blocking placed during the input, output, and search phases, with the latter two leaking varying amounts of censored information to client machines, including near-complete responses and search references not rendered in the browser.
Seeing the need to balance competition on the global stage against homegrown censorship restrictions, we observe in real time the concessions made by service providers hosting models at war with themselves. Through this work, we emphasize the importance of a more holistic threat model of LLM content accessibility, one that integrates live deployments to study access as it pertains to real-world usage, especially in heavily censored regions.
Four Horsemen of the Scale-pocalypse
-
Luke Dramko (Carnegie Mellon University), Claire Le Goues (Carnegie Mellon University), Edward J. Schwartz (Carnegie Mellon University)
Decompilers help reverse engineers analyze software at a higher level of abstraction than assembly code. Unfortunately, because compilation is lossy, traditional decompilers, which are deterministic, produce code that lacks many characteristics that make source code readable in the first place, such as variable and type names. Neural decompilers offer the exciting possibility of statistically filling in these details. Unfortunately, existing work in neural decompilation suffers from substantial limitations that preclude its use on real code, such as the inability to provide definitions for user-defined composite types. In this work, we introduce Idioms, a simple, generalizable, and effective neural decompilation approach that can fine-tune any LLM into a neural decompiler capable of generating the appropriate user-defined type definitions alongside the decompiled code, and a new dataset, Realtype, that includes substantially more complicated and realistic types than existing neural decompilation benchmarks. We show that our approach yields state-of-the-art results in neural decompilation. On the most challenging existing benchmark, Exebench, our model achieves 54.4% accuracy vs. 46.3% for LLM4Decompile and 37.5% for Nova; on Realtype, our model performs at least 95% better.
-
Efrén López-Morales (New Mexico State University), Ulysse Planta (CISPA Helmholtz Center for Information Security), Gabriele Marra (CISPA Helmholtz Center for Information Security), Carlos Gonzalez-Cortes (Universidad de Santiago de Chile), Jacob Hopkins (Texas A&M University - Corpus Christi), Majid Garoosi (CISPA Helmholtz Center for Information Security), Elías Obreque (Universidad de Chile), Carlos Rubio-Medrano (Texas A&M University - Corpus Christi), Ali Abbasi (CISPA Helmholtz Center for Information Security)
Satellites are the backbone of mission-critical services that enable our modern society to function, for example, GPS. For years, satellites were assumed to be secure because of their indecipherable architectures and their reliance on security by obscurity. However, technological advancements have made these assumptions obsolete, paving the way for potential attacks.
Unfortunately, there is currently no way to collect data on satellite adversarial techniques, hindering the generation of intelligence that could lead to the development of countermeasures. In this paper, we present HoneySat, the first high-interaction satellite honeypot framework, capable of convincingly simulating a real-world CubeSat, a type of Small Satellite (SmallSat). To provide evidence of HoneySat's effectiveness, we surveyed SmallSat operators and deployed HoneySat over the Internet.
Our results show that 90% of satellite operators agreed that HoneySat provides a realistic simulation. Additionally, HoneySat successfully deceived adversaries in the wild and collected 22 real-world adversarial interactions. Finally, we performed a hardware-in-the-loop operation where HoneySat successfully communicated with an in-orbit, operational SmallSat mission.
-
Xin Wang (Tsinghua University), Haochen Wang (Tsinghua University), Haibin Zhang (Yangtze Delta Region Institute of Tsinghua University, Zhejiang), Sisi Duan (Tsinghua University)
Byzantine fault-tolerant (BFT) protocols are known to suffer from poor scalability: their performance degrades drastically as the number of replicas n grows. While a long line of work has attempted to achieve the scalability goal, these works can only scale to roughly a hundred replicas, particularly on low-end machines.
In this paper, we develop BFT protocols from the so-called committee sampling approach, which selects a small committee to run consensus and conveys the results to all replicas. This approach, however, has so far focused on the Byzantine agreement (BA) problem (considering replicas only) instead of the BFT problem (in the client-replica model); it has also been mainly of theoretical interest, as concretely it only pays off for impractically large n.
We build an extremely efficient, scalable, and adaptively secure BFT protocol called Pando for partially synchronous environments based on the committee sampling approach. Our evaluation on Amazon EC2 shows that, in contrast to existing protocols, Pando can easily scale to a thousand replicas in a WAN environment, achieving a throughput of 62.57 ktx/sec.
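As a back-of-the-envelope feasibility check (our sketch, not Pando's analysis): the probability that a uniformly sampled committee exceeds its own fault threshold follows a hypergeometric tail, which shows why naive committee sampling only pays off at scale. The population size, Byzantine fraction, and thresholds below are illustrative assumptions.

```python
from scipy.stats import hypergeom

def p_bad_committee(n, f, c, t):
    """P(a committee of size c, drawn uniformly from n replicas of which f are
    Byzantine, contains more than t Byzantine members)."""
    return hypergeom.sf(t, n, f, c)

n, f = 1000, 250          # toy population: 1000 replicas, a quarter Byzantine
for c in (30, 100, 300):  # candidate committee sizes
    t = c // 3            # require the committee itself to stay below 1/3 faulty
    print(f"c={c:3d}: P[>{t} faulty] = {p_bad_committee(n, f, c, t):.2e}")
```

Small committees violate their threshold far too often; driving this failure probability down to negligible levels forces large committees (and hence large n), which is the practicality obstacle the committee sampling literature faces.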
-
Ziteng Chen (Southeast University), Menghao Zhang (Beihang University), Jiahao Cao (Tsinghua University & Quan Cheng Laboratory), Xuzheng Chen (Zhejiang University), Qiyang Peng (Beihang University), Shicheng Wang (Unaffiliated), Guanyu Li (Unaffiliated), Mingwei Xu (Quan Cheng Laboratory & Tsinghua University & Southeast University)
RDMA clouds are becoming prevalent, and ACLs are critical to regulate unauthorized network accesses by RDMA applications, services, and tenants. However, the unique QP semantics and high-speed transmission characteristics of RDMA prevent existing ACL expressions and enforcement mechanisms from comprehensively and efficiently governing RDMA traffic in a user-friendly manner. In this paper, we present Janus, a tailored ACL system for RDMA clouds. Janus designs specialized ACL expressions with QP semantics to identify RDMA connections, and provides a high-level policy language for expressing sophisticated ACL intents to govern RDMA traffic. Janus further leverages DPUs with traffic-aware and architecture-specific optimizations to enforce ACL policies, enabling line-rate RDMA inspection and robust policy updates. We implement an open-source prototype of Janus with NVIDIA BlueField-3 DPUs. Experiments demonstrate that Janus provides sufficient expressivity for governing unauthorized RDMA accesses, and achieves line-rate throughput in a 200Gbps real-world RDMA testbed with <5µs latency.
Caches to Ashes
-
Ruiyi Zhang (CISPA Helmholtz Center for Information Security), Albert Cheu (Google), Adria Gascon (Google), Daniel Moghimi (Google), Phillipp Schoppmann (Google), Michael Schwarz (CISPA Helmholtz Center for Information Security), Octavian Suciu (Google)
Confidential virtual machines (CVMs) based on trusted execution environments (TEEs) enable new privacy-preserving solutions. Yet, they leave side-channel leakage outside their threat model, shifting the responsibility of mitigating such attacks to developers. However, mitigations are either not generic or too slow for practical use, and developers currently lack a systematic, efficient way to measure and compare leakage across real-world deployments.
In this paper, we present SNPeek, an open-source toolkit that offers configurable side-channel tracing primitives on production AMD SEV-SNP hardware and couples them with statistical and machine-learning-based analysis pipelines for automated leakage estimation. We apply SNPeek to three representative workloads that are deployed on CVMs to enhance user privacy—private information retrieval, private heavy hitters, and Wasm user-defined functions—and uncover previously unnoticed leaks, including a covert channel that exfiltrates data at 497 kbit/s. The results show that SNPeek pinpoints vulnerabilities and guides low-overhead mitigations based on oblivious memory and differential privacy, giving practitioners a practical path to deploy CVMs with meaningful confidentiality guarantees.
-
Zihang Xiang (KAUST), Tianhao Wang (University of Virginia), Cheng-Long Wang (King Abdullah University of Science and Technology), Di Wang (King Abdullah University of Science and Technology)
We investigate the application of differential privacy in hyper-parameter tuning, a process that involves selecting the best run from several candidates. Unlike many private learning algorithms, including the prevalent DP-SGD, the privacy implications of selecting the best run are often overlooked. While recent works propose a generic private selection solution for the tuning process, an open question persists: is this privacy upper bound tight?
This paper provides both empirical and theoretical examinations of this question. We first present studies affirming that the current privacy analysis for private selection is indeed tight in general. However, when we specifically study the hyper-parameter tuning problem in a white-box setting, this tightness no longer holds. We first demonstrate this by applying a privacy audit to the tuning process. Our findings underscore a substantial gap between the current theoretical privacy bound and the empirical privacy leakage observed even under strong audit setups.
This gap motivates our subsequent theoretical investigations, which provide an improved privacy upper bound for private hyper-parameter tuning by exploiting its distinct properties. Our improved bound leads to better utility. Our analysis also demonstrates broader applicability than prior analyses, which are limited to specific parameter configurations. Overall, we contribute to a better understanding of how privacy degrades due to selection.
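For context, the baseline bounds in question (a sketch of known results, not this paper's improved analysis): with K fixed runs, basic composition plus post-processing gives the naive bound, while generic private selection (Liu and Talwar) randomizes the number of runs.

```latex
% Naive bound: if each run M_i is (\varepsilon, \delta)-DP, the tuple of K runs
% is (K\varepsilon, K\delta)-DP by basic composition, and picking the best run
% is post-processing, so
\hat{m} \;=\; \arg\max_{1 \le i \le K} \mathrm{score}(M_i)
\quad \text{is} \quad (K\varepsilon,\; K\delta)\text{-DP}.
% Generic private selection: running an (\varepsilon, 0)-DP candidate a
% geometrically distributed number of times and outputting the best seen is
% roughly (3\varepsilon, 0)-DP, independent of the expected number of runs.
```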
-
Sudheendra Raghav Neela (Graz University of Technology), Jonas Juffinger (Graz University of Technology), Lukas Maar (Graz University of Technology), Daniel Gruss (Graz University of Technology)
Page cache attacks are hardware-agnostic and can achieve a high temporal and spatial resolution. With mitigations deployed since 2019, only Evict+Reload-style timing measurements remain, but these suffer from a very low temporal resolution and a high impact on system performance due to eviction.
In this paper, we show that the problem of page cache attacks is significantly larger than anticipated. We first present a new systematic approach to page cache attacks based on four primitives: flush, reload, evict, and monitor. From these primitives, we derive five generic attack techniques on the page cache: Flush+Monitor, Flush+Reload, Flush+Flush, Evict+Monitor, and Evict+Reload. We show mechanisms for all primitives that operate on fully up-to-date Linux kernels, bypassing existing mitigations. We demonstrate the practicality of our revived page cache attacks in three scenarios, showing that we advance the state of the art by orders of magnitude in terms of spatial and temporal attack resolution: First, our fastest attack (Flush+Monitor) achieves an average capacity of 37.7 kB/s in a cross-process covert channel. Second, for low-frequency attacks, we demonstrate inter-keystroke timing and event detection attacks across processes, with a spatial resolution of 4 kB and a temporal resolution of 0.8 μs, improving the state of the art by 6 orders of magnitude. Third, in a website-fingerprinting attack, we achieve an F1 score of 90.54% in a top-100 closed-world scenario. We conclude that further mitigations are necessary against the page cache side channel.
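For readers unfamiliar with the underlying interfaces, the sketch below shows the classic user-space building blocks on Linux (our illustration, not the paper's bypass techniques): a "monitor" via mincore(2) and an advisory "evict"/"flush" via posix_fadvise(2). Note that post-2019 kernels restrict mincore-based monitoring, which is exactly the mitigation landscape the paper revisits; the target path is a hypothetical example.

```python
import ctypes, mmap, os

libc = ctypes.CDLL("libc.so.6", use_errno=True)

def monitor(path):
    """Per-page page-cache residency bitmap for `path` via mincore(2)."""
    fd = os.open(path, os.O_RDONLY)
    size = os.fstat(fd).st_size
    mm = mmap.mmap(fd, size, access=mmap.ACCESS_COPY)  # writable COW view
    pages = (size + mmap.PAGESIZE - 1) // mmap.PAGESIZE
    vec = (ctypes.c_ubyte * pages)()
    addr = ctypes.addressof(ctypes.c_char.from_buffer(mm))
    if libc.mincore(ctypes.c_void_p(addr), ctypes.c_size_t(size), vec):
        raise OSError(ctypes.get_errno(), "mincore failed")
    mm.close(); os.close(fd)
    return [v & 1 for v in vec]

def evict(path):
    """Ask the kernel to drop `path` from the page cache (advisory 'flush')."""
    fd = os.open(path, os.O_RDONLY)
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    os.close(fd)

victim = "/usr/bin/ls"  # hypothetical shared target file
evict(victim)
print("resident pages after evict:", sum(monitor(victim)))
```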
-
Martin Heckel (Hof University of Applied Sciences), Nima Sayadi (Hof University of Applied Sciences), Jonas Juffinger (Unaffiliated), Carina Fiedler (Graz University of Technology), Daniel Gruss (Graz University of Technology), Florian Adamsky (Hof University of Applied Sciences)
Rowhammer is a disturbance error in Dynamic Random-Access Memory (DRAM) that can be deliberately triggered from software by repeatedly reading, i.e., hammering, proximate memory locations in different DRAM rows. While numerous studies have evaluated the Rowhammer effect, in particular how it can be triggered and exploited, most use only a small sample of Dual In-line Memory Modules (DIMMs). Only a few studies provide evidence of the prevalence of the effect, and those are limited to specific hardware configurations or FPGA-based experiments with precise control of the DIMM, limiting how far their results can be generalized.
In this paper, we perform the first large-scale study of the Rowhammer effect, involving 1006 datasets from 822 systems. We measure Rowhammer prevalence in a fully automated cross-platform framework, FlippyRAM, using the available state-of-the-art software-based DRAM and Rowhammer tools. Our framework automatically gathers information about the DRAM, uses 5 tools to reverse-engineer the DRAM addressing functions, and, based on the reverse-engineered functions, uses 7 tools to mount Rowhammer. We distributed the framework online and via USB thumb drives to thousands of participants from December 30, 2024, to June 30, 2025. Overall, we collected 1006 datasets from systems with various CPUs, DRAM generations, and vendors. Our study reveals that out of 1006 datasets, 453 (371 of the 822 unique systems) succeeded in the first stage of reverse-engineering the DRAM addressing functions, indicating that successfully and reliably recovering DRAM addressing functions remains a significant open problem. In the second stage, 126 datasets (12.5% of all datasets) exhibited bit flips in our fully automated Rowhammer attacks. Our results show that fully automated, i.e., weaponizable, Rowhammer attacks work on a lower share of systems than FPGA-based and lab experiments indicated, but, at 12.5%, on enough systems to be a practical vector for threat actors. Furthermore, our results highlight the two most pressing research challenges around Rowhammer exploitability: more reliable reverse-engineering of addressing functions, as 50% of the datasets without bit flips failed in the DRAM reverse-engineering stage, and reliable Rowhammer attacks across diverse processor microarchitectures, as only 12.5% of datasets contained bit flips. Addressing each of these challenges could double the number of systems susceptible to Rowhammer and make Rowhammer a more pressing threat in real-world scenarios.
Walls Are Suggestions
-
Tommaso Sacchetti (EURECOM), Daniele Antonioli (EURECOM)
Bluetooth Low Energy (BLE) is a ubiquitous wireless technology used by billions of devices to exchange sensitive data. As defined in the Bluetooth Core Specification v6.1, BLE security relies on two primary protocols: pairing, which establishes a long-term key, and session establishment, which encrypts communications using a fresh session key. While the standard permits paired devices to re-pair to negotiate a new security level, the security implications of this mechanism remain unexplored, despite the associated risks of device impersonation and Machine-in-the-Middle (MitM) attacks.
We analyze BLE re-pairing as defined in the standard v6.1 and identify six design vulnerabilities, including four novel ones, such as unauthenticated re-pairing and security level downgrade. These vulnerabilities are design flaws and affect any standard-compliant BLE device that uses pairing, regardless of its Bluetooth version or security level. We also present four new re-pairing attacks exploiting these vulnerabilities, which we call BLERP. The attacks enable impersonation and MitM of paired devices with minimal or no user interaction (1-click or 0-click). Our attacks are the first to target BLE re-pairing, exploit the interplay between BLE pairing and session establishment, and abuse the SMP security request message.
We develop a novel toolkit that implements our attacks and supports testing of BLE pairing, including end-to-end MitM attacks. Reproducing the toolkit requires only low-cost hardware (nRF52) and open-source software (Mynewt, NimBLE, and Scapy). Our large-scale evaluation demonstrates the attacks’ impact across 22 targets, including 15 BLE Hosts, 12 BLE Controllers, Bluetooth versions up to 5.4, and the most secure configurations (SC, SCO, and authenticated pairing). During our experiments, we also discovered implementation re-pairing flaws affecting the Apple, Android, and NimBLE BLE stacks.
We implement and evaluate two complementary mitigations: a backward-compatible hardening of the re-pairing logic that is immediately deployable by vendors, and an authenticated re-pairing protocol that addresses the attacks by design. We empirically validate the effectiveness of hardened re-pairing and formally model and verify authenticated re-pairing using ProVerif.
-
Gaoning Pan (Hangzhou Dianzi University & Zhejiang Provincial Key Laboratory of Sensitive Data Security and Confidentiality Governance), Yiming Tao (Zhejiang University), Qinying Wang (EPFL and Zhejiang University), Chunming Wu (Zhejiang University), Mingde Hu (Hangzhou Dianzi University & Zhejiang Provincial Key Laboratory of Sensitive Data Security and Confidentiality Governance), Yizhi Ren (Hangzhou Dianzi University & Zhejiang Provincial Key Laboratory of Sensitive Data Security and Confidentiality Governance), Shouling Ji (Zhejiang University)
Hypervisors are threatened by critical memory safety vulnerabilities, with pointer corruption being one of the most prevalent and severe forms. Existing exploitation frameworks depend on identifying highly constrained structures in the host machine and accurately determining their runtime addresses, which is ineffective in hypervisor environments, where such structures are rare and further obfuscated by Address Space Layout Randomization (ASLR). We instead observe that modern virtualization environments exhibit weak memory isolation: guest memory is fully attacker-controlled yet accessible from the host, providing a reliable primitive for exploitation. Based on this observation, we present the first systematic characterization and taxonomy of Cross-Domain Attacks (CDA), a class of exploitation techniques that enable capability escalation through guest memory reuse. To automate this process, we develop a system that identifies cross-domain gadgets, matches them with corrupted pointers, synthesizes triggering inputs, and assembles complete exploit chains. Our evaluation on 15 real-world vulnerabilities across QEMU and VirtualBox shows that CDA is widely applicable and effective.
-
Carina Fiedler (Graz University of Technology), Jonas Juffinger (Graz University of Technology), Sudheendra Raghav Neela (Graz University of Technology), Martin Heckel (Hof University of Applied Sciences), Hannes Weissteiner (Graz University of Technology), Abdullah Giray Yağlıkçı (ETH Zürich), Florian Adamsky (Hof University of Applied Sciences), Daniel Gruss (Graz University of Technology)
Rowhammer bit flips in DRAM enable software attackers to fully compromise a great variety of systems. Hardware mitigations can be precise and efficient, but they suffer from long deployment cycles and very limited or no update capabilities. Consequently, refined attack methods have repeatedly bypassed deployed hardware protections, leaving commodity systems vulnerable to Rowhammer attacks.
In this paper, we present Memory Band-Aid, a principled defense-in-depth against Rowhammer. Memory Band-Aid is not a replacement for long-term, efficient hardware mitigations, but a defense-in-depth that is activated when hardware mitigations prove insufficient for a specific system generation. For this purpose, Memory Band-Aid introduces per-thread and per-bank rate limits for DRAM accesses in the memory controller, ensuring that the minimum number of row activations required for Rowhammer bit flips cannot be reached. We implement a proof-of-concept of Memory Band-Aid on Ubuntu Linux and test it on 2 Intel and 2 AMD systems, building on global bandwidth limits due to the lack of per-bank limits in current hardware. Using this PoC, we find that a full implementation, including minor hardware changes, would have a low overhead of 0% to 9.4% on a collection of realistic Phoronix macro-benchmarks. In a micro-benchmark designed to cause DRAM pressure, we observe a slowdown by a factor of 1 to 5.1. Both overheads apply only to untrusted, throttled workloads, e.g., all userspace programs or only selected sandboxes, such as those in browsers. Especially as Memory Band-Aid can be enabled on demand, we conclude that it is an important defense-in-depth that should be deployed in practice as a second defense layer.
-
Zachary Ratliff (Harvard University), Ruoxing (David) Yang (Georgetown University), Avery Bai (Georgetown University), Harel Berger (Ariel University), Micah Sherr (Georgetown University), James Mickens (Harvard University)
In authoritarian and highly surveilled environments, traditional communication networks are vulnerable to censorship, monitoring, and disruption. While decentralized anonymity networks such as Tor provide strong privacy guarantees, they remain dependent on centralized Internet infrastructure, making them susceptible to large-scale blocking or shutdowns. To address these limitations, we present MIRAGE, a privacy-preserving mobility-based messaging system designed for censorship-resistant communication. MIRAGE uses a district-based routing scheme that probabilistically forwards messages based on the high-level mobility patterns of the population. To prevent leakage of individual mobility behavior, MIRAGE protects users’ mobility patterns with local differential privacy, ensuring that participation in the network does not reveal an individual’s location history through observable routing decisions.
We implement MIRAGE within Cadence, an open-source simulator that provides a unified framework for evaluating mobility-based protocols using approximated geographical encounters between nodes over time.
We analyze the privacy and efficiency tradeoffs of MIRAGE and evaluate its performance against (1) traditional epidemic and random-walk-based routing protocols and (2) the state-of-the-art privacy-preserving geography-based routing protocol, using real-world trajectories: one from pedestrian movement patterns collected in various urban locations and another consisting of GPS traces from taxi operations. Our results demonstrate that MIRAGE significantly reduces message overhead compared to epidemic routing and outperforms probabilistic flooding in terms of delivery rate, while providing stronger privacy guarantees than existing techniques.
Gone in 60 Milliseconds
-
Varun Gadey (University of Würzburg), Melanie Goetz (University of Würzburg), Christoph Sendner (University of Würzburg), Sampo Sovio (Huawei Technologies), Alexandra Dmitrienko (University of Würzburg)
Modern systems increasingly rely on Trusted Execution Environments (TEEs), such as Intel SGX and ARM TrustZone, to securely isolate sensitive code and reduce the Trusted Computing Base (TCB).
However, identifying the precise regions of code, especially those involving cryptographic logic, that should reside within a TEE remains challenging, as it requires deep manual inspection and is not yet supported by automated tools. To address this open problem, we propose LLM-based Code Annotation Logic (LLM-CAL), a tool that automates the identification of security-sensitive code regions at scale by leveraging recent, advanced Large Language Models (LLMs). Our approach builds on foundational LLMs (Gemma-2B, CodeGemma-2B, and LLaMA-7B), which we fine-tuned on a newly collected and manually labeled dataset of over 4,000 C source files. We encode local context features, global semantic information, and structural metadata into compact input sequences that guide the model in capturing subtle patterns of security sensitivity in code. The fine-tuning process is based on quantized LoRA, a parameter-efficient technique that introduces lightweight, trainable adapters into the LLM architecture. To support practical deployment, we developed a scalable pipeline for data preprocessing and inference. LLM-CAL achieves an F1 score of 98.40% and a recall of 97.50% in identifying sensitive and non-sensitive code. It represents the first effort to automate the annotation of cryptographic security-sensitive code for TEE-enabled platforms, aiming to minimize the Trusted Computing Base (TCB) and optimize TEE usage to enhance overall system security.
-
Chenxu Wang (Southern University of Science and Technology (SUSTech) and The Hong Kong Polytechnic University), Junjie Huang (Southern University of Science and Technology (SUSTech)), Yujun Liang (Southern University of Science and Technology (SUSTech)), Xuanyao Peng (Southern University of Science and Technology (SUSTech) and University of Chinese Academy of Sciences), Yuqun Zhang (Southern University of Science and Technology (SUSTech)), Fengwei Zhang (Southern University of Science and Technology (SUSTech)), Jiannong Cao (Hong Kong Polytechnic University), Hang Lu (University of Chinese Academy of Sciences), Rui Hou (University of Chinese Academy of Sciences), Shoumeng Yan (Ant Group), Tao Wei (Ant Group), Zhengyu He (Ant Group)
Accelerator trusted execution environments (TEEs) are a popular technique that provides strong confidentiality, integrity, and isolation protection for sensitive data/code in accelerators. However, most designs target a specific CPU or accelerator and thus lack generalizability. Recent TEE surveys partially summarize the threats to and protections of accelerator computing, but they have yet to provide a guide to building an accelerator TEE or to compare the pros and cons of the various security solutions. In this paper, we provide a holistic analysis of accelerator TEEs over the years. We distill a typical framework for building an accelerator TEE and summarize the widely used attack vectors, ranging from software to physical attacks. Furthermore, we provide a systematization of accelerator TEEs' three major security mechanisms: (1) access control, (2) memory encryption/decryption, and (3) attestation. For each aspect, we compare the varied security solutions in existing studies and summarize their insights. Lastly, we analyze the factors that influence TEE deployment on real-world platforms, especially trusted computing base (TCB) and compatibility issues.
-
Zheng Liu (University of Virginia), Chen Gong (University of Virginia), Terry Yue Zhuo (Monash University and CSIRO's Data61), Kecen Li (University of Virginia), Weichen Yu (Carnegie Mellon University), Matt Fredrikson (Carnegie Mellon University), Tianhao Wang (University of Virginia)
Large language models (LLMs) have demonstrated outstanding performance in code generation and completion. However, fine-tuning these models on private datasets raises privacy and proprietary concerns, such as the leakage of sensitive personal information. Differentially private (DP) code generation provides theoretical guarantees for protecting sensitive code by generating synthetic datasets that preserve statistical properties while reducing privacy leakage. However, DP code generation faces significant challenges due to strict syntactic dependencies in code and the privacy-utility trade-off.
We propose PrivCode, the first DP synthesizer specifically designed for code datasets. It incorporates a two-stage framework to improve both privacy and utility. In the first stage, termed "privacy-sanitizing," PrivCode generates DP-compliant synthetic code by training models with DP-SGD while introducing syntactic information to preserve code structure. The second stage, termed "utility-boosting," fine-tunes a larger pre-trained LLM on the privacy-free synthetic code to mitigate the utility loss caused by DP, enhancing the utility of the generated code. Extensive experiments on four LLMs show that PrivCode generates higher-utility code across various testing tasks under four benchmarks. The experiments also confirm its ability to protect sensitive data under varying privacy budgets. We provide the replication package at an anonymous link.
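The privacy-sanitizing stage relies on DP-SGD's standard recipe: clip each example's gradient to a fixed norm, sum, and add Gaussian noise calibrated to that norm. A minimal numpy rendering of one update step (shapes and hyper-parameters are illustrative, not PrivCode's settings):

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_mult=1.1, rng=None):
    """One DP-SGD update direction from per-example gradients (B x D)."""
    rng = rng or np.random.default_rng()
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale                  # per-example clipping
    noise = rng.normal(0.0, noise_mult * clip_norm,      # noise on the sum
                       size=per_example_grads.shape[1])
    return (clipped.sum(axis=0) + noise) / len(per_example_grads)

grads = np.random.default_rng(3).normal(0, 2, size=(32, 10))  # toy batch
print(dp_sgd_step(grads).round(3))
```
-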
Huaiyu Yan (Southeast University), Zhen Ling (Southeast University), Xuandong Chen (Southeast University), Xinhui Shao (Southeast University, City University of Hong Kong), Yier Jin (University of Science and Technology of China), Haobo Li (Southeast University), Ming Yang (Southeast University), Ping Jiang (Southeast University), Junzhou Luo (Southeast University, Fuyao University of Science and Technology)
Trusted execution environments (TEEs) have been widely explored to enhance security for embedded systems. Existing embedded TEE systems run with a small memory footprint and only provide security-critical functionalities in order to maintain a minimal trusted computing base (TCB). Unfortunately, this design choice leaves these TEE systems short on software resources, making it difficult to execute complex applications with large code bases inside embedded TEEs. In this paper, we propose a user-space isolated execution environment (UIEE) to augment TEE capabilities by directly running unmodified data processing applications inside TEEs without increasing the TCB size. UIEE constructs a sandboxed environment by dynamically allocating a sufficient memory region for applications and isolating it from both the rich execution environment (REE) and the TEE, defending the UIEE from REE attacks while protecting the TEE from a potentially compromised UIEE application. Additionally, we propose a library-OS-based (i.e., Linux kernel library, LKL) UIEE runtime environment that provides standard C runtime APIs to UIEE applications. To solve LKL concurrency issues, we propose an LKL thread synchronization mechanism to run the multi-threaded LKL inside the UIEE, which features a single-threaded execution model. Furthermore, we design a novel on-demand thread migration mechanism to realize LKL context switching inside the UIEE. We implement and deploy a UIEE prototype on an NXP IMX6Q SABRE-SD evaluation board, and successfully run 8 real-world libc-based applications inside the UIEE without modification. The experimental results show that UIEE incurs negligible performance overhead. We are the first to propose a TrustZone-oriented LibOS and evaluate its feasibility as well as its security features.
The Hashing Dead
-
Nirajan Koirala (University of Notre Dame), Seunghun Paik (Hanyang University), Sam Martin (University of Notre Dame), Helena Berens (University of Notre Dame), Tasha Januszewicz (University of Notre Dame), Jonathan Takeshita (Old Dominion University), Jae Hong Seo (Hanyang University), Taeho Jung (University of Notre Dame)
Private Set Intersection (PSI) protocols allow a querier to determine whether an item exists in a dataset without revealing the query or exposing non-matching records.
It has many applications in fraud detection, compliance monitoring, healthcare analytics, and secure collaboration across distributed data sources.
In these cases, the results obtained through PSI can be sensitive and may even require downstream computation on the associated data before the outcome is revealed to the querier; such computation may involve floating-point arithmetic, e.g., the inference of a machine learning model.
Although many such protocols have been proposed, and some even enable secure queries over distributed encrypted sets, they fail to address the aforementioned real-world complexities. In this work, we present the first encrypted label selection and analytics protocol construction, which allows the querier to securely retrieve not just the results of intersections among identifiers but also the outcomes of downstream functions on the data/labels associated with the intersected identifiers.
To achieve this, we construct a novel protocol based on the approximate CKKS fully homomorphic encryption scheme that supports efficient label retrieval and downstream computations over real-valued data.
In addition, we introduce several techniques to handle identifiers in large domains, e.g., 64 or 128 bits, while ensuring high precision for accurate downstream computations. Finally, we implement and benchmark our protocol, compare it against state-of-the-art methods, and perform an evaluation over real-world fraud datasets, demonstrating its scalability and efficiency in large-scale use cases.
Our results show a 1.4× to 6.8× speedup over prior approaches, and our protocol selects and analyzes encrypted labels over real-world datasets in under 65 seconds, making it practical for real-world deployments.
-
Qingwen Li (Xidian University), Song Bian (Beihang University), Hui Li (Xidian University)
Private Set Union (PSU) allows two parties to compute the union of their private sets without revealing any additional information. While several PSU protocols have been proposed for the unbalanced setting, these constructions still suffer from substantial communication overhead as the size of the larger set increases. Moreover, their reliance on multiple invocations of oblivious pseudo-random functions results in increased communication rounds, which becomes a practical bottleneck.
In this work, we present cwPSU, a novel unbalanced PSU protocol built upon constant-weight codes and leveled fully homomorphic encryption. To prevent leakage, we introduce a new technique called Batched Ciphertext Shuffle, which enables secure reordering of packed ciphertexts. Additionally, we propose an optimized arithmetic constant-weight equality operator, which reduces the number of non-scalar multiplications to just one-third of those required by the naïve approach. The communication complexity of our protocol scales linearly with the size of the smaller set and remains independent of the larger set. Notably, cwPSU requires only a single round of online communication.
Experimental results demonstrate that our cwPSU outperforms the state-of-the-art protocol under various network conditions, achieving a 5.1–32.4× reduction in communication and a 3.1–13.3× speedup in runtime.
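The idea behind a constant-weight equality check, shown here over plaintext bits for intuition (in cwPSU the database bits would live in FHE ciphertext slots, and the paper's optimized arithmetic operator needs only a third of the naive multiplications): when both operands are weight-k codewords, the product of the k bit positions selected by one operand equals 1 exactly on equality. The toy encoder below is our assumption.

```python
from itertools import combinations

def cw_encode(value, n, k):
    """Toy encoder: the value-th length-n, weight-k codeword (lexicographic)."""
    support = list(combinations(range(n), k))[value]
    return [1 if i in support else 0 for i in range(n)]

def cw_equal(x_bits, y_support):
    """Equality of two weight-k codewords as a product of k selected bits.
    Homomorphically, this costs only multiplications (depth ~ log2 k)."""
    out = 1
    for i in y_support:
        out *= x_bits[i]          # on ciphertexts: homomorphic multiply
    return out

n, k = 8, 2                       # codeword space: C(8, 2) = 28 values
x = cw_encode(11, n, k)           # "database" item (encrypted in the protocol)
for v in (10, 11, 12):
    y = [i for i, b in enumerate(cw_encode(v, n, k)) if b]
    print(v, cw_equal(x, y))      # prints 1 only when v == 11
```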
-
Qi Tan (Shenzhen University), Yi Zhao (Beijing Institute of Technology), Laizhong Cui (Shenzhen University), Qi Li (Tsinghua University), Ming Zhu (Tsinghua University), Xing Fu (Ant Group), Weiqiang Wang (Ant Group), Xiaotong Lin (Ant Group), Ke Xu (Tsinghua University)
Machine learning (ML)-based fraud detection systems are widely employed by enterprises to reduce economic losses from fraudulent activities. However, fraudsters are intelligent and evolve rapidly, employing advanced techniques to falsify the features of transactions to evade the detection system. Worse still, since these falsification processes are not restricted to small intervals, existing robustness enhancement methods based on small-scale perturbations are ineffective. Detecting unrestrictedly perturbed fraudulent activities, which significantly increases uncertainties in fraud detection, is still an open problem.
To resolve this issue, we propose GAMER, a robust fraud detection system based on a two-player game that achieves both high accuracy and strong robustness in detecting fraudulent activities.
Specifically, GAMER leverages feature selection to proactively combat intelligent fraudsters (i.e., selecting fewer features to reduce the space of feature falsification combinations) and innovatively formulates the detection process as a two-player game. By solving the equilibrium of this game, GAMER computes the optimal probability distribution for feature selection, which accounts for all possible falsification strategies of the fraudsters. The equilibrium-based selection probability not only minimizes the profits obtained by fraudsters, reducing their incentive to launch falsification, but also enables the system to select robust features (i.e., features that are less likely to be falsified) when detecting fraudulent activities, enhancing the robustness of the system. Our theoretical and experimental results validate these deterrence and robustness-enhancement properties. Moreover, experiments on real-world attacks suffered by the world's leading online payment enterprise demonstrate that GAMER outperforms traditional robustness enhancement techniques, increasing the F1 score by 67.5% on average over two months of fraud detection.
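The equilibrium computation itself is classical: modeling the detector (rows: feature-selection choices) and the fraudster (columns: falsification strategies) as a zero-sum matrix game, the detector's optimal mixed strategy solves a small linear program. A generic sketch with a made-up payoff matrix, not GAMER's actual game model:

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(A):
    """Row player's maximin mixed strategy for payoff matrix A (row utility)."""
    m, n = A.shape
    # Variables: p_1..p_m (strategy) and v (game value); maximize v.
    c = np.zeros(m + 1); c[-1] = -1.0
    # For every column j: sum_i p_i * A[i, j] >= v  <=>  -A^T p + v <= 0
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])  # probabilities sum to 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] * m + [(None, None)])
    return res.x[:m], res.x[-1]

# Rows: detector's feature-subset choices; columns: falsification strategies.
payoff = np.array([[0.9, 0.2, 0.5],
                   [0.3, 0.8, 0.4],
                   [0.5, 0.5, 0.6]])
p, v = solve_zero_sum(payoff)
print("selection probabilities:", p.round(3), "game value:", round(v, 3))
```
-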
Wenhao Wang (Yale University, IC3), Fangyan Shi (Tsinghua University), Dani Vilardell (Cornell Tech, IC3), Fan Zhang (Yale University, IC3)
Succinct Non-interactive Arguments of Knowledge (SNARKs) can enable efficient verification of computation in many applications. However, generating SNARK proofs for large-scale tasks, such as verifiable machine learning or virtual machines, remains computationally expensive. A promising approach is to distribute the proof generation workload across multiple workers. A practical distributed SNARK protocol should have three properties: horizontal scalability with low overhead (linear computation and logarithmic communication per worker), accountability (efficient detection of malicious workers), and a universal trusted setup independent of circuits and the number of workers. Existing protocols fail to achieve all these properties.
In this paper, we present Cirrus, the first distributed SNARK generation protocol achieving all three desirable properties at once. Our protocol builds on HyperPlonk (EUROCRYPT'23), inheriting its universal trusted setup. It achieves linear computation complexity for both workers and the coordinator, along with low communication overhead. To achieve accountability, we introduce a highly efficient accountability protocol to localize malicious workers. Additionally, we propose a hierarchical aggregation technique to further reduce the coordinator’s workload.
We implemented and evaluated Cirrus on machines with modest hardware. Our experiments show that Cirrus is highly scalable: it generates proofs for circuits with 33M gates in under 40 seconds using 32 8-core machines. Compared to the state-of-the-art accountable protocol Hekaton (CCS'24), Cirrus achieves over 7× faster proof generation for PLONK-friendly circuits such as the Pedersen hash. Our accountability protocol also efficiently identifies faulty workers within just 4 seconds, making Cirrus particularly suitable for decentralized and outsourced computation scenarios.
Click Wars: A New Hope
-
Yan Pang (University of Virginia), Wenlong Meng (University of Virginia), Xiaojing Liao (Indiana University Bloomington), Tianhao Wang (University of Virginia)
With the rapid development of large language models, the potential threat of their malicious use, particularly in generating phishing content, is becoming increasingly prevalent. Leveraging the capabilities of LLMs, malicious users can synthesize phishing emails that are free from spelling mistakes and other easily detectable features. Furthermore, such models can generate topic-specific phishing messages, tailoring content to the target domain and increasing the likelihood of success.
Detecting such content remains a significant challenge, as LLM-generated phishing emails often lack clear or distinguishable linguistic features. As a result, most existing semantic-level detection approaches struggle to identify them reliably. While certain LLM-based detection methods have shown promise, they suffer from high computational costs and are constrained by the performance of the underlying language model, making them impractical for large-scale deployment.
In this work, we aim to address this issue. We propose Paladin, which embeds trigger-tag associations into vanilla LLMs using various insertion strategies, turning them into instrumented LLMs. When an instrumented LLM generates content related to phishing, it automatically includes detectable tags, enabling easier identification. Based on the design of implicit and explicit triggers and tags, we consider four distinct scenarios in our work. We evaluate our method from three key perspectives: stealthiness, effectiveness, and robustness, and compare it with existing baseline methods. Experimental results show that our method outperforms the baselines, achieving over 90% detection accuracy across all scenarios.
-
Yunyi Zhang (Tsinghua University), Shibo Cui (Tsinghua University), Baojun Liu (Tsinghua University), Jingkai Yu (Tsinghua University), Min Zhang (National University of Defense Technology), Fan Shi (National University of Defense Technology), Han Zheng (TrustAI Pte. Ltd.)
LLM applications (i.e., LLM apps) leverage the powerful capabilities of LLMs to provide users with customized services, revolutionizing traditional application development. While the increasing prevalence of LLM-powered applications provides users with unprecedented convenience, it also brings new security challenges. The security community still lacks a sufficient understanding of this emerging ecosystem, especially regarding the capability boundaries of the applications themselves.
In this paper, we systematically analyze the new development paradigm and define the concept of the LLM app capability space. We also uncover potential new risks beyond jailbreaking that arise from ambiguous capability boundaries in real-world scenarios, namely capability downgrade and upgrade. To evaluate the impact of these risks, we designed and implemented an LLM app capability evaluation framework, LLMApp-Eval. First, we collected application metadata across 4 platforms and conducted a cross-platform ecosystem analysis. Then, we evaluated the risks for 199 popular applications across the 4 platforms and 6 open-source LLMs. We identified 178 (89.45%) potentially affected applications, which can be made to perform tasks from more than 15 scenarios or to behave maliciously. We even found 17 applications in our study that executed malicious tasks directly, without any adversarial rewriting. Furthermore, our experiments reveal a positive correlation between the quality of prompt design and application robustness: well-designed prompts enhance security, while poorly designed ones facilitate abuse. We hope our work inspires the community to focus on the real-world risks of LLM applications and fosters the development of a more robust LLM application ecosystem.
-
Kim Hammar (University of Melbourne), Tansu Alpcan (University of Melbourne), Emil Lupu (Imperial College London)
Timely and effective incident response is key to managing the growing frequency of cyberattacks. However, identifying the right response actions for complex systems is a major technical challenge. A promising approach to mitigate this challenge is to use the security knowledge embedded in large language models (LLMs) to assist security operators during incident handling. Recent research has demonstrated the potential of this approach, but current methods are mainly based on prompt engineering of frontier LLMs, which is costly and prone to hallucinations. We address these limitations by presenting a novel way to use an LLM for incident response planning with reduced hallucination. Our method includes three steps: fine-tuning, information retrieval, and lookahead planning. We prove that our method generates response plans with a bounded probability of hallucination and that this probability can be made arbitrarily small at the expense of increased planning time under certain assumptions. Moreover, we show that our method is lightweight and can run on commodity hardware. We evaluate our method on logs from incidents reported in the literature. The experimental results show that our method a) achieves up to 22% shorter recovery times than frontier LLMs and b) generalizes to a broad range of incident types and response actions.
Kernel Panic at the Disco
-
Haoran Yang (Institute of Information Engineering, Chinese Academy of Sciences), Jiaming Guo (Institute of Information Engineering, Chinese Academy of Sciences), Shuangning Yang (School of Internet, Anhui University), Guoli Zhao (Institute of Information Engineering, Chinese Academy of Sciences), Qingqi Liu (Institute of Information Engineering, Chinese Academy of Sciences), Chi Zhang (Institute of Information Engineering, Chinese Academy of Sciences), Zhenlu Tan (Institute of Information Engineering, Chinese Academy of Sciences), Lixiao Shan (Institute of Information Engineering, Chinese Academy of Sciences), Qihang Zhou (Institute of Information Engineering, Chinese Academy of Sciences), Mengting Zhou (Institute of Information Engineering, Chinese Academy of Sciences), Jianwei Tai (School of Internet, Anhui University), Xiaoqi Jia (Institute of Information Engineering, Chinese Academy of Sciences)
The proliferation of IoT devices has driven a rise in vulnerability exploits. Existing vulnerability detection approaches heavily rely on firmware or source code for analysis. This reliance critically compromises their efficiency in real-world black-box scenarios. To address this limitation, we propose IoTBec, a novel firmware and source-code independent framework for recurring vulnerability detection. IoTBec innovatively constructs a Vulnerability Interface Signature (VIS) based on black-box interfaces and known vulnerability information. The signature is designed to match potential recurring vulnerabilities against target devices. The framework then deeply integrates this signature-based detection with Large Language Model (LLM)-driven fuzzing. Upon a match, IoTBec automatically leverages LLMs to generate targeted fuzzing payloads for verification.
To evaluate IoTBec, we conducted extensive experiments on devices from five major IoT vendors. Results show that IoTBec discovers over 7 times more vulnerabilities than the current state-of-the-art (SOTA) black-box fuzzing methods, with 100% precision and 93.37% recall. Overall, IoTBec detected 183 vulnerabilities, 169 of which were assigned CVE IDs. Among these, 53 were newly discovered and had an average CVSS 3.x score of 8.61, covering buffer overflows, command injection, and CSRF issues. Notably, through LLM-driven fuzzing, IoTBec also discovered 25 previously unknown vulnerabilities. The experimental evidence suggests that IoTBec’s unique firmware and source-code independent paradigm enhances detection efficiency and enables the discovery of novel and variant vulnerabilities. We will release the source code for IoTBec and the experiment data at https://github.com/IoTBec.
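As a rough illustration of how a vulnerability interface signature might be matched against a black-box device, consider the sketch below; the signature fields (path, method, parameter names) and the overlap threshold are our assumptions, not IoTBec's published VIS format.

```python
# Hedged sketch of VIS-style matching: a known vulnerability is summarized
# by the interface it exposes, and candidate devices are matched by
# interface overlap before any fuzzing is attempted.
from dataclasses import dataclass

@dataclass
class InterfaceSignature:
    path: str
    method: str
    params: frozenset

def match(sig: InterfaceSignature, device: InterfaceSignature,
          min_param_overlap: float = 0.8) -> bool:
    if sig.path != device.path or sig.method != device.method:
        return False
    if not sig.params:
        return True
    overlap = len(sig.params & device.params) / len(sig.params)
    return overlap >= min_param_overlap

vis = InterfaceSignature("/setup.cgi", "POST", frozenset({"cmd", "token"}))
target = InterfaceSignature("/setup.cgi", "POST",
                            frozenset({"cmd", "token", "lang"}))
print(match(vis, target))  # True -> candidate for LLM-driven fuzzing
```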
-
Runhao Liu (National University of Defense Technology), Jiarun Dai (Fudan University), Haoyu Xiao (Fudan University), Yuan Zhang (Fudan University), Yeqi Mou (National University of Defense Technology), Lukai Xu (National University of Defense Technology), Bo Yu (National University of Defense Technology), Baosheng Wang (National University of Defense Technology), Min Yang (Fudan University)
Static taint analysis has become a fundamental technique to detect vulnerabilities implied in web services of Linux-based firmware. However, existing works commonly oversimplify the composition of firmware web services. Specifically, only C binaries (i.e., those extracted from the target firmware) are considered within the scope of vulnerability detection. In this work, we observe that modern firmware extensively combines Lua scripts/bytecode and C binaries to implement hybrid web services, and obviously, those C-binary-oriented vulnerability detection techniques can hardly achieve satisfactory performance. In light of this, we propose FirmCross, an automated taint-style vulnerability detector dedicated to C-Lua hybrid web services. Compared to existing detectors, FirmCross can automatically de-obfuscate the Lua bytecode in target firmware, additionally identify distinctive taint sources in Lua codespace, and systematically capture the C-Lua cross-language taint flow. In the evaluation, FirmCross detects 6.82x to 14.5x more vulnerabilities than state-of-the-art (SOTA) approaches (i.e., MangoDFA and LuaTaint) in a dataset containing 73 firmware images from 11 vendors. Notably, FirmCross helps identify 610 0-day vulnerabilities among target firmware images. After reporting these vulnerabilities to vendors, 31 vulnerability IDs have been assigned to date.
-
Shir Bernstein (Ben Gurion University of the Negev), David Beste (CISPA Helmholtz Center for Information Security), Daniel Ayzenshteyn (Ben Gurion University of the Negev), Lea Schönherr (CISPA Helmholtz Center for Information Security), Yisroel Mirsky (Ben Gurion University of the Negev)
Large Language Models (LLMs) are increasingly trusted to perform automated code review and static analysis at scale, supporting tasks such as vulnerability detection, summarization, and refactoring. In this paper, we identify and exploit a critical vulnerability in LLM-based code analysis: an abstraction bias that causes models to overgeneralize familiar programming patterns and overlook small, meaningful bugs. Adversaries can exploit this blind spot to hijack the control flow of the LLM’s interpretation with minimal edits and without affecting actual runtime behavior. We refer to this attack as a Familiar Pattern Attack (FPA).
We develop a fully automated, black-box algorithm that discovers and injects FPAs into target code. Our evaluation shows that FPAs are not only effective against basic and reasoning models, but are also transferable across model families (OpenAI, Anthropic, Google), and universal across programming languages (Python, C, Rust, Go). Moreover, FPAs remain effective even when models are explicitly warned about the attack via robust system prompts. Finally, we explore positive, defensive uses of FPAs and discuss their broader implications for the reliability and safety of code-oriented LLMs.
-
Senapati Diwangkara (Johns Hopkins University), Yinzhi Cao (Johns Hopkins University)
Single Page Application (SPA) frameworks allow developers to build complex web applications in a single HTML page with high-level components (e.g., search box). One research problem for SPAs is how to detect taint-style vulnerabilities, because the SPA framework reintroduces insecure DOM APIs in a new format, such as SPA component parameters as taint sinks. Although previous works have focused on improving vulnerability detection in SPAs, to the best of our knowledge, they rely heavily on hard-coded taint sinks, which not only need to be manually curated for each different SPA framework but may also miss certain insecure SPA APIs, introducing false negatives in detected vulnerabilities.
In this paper, we present TranSPArent, an SPA vulnerability detection tool that automatically abstracts SPA frameworks using a combination of static and dynamic analysis to reveal framework-specific sinks, thus facilitating end-to-end static vulnerability detection. TranSPArent first performs a backward taint analysis from a list of insecure DOM APIs up to the framework interface to reveal which part of the interface could taint the DOM API. This automated framework abstraction is done once per SPA framework. Then, TranSPArent finds dataflow paths between the detected SPA sinks and attacker-controlled sources to detect taint-style vulnerabilities in each application. We evaluated TranSPArent against a database of GitHub repositories and found 11 zero-day vulnerabilities, including a repository with 24k+ GitHub stargazers and 30 million requests/month. So far, four zero-day vulnerabilities have been fixed and/or acknowledged by their developers.
During our evaluation, TranSPArent found a total of 19 intermediate SPA sinks from the three most widely used SPA frameworks, Vue, React, and Angular. 14 of the newly discovered sinks are not listed by the CodeQL standard library, the state-of-the-art static analysis tool.
No Country for Old Prompts
-
Jiongchi Yu (Singapore Management University), Xiaofei Xie (Singapore Management University), Qiang Hu (Tianjin University), Yuhan Ma (Tianjin University), Ziming Zhao (Zhejiang University)
Insider threat, which can lead to unacceptable losses, is a widespread and significant security concern, making its detection essential. Recently, machine learning-based insider threat detection (ITD) methods have been proposed with promising results. Despite this success, a major challenge, the lack of sufficient data, limits the further development of these ITD methods. The paradox is that enterprise internal data is highly sensitive and typically inaccessible, while public datasets are either limited in real-world coverage or, in the case of synthetic data, lack rich semantic information and realistic behavioral patterns. As a result, there is a crucial need for the construction of real-world insider threat datasets.
To address this challenge, we propose Chimera, the first large language model (LLM)-based multi-agent framework to automatically simulate both benign and malicious insider activities, as well as collect logs across diverse enterprise environments. Based on an analysis of organizational composition and structural characteristics, Chimera customizes each LLM agent to represent an individual employee through detailed role modeling, coupled with modules for group meetings, pairwise interactions, and self-organized scheduling. In this way, Chimera can accurately reflect the complexities of real-world enterprise operations. The current version of Chimera covers 15 distinct types of manually abstracted insider attacks, such as intellectual property theft and system sabotage. Using Chimera, we simulate benign and attack activities across three typical data-sensitive organizational scenarios, including a technology company, a finance corporation, and a medical institution, and generate a new dataset named ChimeraLog to facilitate the development of machine learning-based ITD methods.
To evaluate the quality and authenticity of ChimeraLog, we conduct comprehensive human studies and quantitative analyses. The results demonstrate both the diversity and realism of the dataset. Further expert analysis highlights the presence of realistic threat patterns as well as explainable activity traces. In addition, we evaluate the effectiveness of existing insider threat detection methods on ChimeraLog. The average F1-score achieved is 0.83, which is notably lower than the score of 0.99 observed on the baseline dataset CERT, thereby illustrating the greater difficulty posed by ChimeraLog for threat detection tasks.
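A toy illustration of the per-agent role modeling described above might look as follows; the fields and prompt template are our own assumptions about how such a simulation could be parameterized, not Chimera's implementation.

```python
# Illustrative per-agent role model for an LLM-driven insider simulation.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class EmployeeAgent:
    name: str
    role: str
    department: str
    is_insider: bool = False
    attack_type: Optional[str] = None   # e.g., "intellectual property theft"
    schedule: list = field(default_factory=list)

    def system_prompt(self) -> str:
        base = (f"You are {self.name}, a {self.role} in the "
                f"{self.department} department. Act out a normal workday.")
        if self.is_insider and self.attack_type:
            base += f" Covertly attempt {self.attack_type} while avoiding notice."
        return base

alice = EmployeeAgent("Alice", "research engineer", "R&D",
                      is_insider=True,
                      attack_type="intellectual property theft")
print(alice.system_prompt())
```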
-
Zichuan Li (University of Illinois Urbana-Champaign), Jian Cui (University of Illinois Urbana-Champaign), Xiaojing Liao (University of Illinois Urbana-Champaign), Luyi Xing (University of Illinois Urbana-Champaign)
Large Language Model (LLM) agents are autonomous systems powered by LLMs, capable of reasoning and planning to solve problems by leveraging a set of tools. However, the integration of multiple tools in LLM agents introduces challenges in securely managing tools, ensuring their compatibility, handling dependency relationships, and protecting control flows within LLM agents' task workflows. In this paper, we present the first systematic security analysis of task control flows in multi-tool-enabled LLM agents. We identify a novel threat, Cross-Tool Harvesting and Polluting (XTHP), which includes multiple attack vectors to first hijack the normal control flows of agent tasks, and then collect and pollute confidential or private information within LLM agent systems. To understand the impact of this threat, we developed Chord, a dynamic scanning tool designed to automatically detect real-world agent tools susceptible to XTHP attacks. Our evaluation of 66 real-world tools from two major LLM agent development frameworks, LangChain and LlamaIndex, revealed that 75% are vulnerable to XTHP attacks, highlighting the prevalence of this threat.
-
Peiyang Li (Tsinghua University & Ant Group), Fukun Mei (Tsinghua University), Ye Wang (Tsinghua University), Zhuotao Liu (Tsinghua University), Ke Xu (Tsinghua University & Zhongguancun Laboratory), Chao Shen (Xi'an Jiaotong University), Qian Wang (Wuhan University), Qi Li (Tsinghua University & Zhongguancun Laboratory)
Web attacks pose a significant threat to Web applications. While deep learning-based systems have emerged as promising solutions for detecting Web attacks, the lack of interpretability hinders their deployment in production. Existing interpretability methods are unable to explain Web attacks because they overlook the structure information of HTTP requests. They merely identify some important features, which are not understandable by security operators and fail to guide them toward effective responses.
In this paper, we propose WebSpotter, which achieves interpretable Web attack detection by enhancing existing deep learning-based detection methods to locate malicious payloads in HTTP requests. It is inspired by the observation that malicious payloads often have a significant impact on the predictions of detection models. WebSpotter identifies the importance of each field of HTTP requests, and then utilizes a machine learning model to learn the correlation between the importance and malicious payloads. In addition, we demonstrate how WebSpotter can assist security operators in mitigating attacks by automatically generating WAF rules. Extensive evaluations on two public datasets and our newly constructed dataset demonstrate that WebSpotter significantly outperforms existing methods, achieving at least a 22% improvement in localization accuracy compared to baselines. We also conduct evaluations on real-world attacks collected from CVEs and real-world Web applications to illustrate the effectiveness of WebSpotter in practical scenarios.
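The core intuition, that malicious payloads strongly influence the detector's prediction, can be sketched with a simple occlusion test over HTTP fields; the toy detector and field layout below are illustrative stand-ins, not WebSpotter's actual models.

```python
# Occlude each HTTP field in turn and measure how much the malicious score
# drops; fields whose removal flips the prediction likely carry the payload.
def field_importance(fields: dict, score_fn) -> dict:
    base = score_fn(fields)
    scores = {}
    for name in fields:
        occluded = {k: ("" if k == name else v) for k, v in fields.items()}
        scores[name] = base - score_fn(occluded)
    return scores

def toy_detector(fields: dict) -> float:
    # Hypothetical scorer: high score if a SQL-injection keyword appears.
    blob = " ".join(fields.values()).lower()
    return 1.0 if "union select" in blob else 0.1

request = {"path": "/item",
           "id": "7 UNION SELECT password FROM users",
           "ua": "curl/8.0"}
print(field_importance(request, toy_detector))  # 'id' dominates
```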
-
Yinan Zhong (Zhejiang University), Qianhao Miao (Zhejiang University), Yanjiao Chen (Zhejiang University), Jiangyi Deng (Zhejiang University), Yushi Cheng (Zhejiang University), Wenyuan Xu (Zhejiang University)
Large Language Models (LLMs) have been integrated into many applications (e.g., web agents) to perform more sophisticated tasks. However, LLM-empowered applications are vulnerable to Indirect Prompt Injection (IPI) attacks, where instructions are injected via untrustworthy external data sources. This paper presents Rennervate, a defense framework to detect and prevent IPI attacks. Rennervate leverages attention features to detect the covert injection at a fine-grained token level, enabling precise sanitization that neutralizes IPI attacks while maintaining LLM functionalities. Specifically, the token-level detector is materialized with a 2-step attentive pooling mechanism, which aggregates attention heads and response tokens for IPI detection and sanitization. Moreover, we establish a fine-grained IPI dataset, FIPI, to be open-sourced to support further research. Extensive experiments verify that Rennervate outperforms 15 commercial and academic IPI defense methods, achieving high precision on 5 LLMs and 6 datasets. We also demonstrate that Rennervate is transferable to unseen attacks and robust against adaptive adversaries.
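A hedged numpy sketch of the 2-step attentive pooling idea follows: attention heads are pooled first, then response tokens, producing a per-prompt-token suspicion score. The shapes and softmax weighting are our assumptions about the general mechanism, not the paper's exact architecture.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attentive_pool(attn, head_w, tok_w):
    # attn: (heads, response_tokens, prompt_tokens) attention weights.
    pooled_heads = np.einsum("h,hrp->rp", softmax(head_w), attn)  # step 1
    return np.einsum("r,rp->p", softmax(tok_w), pooled_heads)     # step 2

rng = np.random.default_rng(0)
attn = rng.random((8, 5, 12))                 # toy attention tensor
scores = attentive_pool(attn, rng.random(8), rng.random(5))
suspicious = np.where(scores > scores.mean() + scores.std())[0]
print("prompt tokens flagged for sanitization:", suspicious)
```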
Prompt Hard with a Vengeance
-
Evan Li (Northeastern University), Tushin Mallick (Northeastern University), Evan Rose (Northeastern University), William Robertson (Northeastern University), Alina Oprea (Northeastern University), Cristina Nita-Rotaru (Northeastern University)
LLM-integrated app systems extend the utility of Large Language Models (LLMs) with third-party apps that are invoked by a system LLM using interleaved planning and execution phases to answer user queries. These systems introduce new attack vectors where malicious apps can cause integrity violation of planning or execution, availability breakdown, or privacy compromise during execution.
In this work, we identify new attacks impacting the integrity of planning, as well as the integrity and availability of execution in LLM-integrated apps, and demonstrate them against IsolateGPT, a recent solution designed to mitigate attacks from malicious apps. We propose Abstract-Concrete-Execute (ACE), a new secure architecture for LLM-integrated app systems that provides security guarantees for system planning and execution. Specifically, ACE decouples planning into two phases by first creating an abstract execution plan using only trusted information, and then mapping the abstract plan to a concrete plan using installed system apps. We verify that the plans generated by our system satisfy user-specified secure information flow constraints via static analysis on the structured plan output. During execution, ACE enforces data and capability barriers between apps, and ensures that the execution is conducted according to the trusted abstract plan. We show experimentally that ACE is secure against attacks from the InjecAgent and Agent Security Bench benchmarks for indirect prompt injection, and our newly introduced attacks. We also evaluate the utility of ACE in realistic environments, using the Tool Usage suite from the LangChain benchmark. Our architecture represents a significant advancement towards hardening LLM-based systems using system security principles.
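The decoupled planning flow admits a very small sketch; the trust labels, the policy, and the hypothetical Sanitizer step below are our assumptions, used only to show how a concrete plan can be statically rejected.

```python
# Two-phase planning in miniature: an abstract plan built from trusted task
# descriptions is mapped to installed apps, then a static check rejects
# plans that route untrusted data into a trusted sink.
ABSTRACT_PLAN = [("fetch_email", "untrusted"), ("summarize", "trusted")]
INSTALLED = {"fetch_email": "MailApp", "summarize": "SummarizerApp"}

def concretize(plan, installed):
    return [(installed[step], trust) for step, trust in plan]

def check_flows(concrete_plan):
    # Hypothetical policy: untrusted output may not reach a trusted step
    # unless it passes through a sanitizer step first.
    tainted = False
    for app, trust in concrete_plan:
        if trust == "untrusted":
            tainted = True
        elif tainted and app != "Sanitizer":
            raise PermissionError(f"untrusted data would reach {app}")

try:
    check_flows(concretize(ABSTRACT_PLAN, INSTALLED))
except PermissionError as e:
    print("plan rejected:", e)
```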
-
Yizhe Shi (Fudan University), Zhemin Yang (Fudan University), Dingyi Liu (Fudan University), Kangwei Zhong (Fudan University), Jiarun Dai (Fudan University), Min Yang (Fudan University)
In the app-in-app ecosystem, super-apps provide mini-app developers access to various sensitive cloud services, such as cloud database and cloud storage. These services enable mini-app developers to efficiently store and manage mini-app data in the super-app server. To protect these sensitive resources, super-apps implement an identity management mechanism, allowing mini-app developers to verify user identity and ensure that only authorized and trusted users can access specific resources. However, flaws exist in the implementation of resource management by mini-app developers, which can expose sensitive resources to attackers.
In this paper, we conduct the first systematic study of insecure cloud resource management in the app-in-app ecosystem. We design and implement a tool, ICREMiner, that combines static analysis and dynamic probing to assess the security implications for 22,695 real-world mini-apps that access app-in-app cloud services on four super-app platforms. The results of our study reveal that 2,815 mini-apps (12.40%) are affected by insecure resource management, involving 8,062 insecure cloud operations. We have identified that some mini-apps of prominent corporations are also vulnerable to these risks. Additionally, we conduct an in-depth analysis of the significant security hazards that can be caused by the vulnerability, such as allowing attackers to steal sensitive user information and obtain paid services for free. In response, we have engaged in responsible vulnerability disclosure to the super-app platforms and corresponding mini-app developers. We also provide several mitigation strategies to help them resolve the vulnerabilities.
-
Seonghun Son (Iowa State University), Chandrika Mukherjee (Purdue University), Reham Mohamed Aburas (American University of Sharjah), Berk Gulmezoglu (Iowa State University), Z. Berkay Celik (Purdue University)
Over the past decade, AR/VR devices have drastically changed how we interact with the digital world. Users often share sensitive information, such as their location, browsing history, and even financial data, within third-party apps installed on these devices, assuming a secure environment protected from malicious actors. Recent research has revealed that malicious apps can exploit such capabilities and monitor benign apps to track user activities, leveraging fine-grained profiling tools, such as performance counter APIs. However, app-to-app monitoring is not feasible on all AR/VR devices (e.g., Meta Quest), as concurrent standalone app execution is disabled. In this paper, we present OVRWatcher, a novel side-channel primitive for AR/VR devices that infers user activities by monitoring low-resolution (1 Hz) GPU usage via a background script, unlike prior work that relies on high-resolution profiling. OVRWatcher captures correlations between GPU metrics and 3D object interactions under varying speeds, distances, and rendering scenarios, without requiring concurrent app execution, access to application data, or additional SDK installations. We demonstrate the efficacy of OVRWatcher in fingerprinting both standalone AR/VR and WebXR applications. OVRWatcher also distinguishes virtual objects, such as products selected by real users in immersive shopping apps and the number of participants in virtual meetings, thereby revealing users' product preferences and potentially exposing confidential information from those meetings. OVRWatcher achieves over 99% accuracy in app fingerprinting and over 98% accuracy in object-level inference.
-
Wayne Wang (University of Michigan), Aaron Ortwein (University of Michigan), Enrique Sobrados (University of New Mexico), Robert Stanley (University of Michigan), Piyush Kumar Sharma (IIT Delhi), Afsah Anwar (University of New Mexico), Roya Ensafi (University of Michigan)
Mobile users increasingly rely on Virtual Private Networks (VPNs) to protect themselves from tracking, surveillance, and censorship. VPN apps operate from a privileged position by requiring interception of user traffic. While this safeguards end user traffic from malicious network intermediaries (e.g., surveilling ISPs), it leads to a critical "transfer of trust" from such network intermediaries to VPN providers. Yet, despite the sensitivity of this role, VPN apps, especially on mobile platforms, remain insufficiently audited.
In this work, we present MVPN-Audit, an extensible framework for systematically analyzing Android VPN apps. Designed to handle the unique challenges of the Android VPN ecosystem, MVPN-Audit enables detailed investigation of VPN applications' behavior across the network layers. We apply our framework to 281 popular VPN apps from the Google Play Store and uncover fundamental and critical issues: 61 apps transmit unencrypted data, with 5 sending sensitive VPN configuration files in cleartext, allowing an attacker to hijack the VPN tunnel connection; 29 apps leak user traffic (including DNS) outside the tunnel; 169 apps fail to obfuscate the traffic to avoid trivial blocking; 76 apps transmit Advertising ID, the device-unique ID widely used for device and user tracking; and 107 apps fail to implement the best security practices in their VPN configuration files. Collectively, these apps have hundreds of millions of installs, highlighting the scale of users being impacted. Our findings reveal a troubling pattern of developer negligence, highlighting how poor enforcement, transparency, and maintenance practices continue to undermine even fundamental security guarantees.
The Fellowship of the MPC
-
Dongyu Meng (UC Santa Barbara), Fabio Gritti (UC Santa Barbara), Robert McLaughlin (UC Santa Barbara), Nicola Ruaro (UC Santa Barbara), Ilya Grishchenko (University of Toronto), Christopher Kruegel (UC Santa Barbara), Giovanni Vigna (UC Santa Barbara)
As decentralized finance (DeFi) continues to transform the financial system, the security of its building blocks remains a critical concern for its large-scale adoption. In DeFi, the stakes are exceptionally high, marked by recurring instances of financial losses totaling millions of dollars every week. All major blockchain-based financial applications (i.e., DeFi protocols) are built from – and interact with – programs known as smart contracts. While many security tools have been developed to identify specific classes of vulnerabilities (e.g., reentrancy) in individual smart contracts, considerably less effort has been invested in automatically identifying – in real time – attacks against DeFi protocols.
In this paper, we propose a novel approach for real-time, generic, explainable identification of attacks against DeFi protocols. Specifically, we identify potentially risky transactions without relying on any known vulnerability patterns. Our approach, implemented in HOUSTON, first automatically identifies the set of smart contracts that together implement a DeFi application, and then, while monitoring new relevant transactions, builds and updates custom anomaly-detection models. Our models include information about typical execution paths (control flows) as well as information about how the protocol processes data, captured as likely invariants between the contract functions’ arguments and storage variables. HOUSTON offers explainable warnings that can be used for attack triaging.
We evaluated HOUSTON on a large corpus of over 22 million transactions, covering 115 DeFi incidents. In our experiments, HOUSTON achieved a detection true-positive rate of 94.8% while maintaining a low false-positive rate. When compared with state-of-the-art anomaly detection systems, HOUSTON achieves a higher number of true positives and lower false-positive rates. Finally, we deployed HOUSTON in a real-world setting, where it demonstrated real-time monitoring capabilities on commodity hardware while sustaining high accuracy.
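As a flavor of the likely-invariant component, the toy model below tracks per-argument value ranges from benign history and flags out-of-range transactions; HOUSTON's invariants over storage variables and control flow are considerably richer.

```python
# Toy likely-invariant mining: track the value range each (function, arg)
# pair exhibits in benign transactions and warn on large deviations.
from collections import defaultdict

class InvariantModel:
    def __init__(self):
        self.bounds = defaultdict(lambda: [float("inf"), float("-inf")])

    def update(self, fn: str, arg: str, value: float):
        lo_hi = self.bounds[(fn, arg)]
        lo_hi[0] = min(lo_hi[0], value)
        lo_hi[1] = max(lo_hi[1], value)

    def anomalous(self, fn: str, arg: str, value: float, slack=1.5) -> bool:
        lo, hi = self.bounds[(fn, arg)]
        span = max(hi - lo, 1e-9)
        return value < lo - slack * span or value > hi + slack * span

m = InvariantModel()
for amount in [100, 250, 900, 400]:          # benign swap amounts
    m.update("swap", "amountIn", amount)
print(m.anomalous("swap", "amountIn", 10_000_000))  # True -> warn
```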
-
Daiping Liu (Palo Alto Networks, Inc.), Danyu Sun (University of California, Irvine), Zhenhua Chen (Palo Alto Networks, Inc.), Shu Wang (Palo Alto Networks, Inc.), Zhou Li (University of California, Irvine)
Malicious domain detection serves as a critical technique to keep users safe against cyber attacks. Although these systems have demonstrated remarkable detection capabilities, the magnitude of their false positives (FPs) in the real world remains unknown and is often overlooked. To shed light on this essential aspect, we conduct the first measurement study using six years of FP reports collected from one of the largest global cybersecurity vendors. Our findings reveal that the popularity-based top domain lists that are commonly adopted by current detection systems are insufficient to avoid FPs. In fact, there are still a non-trivial number of FPs in production. We posit that one of the main reasons is that efforts in this area have predominantly focused on detecting malicious indicators, i.e., Indicator of Compromise (IOC), and have made light of the benign ones, i.e., Indicator of Benignity (IOB).
In this paper, we make the first effort focusing on IOB detection. Our work is built upon our key finding that for many FPs in production, their IOBs can be found on the Internet. However, due to the openness of the Internet and unstructured Web content, we face two main challenges to identify these IOBs: understanding what an IOB is and assessing the trustworthiness of an IOB. To address these challenges, we propose a transitive trust model for IOB and implement it in a system called IOBHunter. IOBHunter leverages LLMs and chain-of-thought (CoT) reasoning, which have demonstrated promising capabilities in addressing several other security threats. Our evaluation using a dataset that contains verified FPs shows that IOBHunter can achieve 99.22% precision and 68.6% recall. IOBHunter is further evaluated in a two-month real-world deployment, in which it identified 4,338 confirmed FPs and 2,051 compromised domains.
-
VDORAM: Towards a Random Access Machine with Both Public Verifiability and Distributed Obliviousness
Huayi Qi (Shandong University), Minghui Xu (Shandong University), Xiaohua Jia (City University of Hong Kong), Xiuzhen Cheng (Shandong University)
Verifiable random access machines (vRAMs) serve as a foundational model for expressing complex computations with provable security guarantees, serving applications in areas such as secure electronic voting, financial auditing, and privacy-preserving smart contracts. However, no existing vRAM provides distributed obliviousness, a critical need in scenarios where multiple provers seek to prevent disclosure against both other provers and the verifiers, because existing solutions struggle with a paradigm mismatch between MPC and ZKP that limits the development of practical multi-prover ZKP front-ends. This gap arises because MPC protocols are optimized for minimal computation, whereas ZKPs require a complete trace for proving. Furthermore, adapting RAM designs is also challenging, as vRAMs are not built for the high costs of oblivious execution and existing DORAMs lack public verifiability.
To address these challenges, we introduce CompatCircuit, the first multi-prover ZKP front-end implementation to our knowledge, designed to bridge this gap. CompatCircuit integrates collaborative zkSNARKs with novel MPC protocols, unifying computation and verification into a single compatible circuit paradigm. Building upon CompatCircuit, we present VDORAM, the first publicly verifiable distributed oblivious RAM. VDORAM reconciles the high communication latency of online MPC with the complexity of offline proof generation, resulting in a RAM design that balances these competing demands. We have implemented CompatCircuit and VDORAM in approximately 15,000 lines of code, demonstrating their practical feasibility through extensive experiments, including micro-benchmarks, comparative analysis, and program examples.
-
Xiaoyu Fan (IIIS, Tsinghua University), Kun Chen (Ant Group), Jiping Yu (Tsinghua University), Xin Liu (Tsinghua University), Yunyi Chen (Tsinghua University), Wei Xu (Tsinghua University)
In privacy-preserving distributed computation systems like secure multi-party computation (MPC), cross-party communication is the primary bottleneck. Over the past two decades, numerous remarkable protocols have been proposed to reduce the overall communication complexity, substantially narrowing the gap between MPC and plaintext computations. However, these advances often overlook a crucial aspect: the asymmetric communication pattern. This imbalance results in significant bandwidth wastage during execution, thereby "locking" the performance.
In this paper, we propose RoundRole, a bandwidth-aware execution optimization for secret-sharing MPC. The key idea is to decouple the logical roles, which determine the communication patterns, from the physical nodes, which determine the bandwidth. By partitioning the overall protocol into parallel tasks and strategically mapping each logical role to a physical node for each task, RoundRole effectively allocates the communication workload in accordance with the inherent protocol communication volume and the physical bandwidth. This execution-level optimization fully leverages network resources and "unlocks" the efficiency. We integrate RoundRole on top of ABY3, one of the widely used open-source MPC frameworks. Extensive evaluations across nine protocols under six diverse network settings (with homogeneous and heterogeneous bandwidths) demonstrate significant performance improvements, achieving up to 7.1x speedups.
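The role-to-node decoupling admits a very small sketch: given per-role outbound volumes for one task and per-node bandwidths, a greedy assignment sends the heaviest-talking role to the fastest link. The numbers are invented, and RoundRole's actual scheduler handles many tasks and protocol constraints.

```python
# Greedy role-to-node mapping: heavier-talking logical roles land on
# higher-bandwidth physical nodes, task by task.
def assign_roles(role_volumes: dict, node_bandwidth: dict) -> dict:
    roles = sorted(role_volumes, key=role_volumes.get, reverse=True)
    nodes = sorted(node_bandwidth, key=node_bandwidth.get, reverse=True)
    return dict(zip(roles, nodes))  # heaviest role -> fastest node

# One task in a 3-party protocol: role P0 sends the most data this round.
volumes = {"P0": 120.0, "P1": 40.0, "P2": 15.0}          # MB sent per task
bandwidth = {"nodeA": 1.0, "nodeB": 10.0, "nodeC": 5.0}  # Gbps
print(assign_roles(volumes, bandwidth))
# {'P0': 'nodeB', 'P1': 'nodeC', 'P2': 'nodeA'}
```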
Where's Waldo's Data
-
Bo Jiang (TikTok Inc.), Wanrong Zhang (TikTok Inc.), Donghang Lu (TikTok Inc.), Jian Du (TikTok Inc.), Qiang Yan (TikTok Inc.)
Local Differential Privacy (LDP) protocols enable the collection of randomized client messages for data analysis, without the necessity of a trusted data curator. Such protocols have been successfully deployed in real-world scenarios by major tech companies like Google, Apple, and Microsoft. In this paper, we propose a Generalized Count Mean Sketch (GCMS) protocol that captures many existing frequency estimation protocols. Our method significantly improves the three-way trade-offs between communication, privacy, and accuracy. We also introduce a general utility analysis framework that enables optimizing parameter designs. Based on that, we propose an Optimal Count Mean Sketch (OCMS) framework that minimizes the variance for collecting items with targeted frequencies. Moreover, we present a novel protocol for collecting data within an unknown domain, as our frequency estimation protocols only work effectively with a known data domain. Leveraging the stability-based histogram technique alongside the Encryption-Shuffling-Analysis (ESA) framework, our approach employs an auxiliary server to construct histograms without accessing original data messages. This protocol achieves accuracy akin to the central DP model while offering local-like privacy guarantees and substantially lowering computational costs.
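For readers unfamiliar with the count-mean-sketch family that GCMS generalizes, here is a toy sketch with per-bit randomized response; the parameterization and debiasing follow the basic construction under a uniform-collision assumption, not the paper's optimized GCMS/OCMS variants.

```python
import hashlib, math, random

K, M, EPS = 16, 64, 3.0                 # hash rows, sketch width, budget
# Two bits differ between any two one-hot rows, so each bit uses eps/2.
P = math.exp(EPS / 2) / (math.exp(EPS / 2) + 1)

def h(row: int, item: str) -> int:
    return int(hashlib.sha256(f"{row}:{item}".encode()).hexdigest(), 16) % M

def privatize(item: str):
    j = random.randrange(K)             # client picks a random hash row
    truth = h(j, item)
    onehot = (1 if c == truth else 0 for c in range(M))
    return j, [b if random.random() < P else 1 - b for b in onehot]

def estimate(reports, item: str) -> float:
    n = len(reports)
    s = sum(vec[h(j, item)] for j, vec in reports)
    q = 1 - P
    collide = q + (P - q) / M           # expected bit from non-holders
    return (s - n * collide) / (P - collide)

random.seed(1)
reports = [privatize(x) for x in ["apple"] * 600 + ["pear"] * 400]
print(round(estimate(reports, "apple")), round(estimate(reports, "pear")))
```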
-
Takao Murakami (The Institute of Statistical Mathematics (ISM) / National Institute of Advanced Industrial Science and Technology (AIST) / RIKEN AIP), Yuichi Sei (University of Electro-Communications), Reo Eriguchi (National Institute of Advanced Industrial Science and Technology (AIST))
Shuffle DP (Differential Privacy) protocols provide high accuracy and privacy by introducing a shuffler who randomly shuffles data in a distributed system. However, most shuffle DP protocols are vulnerable to two attacks: collusion attacks by the data collector and users, and data poisoning attacks. A recent study addresses this issue by introducing an augmented shuffle DP protocol, where users do not add noise and the shuffler performs random sampling and dummy data addition. However, it focuses on frequency estimation over categorical data with a small domain and cannot be applied to a large domain due to prohibitively high communication and computational costs.
In this paper, we fill this gap by introducing a novel augmented shuffle DP protocol called the FME (Filtering-with-Multiple-Encryption) protocol. Our FME protocol uses a hash function to filter out unpopular items and then accurately calculates frequencies for popular items. To perform this within one round of interaction between users and the shuffler, our protocol carefully coordinates communication within the system using multiple encryption. We also apply our FME protocol to more advanced KV (Key-Value) statistics estimation with an additional technique to reduce bias. For both categorical and KV data, we prove that our protocol provides computational DP, high robustness to the above two attacks, accuracy, and efficiency. We show the effectiveness of our proposals through comparisons with twelve existing protocols.
-
Quan Yuan (Zhejiang University), Xiaochen Li (University of North Carolina at Greensboro), Linkang Du (Xi'an Jiaotong University), Min Chen (Vrije Universiteit Amsterdam), Mingyang Sun (Peking University), Yunjun Gao (Zhejiang University), Shibo He (Zhejiang University), Jiming Chen (Zhejiang University), Zhikun Zhang (Zhejiang University)
Causal inference plays a crucial role in scientific research across multiple disciplines. Estimating causal effects, particularly the average treatment effect (ATE), from observational data has garnered significant attention. However, computing the ATE from real-world observational data poses substantial privacy risks to users. Differential privacy, which offers strict theoretical guarantees, has emerged as a standard approach for privacy-preserving data analysis. However, existing differentially private ATE estimation works rely on specific assumptions, provide limited privacy protection, or fail to offer comprehensive information protection.
To this end, we introduce PrivATE, a practical ATE estimation framework that ensures differential privacy. In fact, various scenarios require varying levels of privacy protection. For example, only test scores are generally sensitive information in education evaluation, while all types of medical record data are usually private. To accommodate different privacy requirements, we design two levels (i.e., label-level and sample-level) of privacy protection in PrivATE. By deriving an adaptive matching limit, PrivATE effectively balances noise-induced error and matching error, leading to a more accurate estimate of ATE. Our evaluation validates the effectiveness of PrivATE. PrivATE outperforms the baselines on all datasets and privacy budgets.
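A minimal label-level sketch of the idea: match each treated unit to its nearest control, average the matched outcome differences, and add Laplace noise scaled to a crude sensitivity bound. PrivATE's adaptive matching limit and sample-level variant are not modeled here.

```python
import numpy as np

def dp_ate(X_t, y_t, X_c, y_c, eps=1.0, y_range=1.0):
    diffs = []
    for x, y in zip(X_t, y_t):
        j = np.argmin(np.linalg.norm(X_c - x, axis=1))  # nearest control
        diffs.append(y - y_c[j])
    n = len(diffs)
    # Crude bound: one label change moves the mean by at most 2*y_range/n
    # (assuming each control is matched a bounded number of times).
    sensitivity = 2 * y_range / n
    return float(np.mean(diffs) + np.random.laplace(0, sensitivity / eps))

rng = np.random.default_rng(0)
X_t, X_c = rng.normal(size=(200, 3)), rng.normal(size=(400, 3))
y_t = rng.binomial(1, 0.55, 200).astype(float)   # treated outcomes
y_c = rng.binomial(1, 0.30, 400).astype(float)   # control outcomes
print(dp_ate(X_t, y_t, X_c, y_c, eps=1.0))       # roughly 0.25 + noise
```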
-
Yu Zheng (University of California, Irvine), Chenang Li (University of California, Irvine), Zhou Li (University of California, Irvine), Qingsong Wang (University of California, San Diego)
Differential privacy (DP) has been integrated into graph neural networks (GNNs) to protect sensitive structural information, e.g., edges, nodes, and associated features across various applications. A prominent approach is to perturb the message-passing process, which forms the core of most GNN architectures. However, existing methods typically incur a privacy cost that grows linearly with the number of layers (e.g., GAP, published at USENIX Security '23), ultimately requiring excessive noise to maintain a reasonable privacy level. This limitation becomes particularly problematic when multi-layer GNNs, which have shown better performance than one-layer GNNs, are used to process graph data with sensitive information.
In this paper, we theoretically establish that the privacy budget converges with respect to the number of layers by applying privacy amplification techniques to the message-passing process, exploiting the contractive properties inherent to standard GNN operations. Motivated by this analysis, we propose a simple yet effective Contractive Graph Layer (CGL) that ensures the contractiveness required for theoretical guarantees while preserving model utility. Our framework, CARIBOU, supports both training and inference, equipped with a contractive aggregation module, a privacy allocation module, and a privacy auditing module. Experimental evaluations demonstrate that CARIBOU significantly improves the privacy-utility trade-off and achieves superior performance in privacy auditing tasks.
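The contractiveness requirement can be illustrated with a spectral-norm-constrained aggregation step: the sketch below clips the layer's weight matrix via power iteration so it never amplifies, and pairs it with a 1-Lipschitz nonlinearity. This reflects our reading of the general idea, not CARIBOU's concrete CGL.

```python
import numpy as np

def spectral_clip(W: np.ndarray, iters: int = 30) -> np.ndarray:
    """Rescale W so its largest singular value is at most 1 (power iteration)."""
    v = np.random.default_rng(0).normal(size=W.shape[1])
    for _ in range(iters):
        u = W @ v; u /= np.linalg.norm(u)
        v = W.T @ u; v /= np.linalg.norm(v)
    sigma = float(u @ W @ v)
    return W / max(sigma, 1.0)          # only shrink, never amplify

def contractive_aggregate(A_norm, H, W):
    # A_norm: symmetrically normalized adjacency (spectral norm <= 1);
    # tanh is 1-Lipschitz and W is clipped, so the layer is non-expansive.
    return np.tanh(A_norm @ H @ spectral_clip(W))

n, d = 5, 4
A = np.eye(n)                           # trivial normalized adjacency
H = np.random.default_rng(1).normal(size=(n, d))
W = 3 * np.random.default_rng(2).normal(size=(d, d))
print(contractive_aggregate(A, H, W).shape)   # (5, 4)
```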
Seal Team Sixteen
-
Osama Bajaber (Virginia Tech), Bo Ji (Virginia Tech), Peng Gao (Virginia Tech)
Tokens play a vital role in enterprise network access control by enabling secure authentication and authorization across various protocols (e.g., JSON Web Tokens, OAuth 2.0). This allows users to access authorized resources using valid access tokens, without the need to repeatedly submit credentials. However, the ambient trust granted to all processes within an authorized host, combined with long token lifetimes, creates an opportunity for malicious processes to hijack tokens and impersonate legitimate users. This threat affects a wide range of protocols and has led to numerous real-world incidents.
In this paper, we present NetCap, a new defense mechanism designed to prevent attackers from using stolen tokens to access unauthorized resources in enterprise environments. The core idea is to introduce unforgeable, process-level capabilities that are bound to authorized processes. These capabilities are continuously embedded in the processes' network traffic to target resources for validation and are frequently refreshed. This binding between process identity and capability ensures that even if access tokens are stolen by malicious processes, they cannot be used to pass authentication without valid capabilities. To support the high volume of requests generated by processes in the network, NetCap introduces a novel data-plane design based on programmable switches and eBPF. Through multiple optimization techniques, our system supports inline generation and embedding of capabilities, allowing large volumes of traffic to be processed at line rate with little overhead. Our extensive evaluations show that NetCap maintains line-rate network performance across a variety of protocols and real-world applications with negligible overhead, while effectively securing these applications against token theft attacks.
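The binding between process identity and capability can be pictured with a keyed MAC over (process identity, destination, time window), refreshed every few seconds; key provisioning and the programmable-switch/eBPF data plane are outside this toy.

```python
# Minimal sketch of a process-bound, frequently refreshed network capability.
import hmac, hashlib, time

SECRET = b"shared-with-verifier"          # provisioned out of band

def capability(proc_id: str, dest: str, window_s: int = 5) -> bytes:
    window = int(time.time()) // window_s  # refresh every window_s seconds
    msg = f"{proc_id}|{dest}|{window}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).digest()[:16]

def verify(tag: bytes, proc_id: str, dest: str, window_s: int = 5) -> bool:
    return hmac.compare_digest(tag, capability(proc_id, dest, window_s))

tag = capability("pid:4242:/usr/bin/app", "db.internal:5432")
print(verify(tag, "pid:4242:/usr/bin/app", "db.internal:5432"))   # True
print(verify(tag, "pid:6666:/tmp/malware", "db.internal:5432"))   # False
```

A stolen bearer token alone is useless here because the MAC covers the process identity, which the thief cannot reproduce.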
-
Hocheol Nam (KAIST), Daehyun Lim (KAIST), Huancheng Zhou (Texas A&M University), Guofei Gu (Texas A&M University), Min Suk Kang (KAIST)
Data-plane programmability in commodity switches is reshaping the landscape of denial-of-service (DoS) defense by enabling adaptive, line-rate mitigation strategies. Recent systems like Cerberus [SP'24] augment limited switch memory with control-plane support to rapidly respond to evolving attacks. In this paper, we reveal a subtle yet critical vulnerability in this model; that is, the very mechanisms that enable the defense system’s agility and scalability can be subverted by a new class of coordinated DoS attacks. We present Heracles, the first attack to exploit hardware-level constraints in programmable switches to orchestrate precise resource contention across data-plane and control-plane memory. By leveraging side-channel timing signals, Heracles triggers synchronized augmentation, memory squeezing, and time-window exploitation, which are three orthogonal contention strategies that significantly degrade or even completely disable the DoS mitigation capabilities. We implement and test Heracles against real Tofino hardware and show that it can reliably disrupt DoS defenses across diverse DoS attack profiles, even when using loosely (1–2 second) time-synchronized attack sources. To mitigate this threat, we propose Shield, a multi-layered DoS mitigation sketch architecture that decouples memory operations across control- and dataplane layers, effectively mitigating the Heracles attack while preserving both line-rate performance and detection accuracy.
-
Ye Wang (University of Kansas), Bo Luo (University of Kansas), Fengjun Li (University of Kansas)
Recent advances in static analysis, fuzzing, and learning-based detection have substantially improved the defense against trigger-based malware; however, these approaches mostly assume that trigger conditions are semantically explicit or distinguishable from normal application logic. In this paper, we present SensorBomb, a novel logic-bomb framework that exploits this assumption through auto-contextualized triggers and onboard sensor-actuator covert channels. Instead of relying on obscure or rare trigger conditions, SensorBomb constructs triggers tightly aligned with the host app’s legitimate sensor usage, actuator behaviors, and functional context so that they appear indistinguishable from benign behavior. To do so, SensorBomb automatically analyzes the host app to select context-compatible sensors, actuators, and sensitive operations, constructs covert trigger channels, and dynamically adapts trigger patterns to evade static analysis, fuzzing, sensor state anomaly detection, and user suspicion. We implement three representative prototypes of such triggers and evaluate them across diverse devices and environments. Our results show that SensorBomb consistently evades state-of-the-art detection techniques and achieves high trigger reliability with zero false positives. Large-scale injection experiments on real-world APKs further demonstrate that SensorBomb can be deployed without affecting normal app functionality. This work reveals a critical and previously underexplored attack surface in mobile malware defenses and calls for more advanced detection mechanisms.
-
Georgios Syros (Northeastern University), Anshuman Suri (Northeastern University), Jacob Ginesin (Northeastern University), Cristina Nita-Rotaru (Northeastern University), Alina Oprea (Northeastern University)
Large Language Model (LLM)-based agents increasingly interact, collaborate, and delegate tasks to one another autonomously with minimal human interaction. Industry guidelines for agentic system governance emphasize the need for users to maintain comprehensive control over their agents, mitigating potential damage from malicious agents. Several proposed agentic system designs address agent identity, authorization, and delegation, but remain purely theoretical, without concrete implementation and evaluation. Most importantly, they do not provide user-controlled agent management.
To address this gap, we propose SAGA, a scalable Security Architecture for Governing Agentic systems, which offers users oversight over their agents' lifecycle. In our design, users register their agents with a central entity, the Provider, that maintains agents' contact information and user-defined access control policies, and helps agents enforce these policies on inter-agent communication. We introduce a cryptographic mechanism for deriving access control tokens that offers fine-grained control over an agent's interaction with other agents, providing formal security guarantees. We evaluate SAGA on several agentic tasks, using agents in different geolocations, and multiple on-device and cloud LLMs, demonstrating minimal performance overhead with no impact on underlying task utility in a wide range of conditions. Our architecture enables secure and trustworthy deployment of autonomous agents, accelerating the responsible adoption of this technology in sensitive environments.
Spider Man: No Way to Phish
-
Dongchao Zhou (Beijing University of Posts and Telecommunications, QI-ANXIN Technology Research Institute), Lingyun Ying (QI-ANXIN Technology Research Institute), Huajun Chai (QI-ANXIN Technology Research Institute), Dongbin Wang (Beijing University of Posts and Telecommunications)
JavaScript's widespread adoption has made it an attractive target for malicious attackers who employ sophisticated obfuscation techniques to conceal harmful code. Current deobfuscation tools suffer from critical limitations that severely restrict their practical effectiveness. Existing tools struggle with diverse input formats, address only specific obfuscation types, and produce cryptic output that impedes human analysis.
To address these challenges, we present JSIMPLIFIER, a comprehensive deobfuscation tool using a multi-stage pipeline with preprocessing, abstract syntax tree-based static analysis, dynamic execution tracing, and Large Language Model (LLM)-enhanced identifier renaming. We also introduce multi-dimensional evaluation metrics that integrate control/data flow analysis, code simplification assessment, entropy measures and LLM-based readability assessments.
We construct and release the largest real-world obfuscated JavaScript dataset with 44,421 samples (23,212 wild malicious + 21,209 benign samples). Evaluation shows JSIMPLIFIER outperforms existing tools with 100% processing capability across 20 obfuscation techniques, 100% correctness on evaluation subsets, 88.2% code complexity reduction, and over 4-fold readability improvement validated by multiple LLMs. Our results advance benchmarks for JavaScript deobfuscation research and practical security applications.
-
Sohom Datta (North Carolina State University), Michalis Diamantaris (Technical University of Crete), Ahsan Zafar (North Carolina State University), Junhua Su (North Carolina State University), Anupam Das (North Carolina State University), Jason Polakis (University of Illinois Chicago), Alexandros Kapravelos (North Carolina State University)
WebViews are a prevalent method of embedding web-based content in Android apps. While they offer functionality similar to that of browsers and execute in an isolated context, apps can directly interfere with WebViews by dynamically injecting JavaScript code at runtime. While prior work has extensively analyzed apps' Java code, existing frameworks have limited visibility of the JavaScript code being executed inside WebViews. Consequently, there is limited understanding of the behaviors and characteristics of the scripts executed within WebViews, and whether privacy violations occur.
To address this gap, we propose WebViewTracer, a framework designed to dynamically analyze the execution of JavaScript code within WebViews at runtime. Our system combines within-WebView JavaScript execution traces with Java method-call information to also capture the information exchange occurring between Java SDKs and web scripts. We leverage WebViewTracer to perform the first large-scale, dynamic analysis of privacy-violating behaviors inside WebViews, on a dataset of 10K Android apps. We detect 4,597 apps that load WebViews, and find that over 69% of them inject sensitive and tracking-related information that is typically inaccessible to JavaScript code into WebViews. This includes identifiers like the Advertising ID and Android build ID. Crucially, 90% of those apps use web-based APIs to exfiltrate this information to third-party servers. We also uncover concrete evidence of common web fingerprinting techniques being used by JavaScript code inside WebViews, which can supplement their tracking information. We observe that the dynamic properties of WebViews are being actively leveraged for sensitive information diffusion across multiple actors in the mobile tracking ecosystem, demonstrating the privacy risks posed by Android WebViews. By shedding light on these ongoing privacy violations, our study seeks to prompt additional scrutiny from platform stakeholders on the use of embedded web technologies and highlights the need for additional safeguards.
-
Zheng Zhang (UC Riverside), Haonan Li (UC Riverside), Xingyu Li (UC Riverside), Hang Zhang (Indiana University), Zhiyun Qian (University of California, Riverside)
Bug bisection is an important security task that aims to determine the range of software versions impacted by a bug, i.e., to identify the commit that introduced it. However, traditional patch-based bisection methods face several significant barriers: for example, they assume that the bug-inducing commit (BIC) and the patch commit modify the same functions, which is not always true; they often rely purely on code changes, while the commit message frequently contains a wealth of vulnerability-related information; and they are based on simple heuristics (e.g., assuming the BIC initializes lines deleted in the patch) and lack a logical analysis of the vulnerability.
In this paper, we make the observation that Large Language Models (LLMs) are well positioned to break the barriers of existing solutions, e.g., comprehend both textual data and code well in patches and commits. We develop a comprehensive multi-stage pipeline leveraging LLMs to (1) take advantage of full patch information, (2) have the LLM assess the logic of the bug and the likelihood of a commit being the one that introduced it, and (3) gradually narrow down the candidates with multiple down-select processes. In our evaluation, we demonstrate that our approach achieves significantly better accuracy than the state-of-the-art solution by more than 38%. Our results further confirm that the comprehensive multi-stage pipeline is essential, as it improves accuracy by 60% over naive LLM application.
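The down-select structure might be sketched as follows, with the LLM scoring step stubbed out; the stage boundaries, the filter, and the scoring heuristic are illustrative assumptions rather than the paper's pipeline.

```python
# Two-stage down-select over candidate commits: a cheap structural filter,
# then model-based ranking of the survivors only.
def bisect_candidates(commits, patch_files, score_commit, keep=5):
    related = [c for c in commits if set(c["files"]) & set(patch_files)]
    ranked = sorted(related, key=score_commit, reverse=True)
    return ranked[:keep]

def score_commit(commit) -> float:
    # Stand-in for an LLM judging message + diff; real prompts are richer.
    hints = ("refactor", "add", "introduce")
    return sum(h in commit["message"].lower() for h in hints)

commits = [
    {"sha": "a1", "files": ["net/ipv4/tcp.c"], "message": "add fast path"},
    {"sha": "b2", "files": ["fs/ext4/inode.c"], "message": "fix typo"},
]
print(bisect_candidates(commits, ["net/ipv4/tcp.c"], score_commit))
```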
-
Jiawen Shi (Huazhong University of Science and Technology), Zenghui Yuan (Huazhong University of Science and Technology), Guiyao Tie (Huazhong University of Science and Technology), Pan Zhou (Huazhong University of Science and Technology), Neil Gong (Duke University), Lichao Sun (Lehigh University)
Tool selection is a key component of LLM agents. A popular approach follows a two-step process, retrieval and selection, to pick the most appropriate tool from a tool library for a given task. In this work, we introduce ToolHijacker, a novel prompt injection attack targeting tool selection in no-box scenarios. ToolHijacker injects a malicious tool document into the tool library to manipulate the LLM agent's tool selection process, compelling it to consistently choose the attacker's malicious tool for an attacker-chosen target task. Specifically, we formulate the crafting of such tool documents as an optimization problem and propose a two-phase optimization strategy to solve it. Our extensive experimental evaluation shows that ToolHijacker is highly effective, significantly outperforming existing manual-based and automated prompt injection attacks when applied to tool selection. Moreover, we explore various defenses, including prevention-based defenses (StruQ and SecAlign) and detection-based defenses (known-answer detection, DataSentinel, perplexity detection, and perplexity windowed detection). Our experimental results indicate that these defenses are insufficient, highlighting the urgent need for developing new defense strategies.
The Good, the Bad, and the Adversarial
-
Zhaoxi Zhang (University of Technology Sydney), Xiaomei Zhang (Griffith University), Yanjun Zhang (University of Technology Sydney), He Zhang (RMIT University), Shirui Pan (Griffith University), Bo Liu (University of Technology Sydney), Asif Qumer Gill (University of Technology Sydney Australia), Leo Zhang (Griffith University)
Large Language Model (LLM) watermarking has emerged as a promising technique for copyright protection, misuse prevention, and machine-generated content detection. It injects detectable signals during the LLM generation process, allowing for later identification by a corresponding detector. To assess the robustness of watermarking schemes, existing studies typically adopt watermark removal attacks, which aim to erase embedded signals by modifying the watermarked text. However, we reveal that existing watermark removal attacks are suboptimal, which leads to the misconception that effective watermark removal requires either a large perturbation budget or a strong adversary's capabilities, such as unlimited queries to the victim LLM or its watermark detector. A systematic scrutinization of removal attack capabilities as well as the development of more sophisticated techniques remains largely underexplored. As a result, the robustness of existing watermarking schemes may be overestimated.
To bridge the gap, we first formalize the system model for LLM watermarking, and characterize two realistic threat models constrained to limited access to the watermark detector. We then analyze how different types of perturbation vary in their attack range, i.e., the number of tokens they can affect with a single edit. We observe that character-level perturbations (e.g., typos, swaps, deletions, homoglyphs) can influence multiple tokens simultaneously by disrupting the tokenization process. We demonstrate that character-level perturbations are significantly more effective for watermark removal compared to token-level or sentence-level approaches under the most restrictive threat model. We further propose guided removal attacks based on the Genetic Algorithm (GA) that use a reference detector for optimization. Under a practical threat model with limited black-box queries to the watermark detector, our method demonstrates strong removal performance. Experiments across five representative watermarking schemes and two widely-used LLMs consistently confirm the superiority of character-level perturbations and the effectiveness of the reference-detector-guided GA in removing watermarks under realistic constraints. Additionally, we argue there is an adversarial dilemma when considering potential defenses: any fixed defense can be bypassed by a suitable perturbation strategy. Motivated by this principle, we propose an adaptive compound character-level attack. Experimental results show that this approach can effectively defeat the defenses. Our findings highlight significant vulnerabilities in existing LLM watermark schemes and underline the urgency for the development of new robust mechanisms.
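Why character-level edits have a wider attack range than token-level ones can be seen with a toy greedy longest-match tokenizer: one character swap re-segments the whole neighborhood of the edit. The vocabulary below is invented; real BPE vocabularies behave analogously.

```python
# A single character edit can perturb several tokens at once, because the
# tokenizer re-segments the surrounding text.
VOCAB = {"water", "mark", "watermark", "wat", "er",
         "m", "a", "r", "k", "x", "w", "t", "e"}

def tokenize(s: str):
    out, i = [], 0
    while i < len(s):
        for j in range(len(s), i, -1):          # longest match first
            if s[i:j] in VOCAB:
                out.append(s[i:j]); i = j; break
        else:
            out.append(s[i]); i += 1
    return out

print(tokenize("watermark"))    # ['watermark']  -> one token
print(tokenize("waxermark"))    # swap t->x: ['w', 'a', 'x', 'er', 'mark']
```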
-
Hao Luan (Fudan University), Xue Tan (Fudan University), Zhiheng Li (Shandong University), Jun Dai (Worcester Polytechnic Institute), Xiaoyan Sun (Worcester Polytechnic Institute), Ping Chen (Fudan University)
To safeguard the intellectual property of high-value deep neural networks, black-box watermarking has emerged as a critical defense and has gained increasing momentum. These methods embed watermarks into the model’s prediction behavior through strategically crafted trigger samples, enabling verification via API queries. Meanwhile, model extraction attacks threaten proprietary deep learning models by exploiting query access to replicate watermarked models. These attacks also offer insights into the resilience of watermarking schemes and adversarial capabilities. However, previous methods struggle to remove watermark information, inadvertently retaining defensive mechanisms. They also suffer from inefficiency, often requiring thousands of queries to achieve competitive performance.
To address these limitations, we propose a query-efficient model extraction framework named SSLExtraction. SSLExtraction selects queries via a greedy random walk in the feature space, leading to both effective model replication and watermark removal. Specifically, SSLExtraction follows the self-supervised learning paradigm to extract intrinsic data representations, transforming the original pixel-level inputs into watermark-agnostic features. Then, we propose a greedy random walk algorithm in the feature space to construct a well-dispersed query set that effectively covers the feature space while avoiding redundant queries. By selecting queries in the feature space, our method naturally identifies watermark patterns as outliers, enabling simultaneous watermark removal. Additionally, we propose an evaluation metric tailored for the watermarking task that emphasizes the distinction between benign and stolen models. Unlike previous approaches that rely on manually predefined thresholds, our evaluation metric employs hypothesis testing to measure the relative distance from a suspicious model to both a watermarked model and a benign model, identifying which the suspicious model most closely resembles. Experimental results demonstrate that our method significantly reduces query costs compared to baselines while effectively removing watermarks across various datasets and watermarking scenarios.
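To picture what a well-dispersed query set means here, consider a minimal sketch using classic farthest-point selection, a generic dispersion heuristic standing in for the paper's greedy random walk; the feature matrix is a placeholder for self-supervised embeddings.

    # Sketch: greedily pick each next query as the point farthest (in feature
    # space) from everything already selected, avoiding redundant queries.
    import numpy as np

    def greedy_disperse(feats, budget, seed=0):
        rng = np.random.default_rng(seed)
        selected = [int(rng.integers(len(feats)))]          # random start point
        dist = np.linalg.norm(feats - feats[selected[0]], axis=1)
        for _ in range(budget - 1):
            nxt = int(dist.argmax())                        # farthest from set
            selected.append(nxt)
            dist = np.minimum(dist, np.linalg.norm(feats - feats[nxt], axis=1))
        return selected                                     # indices to query

    feats = np.random.randn(10_000, 128)   # placeholder SSL feature embeddings
    print(greedy_disperse(feats, budget=256)[:10])

Because watermark trigger samples tend to sit far from the natural data manifold, a dispersion criterion of this kind also tends to expose them as isolated outliers, which is the intuition behind the simultaneous watermark removal.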
-
Shang Wang (University of Technology Sydney), Tianqing Zhu (City University of Macau), Dayong Ye (City University of Macau), Hua Ma (Data61, CSIRO), Bo Liu (University of Technology Sydney), Ming Ding (Data61, CSIRO), Shengfang Zhai (National University of Singapore), Yansong Gao (School of Cyber Science and Engineering, Southeast University)
In modern Data-as-a-Service (DaaS) ecosystems, data curators such as data brokerage companies aggregate high-quality data from many contributors and monetize it for deep learning model providers. However, malicious curators can sell valuable data without informing its original contributors, violating both individual rights and the law. Intrusive watermarking is one of the state-of-the-art (SOTA) techniques for protecting data copyright, and it detects whether a suspicious model carries the predefined pattern. However, these approaches face numerous limitations: they struggle to work under low watermark injection rates (≤ 1.0%), degrade model performance, produce false positives, and are not robust against watermark cleansing.
This work proposes an innovative intrusive watermarking approach, dubbed *DIP* (Data Intelligence Probabilistic Watermarking), to support dataset ownership verification while addressing the limitations above. It applies a distribution-aware sample selection algorithm, embeds probabilistic associations between watermarked samples and multiple outputs, and adopts a two-fold verification framework that leverages both inference results and their distribution as watermark signals. Extensive experiments on 4 image and 5 text datasets demonstrate that *DIP* maintains the model's performance, and achieves an average watermark success rate of 89.4% at a 1% injection budget. We further validate that *DIP* is orthogonal to various watermarked data designs and can seamlessly integrate their strengths. Moreover, *DIP* proves effective across diverse modalities (image and text) and tasks (regression), with strong performance on generation tasks in large language models. *DIP* exhibits robustness against various adversarial environments, including 3 based on data augmentation, 3 on data cleansing, 4 on robust training and 3 on collusion-based watermark removal, while existing SOTAs fail. The source code is released at https://github.com/SixLab6/DIP.
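The distribution half of the two-fold verification can be pictured with a toy hypothesis test that compares a suspicious model's label histogram on the trigger samples against the probabilistic association the watermark embeds. The chi-square statistic, the 0.05 level, and all names below are illustrative choices, not the paper's exact protocol.

    # Sketch: distribution-based ownership evidence via a goodness-of-fit test.
    import numpy as np
    from scipy.stats import chisquare

    def matches_watermark(pred_labels, target_dist, n_classes, alpha=0.05):
        observed = np.bincount(pred_labels, minlength=n_classes)
        expected = np.asarray(target_dist) * len(pred_labels)
        _, p = chisquare(observed, expected)
        # A large p-value means the model's outputs on trigger samples follow
        # the embedded distribution, i.e. it was likely trained on the data.
        return p >= alpha

    preds = np.random.choice(4, size=500, p=[0.4, 0.3, 0.2, 0.1])  # toy outputs
    print(matches_watermark(preds, [0.4, 0.3, 0.2, 0.1], n_classes=4))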
-
Yiluo Wei (The Hong Kong University of Science and Technology (Guangzhou)), Peixian Zhang (The Hong Kong University of Science and Technology (Guangzhou)), Gareth Tyson (The Hong Kong University of Science and Technology (Guangzhou))
AI character platforms, which allow users to engage in conversations with AI personas, are a rapidly growing application domain. However, their immersive and personalized nature, combined with technical vulnerabilities, raises significant safety concerns. Despite their popularity, a systematic evaluation of their safety has been notably absent. To address this gap, we conduct the first large-scale safety study of AI character platforms, evaluating 16 popular platforms using a benchmark set of 5,000 questions across 16 safety categories. Our findings reveal a critical safety deficit: AI character platforms exhibit an average unsafe response rate of 65.1%, substantially higher than the 17.7% average rate of the baselines. We further discover that safety performance varies significantly across different characters and is strongly correlated with character features such as demographics and personality. Leveraging these insights, we demonstrate that our machine learning model is able to identify less safe characters with an F1-score of 0.81. This predictive capability can be beneficial for platforms, enabling improved mechanisms for safer interactions, character search/recommendations, and character creation. Overall, the results and findings offer valuable insights for enhancing platform governance and content moderation for safer AI character platforms.
Wednesday, 25 February
Plenary recognition of award winners
Process and the Furious
-
Zhechang Zhang (The Pennsylvania State University), Hengkai Ye (The Pennsylvania State University), Song Liu (University of Delaware), Hong Hu (The Pennsylvania State University)
Control-flow integrity (CFI) is a widely adopted defense against control-flow hijacking attacks, designed to restrict indirect control transfers to a set of legitimate targets. However, even under a precise static CFI policy, attackers can still hijack control flow through function substitution attacks (Sub attacks), by replacing one valid target with another that remains within the allowed set. While prior work has demonstrated the feasibility of such attacks through manual construction, no approach constructs them systematically, scalably, and in an end-to-end manner.
In this work, we present SACK, the first systematic framework for automatically constructing Sub attacks at scale. SACK collects triggered indirect call targets from benign executions and synthesizes security oracles with the assistance of a large language model. It then automatically performs target substitutions and leverages security oracles to detect security violations, while ensuring that execution strictly adheres to precise CFI policies. We apply SACK to seven widely used applications and successfully construct 419 Sub attacks that compromise critical security features. We further develop five end-to-end exploits based on historical bugs in SQLite3, V8 and Nginx, enabling arbitrary command execution or authentication bypass. Our results demonstrate that SACK provides a scalable and automated pipeline capable of uncovering large numbers of end-to-end attacks across diverse applications.
-
Yoochan Lee (Max Planck Institute for Security and Privacy), Hyuk Kwon (Theori, Inc.), Thorsten Holz (Max Planck Institute for Security and Privacy)
With the advent of Kernel Control-Flow Integrity (KCFI), Data-Oriented Programming (DOP) has emerged as an essential alternative to traditional control-flow hijacking techniques such as Return-Oriented Programming (ROP). Unlike control-flow attacks, DOP manipulates kernel data-flow to achieve privilege escalation without violating control-flow integrity. However, traditional DOP attacks remain complex and exhibit limited practicality due to their multistage nature, typically requiring heap address leakage, arbitrary address read, and arbitrary address write capabilities. Each stage imposes strict constraints on the selection and usage of kernel objects.
To address these limitations, we introduce DirtyFree, a systematic exploitation method that leverages the arbitrary free primitive. This primitive enables the forced deallocation of attacker-controlled kernel objects, significantly reducing exploitability requirements and simplifying the overall exploitation process. DirtyFree provides a systematic method for identifying suitable arbitrary free objects across diverse kernel caches and presents a structured exploitation strategy targeting security-critical objects such as cred. Through extensive evaluation, we successfully identified 14 arbitrary free objects covering most kernel caches, demonstrating DirtyFree's practical effectiveness by successfully exploiting 24 real-world kernel vulnerabilities. Additionally, we propose and implement two mitigation techniques designed to mitigate DirtyFree, effectively preventing exploitation while incurring negligible performance overhead (i.e., 0.28% and -0.55%, respectively).
-
Zhen Huang (Shanghai Jiao Tong University), Yidi Kao (Auburn University), Sanchuan Chen (Auburn University), Guoxing Chen (Shanghai Jiao Tong University), Yan Meng (Shanghai Jiao Tong University), Haojin Zhu (Shanghai Jiao Tong University)
Trusted Execution Environment (TEE) has been adopted to secure computation outsourced to untrusted clouds, and the associated remote attestation mechanism enables the user to verify the integrity of the outsourced computation at launch time. However, memory corruption attacks break TEE’s security guarantees without being detected after launch-time attestation. While control-flow attestation (CFA) schemes aim to detect runtime compromises, most existing CFA schemes lack concrete verification methods and can be bypassed by data-only attacks. In this paper, we propose the concept of External-Input Attestation to attest all writes to TEE-protected applications, based on the observation that memory corruption attacks typically start with unintended writes. This approach ensures a trusted enclave state by verifying all writes match expectations, transforming security issues, such as control-flow hijacking, into reliability issues, such as a software crash due to unexpected input. For efficient reference measurement derivation and verification, the current version of External-Input Attestation is limited to enclaved applications whose inputs are known to the verifier. This design is validated by implementing and evaluating prototypes on AMD SEV-SNP and Penglai, where security and performance evaluations show a minimal performance overhead in case studies, including secure model training, model inference, database workloads, and key management.
-
Bocheng Xiang (Fudan University), Yuan Zhang (Fudan University), Hao Huang (Fudan University), Fengyu Liu (Fudan University), Youkun Shi (Fudan University)
Link Following (LF) attacks in the Windows file system allow adversaries to stealthily redirect benign file operations to protected files by abusing crafted combinations of symbolic links (link chains), thereby enabling arbitrary manipulation of protected files. Such attacks typically manifest as either single-step attacks or multi-step attacks, depending on the sequencing of the constructed link chain. Existing countermeasures against LF attacks either rely on heavyweight modeling or suffer from poor compatibility and limited applicability, and none provide comprehensive protection across different types of LF attacks.
In this paper, we present LinkGuard, a lightweight state-aware runtime guard against LF attacks targeting Windows systems. The novelty of LinkGuard lies in its two-stage design: the first stage improves defense efficiency by performing dynamic subject filtering, which monitors only the file operations and associated subjects involved in the creation and following of link chains; the second stage applies FSM-based rule matching to precisely defend against LF attacks, ensuring effective and accurate detection. We evaluate LinkGuard's prototype across five representative Windows systems to validate its compatibility. On a dataset of 70 real-world vulnerabilities, LinkGuard successfully mitigates all single-step attacks and 95.45% of multi-step attacks, with zero false positives on benign operations. On average, LinkGuard incurs only 1% overhead in microbenchmarks and 3.4% overhead in real-world application workloads, while adding a negligible 5 ms latency to benign file operations.
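A toy version of the second stage's FSM matching, covering a single-step link-following pattern; the event fields, states, and the rule itself are invented for illustration, as LinkGuard's actual rule set is not spelled out in the abstract.

    # Sketch: a two-state FSM that flags a privileged subject following a link
    # previously planted by an unprivileged one.
    TRANSITIONS = {"START": {"create_symlink": "LINK_PLANTED"},
                   "LINK_PLANTED": {"follow_link": "VIOLATION"}}

    def feed(state, event):
        nxt = TRANSITIONS.get(state, {}).get(event["op"])
        if nxt == "VIOLATION" and event["subject"] == "privileged":
            raise RuntimeError("blocked LF attack via " + event["path"])
        return nxt or state

    state = feed("START", {"op": "create_symlink", "subject": "user",
                           "path": r"C:\Temp\report.log"})
    try:
        feed(state, {"op": "follow_link", "subject": "privileged",
                     "path": r"C:\Temp\report.log"})
    except RuntimeError as e:
        print(e)   # the malicious follow is intercepted

The first stage's subject filtering would keep such an FSM fed with only the small slice of system events that can participate in link chains, which is where the low overhead comes from.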
Fuzz Lightyear to Infinity
-
Xiangpu Song (Shandong University), Longjia Pei (Shandong University), Jianliang Wu (Simon Fraser University), Yingpei Zeng (Hangzhou Dianzi University), Gaoshuo He (Shandong University), Chaoshun Zuo (Independent Researcher), Xiaofeng Liu (Shandong University), Qingchuan Zhao (City University of Hong Kong), Shanqing Guo (Shandong University)
Network protocol implementations are expected to strictly comply with their specifications to ensure reliable and secure communications. However, the inherent ambiguity of natural-language specifications often leads to developers' misinterpretations, causing protocol implementations to deviate from standard behaviors. These deviations result in subtle non-compliance bugs that can cause interoperability issues and critical security vulnerabilities. Unlike memory corruption bugs, these bugs typically do not exhibit explicit error behaviors, resulting in existing bug oracles being insufficient to thoroughly detect them. Moreover, existing works require heavy manual effort to verify findings and analyze root causes, severely limiting their scalability in practice.
In this paper, we present ProtocolGuard, a novel framework that systematically detects non-compliance bugs by combining LLM-guided static analysis with fuzzing-based dynamic verification. ProtocolGuard first extracts normative rules from protocol specifications using a hybrid method, and performs LLM-guided program slicing to extract code slices relevant to each rule. It then leverages LLMs to detect semantic inconsistencies between these rules and code logic, and dynamically verifies whether these bugs can be triggered. To facilitate bug verification, ProtocolGuard first uses LLMs to automatically generate assertion statements and instrument the code, turning silent inconsistencies into observable assertion failures. Then, it produces initial test cases that are more likely to trigger the bug with the help of LLMs for dynamic verification. Lastly, ProtocolGuard dynamically tests the instrumented code to confirm bug identification and generate proof-of-concept test cases. We implemented a prototype of ProtocolGuard and evaluated it on 11 widely-used protocol implementations.
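To picture the assertion step, here is a toy example of turning a normative rule into an observable oracle; the rule, field layout, and code are invented for illustration and are not from any of the evaluated implementations.

    # Sketch: a MUST-style rule ("padding length MUST be smaller than the
    # record length") becomes an assertion, so a silent non-compliance bug
    # surfaces as a crash that fuzzing can observe.
    def parse_record(buf: bytes) -> bytes:
        record_len = len(buf)
        padding_len = buf[-1]
        # LLM-generated assertion for the extracted rule:
        assert padding_len < record_len, "non-compliance: pad >= record length"
        return buf[: record_len - padding_len - 1]

    print(parse_record(b"hello\x02\x02\x02"))  # compliant record parses fine
    parse_record(b"\xff")                      # AssertionError: observable bug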
ProtocolGuard successfully discovered 158 non-compliance bugs with high accuracy, 70 of which have been confirmed, and the majority of which can be converted into assertions and dynamically verified. A comparison with existing state-of-the-art tools demonstrates that ProtocolGuard outperforms them in both precision and recall for bug detection.
-
Min Shi (Wuhan University), Yongkang Xiao (Wuhan University), Jing Chen (Wuhan University), Kun He (Wuhan University), Ruiying Du (Wuhan University), Meng Jia (Department of Computing, the Hong Kong Polytechnic University)
The Secure Connection (SC) pairing is the latest version of the security protocol designed to protect sensitive information transmitted over Bluetooth Low Energy (BLE) channels. A formal and rigorous analysis of this protocol is essential for improving security assurances and identifying potential vulnerabilities. However, the complexity of the protocol flow, difficulties in formalizing pairing method selection, and overly idealized user assumptions present significant obstacles to such analysis. In this paper, we address these challenges and present an accurate and comprehensive formal analysis of the BLE-SC pairing protocol using Tamarin. We extract state machines for each participant as the blueprint for modeling the protocol, and we use an equational theory to formalize the pairing method selection logic. Our model incorporates subtle user behaviors and considers stronger adversary capabilities, including the potential compromise of private channels such as the temporary out-of-band channel. We develop a verification strategy to automate protocol analysis and implement a script to parallelize verification tasks across multiple servers. We verify 84 pairing cases and identify the minimal security assumptions required for the protocol. Moreover, our results reveal a new Man-in-the-Middle (MitM) attack, which we call the PE confusion attack. We provide tools and Proof-of-Concept (PoC) exploits for simulating and understanding this attack within a controlled environment. Finally, we propose countermeasures to defend against this attack, improving the security of the BLE-SC pairing protocol.
-
Jiaxing Cheng (Institute of Information Engineering, CAS; School of Cyber Security, UCAS), Ming Zhou (School of Cyber Science and Engineering, Nanjing University of Science and Technology), Haining Wang (Virginia Tech), Xin Chen (Institute of Information Engineering, CAS; School of Cyber Security, UCAS), Yuncheng Wang (Institute of Information Engineering, CAS; School of Cyber Security, UCAS), Yibo Qu (Institute of Information Engineering, CAS; School of Cyber Security, UCAS), Limin Sun (Institute of Information Engineering, CAS; School of Cyber Security, UCAS)
Programmable Logic Controllers (PLCs) automate industrial operations using vendor-supplied logic instruction libraries compiled into device firmware. These libraries may contain security flaws that, when exploited through physical control routines, network-facing services, or PLC runtime subsystems, may lead to privilege violations, memory corruption, or data leakage. This paper presents LogicFuzz, the first fuzzing framework designed specifically to target logic instructions in PLC firmware. LogicFuzz constructs a semantic dependency graph (SDG) that captures both operational semantics and inter-instruction dependencies in PLC code. Leveraging the SDG together with an enable-signal mechanism, LogicFuzz automatically synthesizes instruction-tailored seed programs, significantly reducing manual effort and enabling controlled, resettable fuzzing on real PLC hardware. To uncover bugs conditioned on control-flow triggers (i.e., invocation patterns), LogicFuzz mutates the SDG to diversify instruction-invocation contexts. To expose data-triggered faults, it performs coverage-guided parameter mutation under valid semantic constraints. In addition, LogicFuzz integrates a multi-source oracle that monitors runtime logs, status LEDs, and communication states to detect instruction-level failures during fuzzing. We evaluate LogicFuzz on six production PLCs from three major vendors and uncover 19 instruction-level bugs, including four previously unknown vulnerabilities.
-
Xingyu Li (UC Riverside), Juefei Pu (UC Riverside), Yifan Wu (UC Riverside), Xiaochen Zou (UC Riverside), Shitong Zhu (UC Riverside), Qiushi Wu (IBM), Zheng Zhang (UC Riverside), Joshua Hsu (UC Riverside), Yue Dong (UC Riverside), Zhiyun Qian (UC Riverside), Kangjie Lu (University of Minnesota), Trent Jaeger (UC Riverside), Michael De Lucia (U.S. Army Research Laboratory), Srikanth V. Krishnamurthy (UC Riverside)
Open-source software projects are foundational to modern software ecosystems, with the Linux kernel standing out as a critical exemplar due to its ubiquity and complexity. Although security patches are continuously integrated into the Linux mainline kernel, downstream maintainers often delay their adoption, creating windows of vulnerability. A key reason for this lag is the difficulty in identifying security-critical patches, particularly those addressing exploitable vulnerabilities such as out-of-bounds (OOB) accesses and use-after-free (UAF) bugs. This challenge is exacerbated by intentionally silent bug fixes, incomplete or missing CVE assignments, delays in CVE issuance, and recent changes to the CVE assignment criteria for the Linux kernel.
Prior efforts, such as GraphSPD, have proposed binary classifiers to distinguish security from non-security patches. However, these approaches do not provide fine-grained categorization of vulnerability types, which is essential for prioritizing fixes for high-impact bugs like OOB and UAF. Our work aims to take such coarsely labeled security patches and classify them into fine-grained categories, i.e., OOB, UAF, or non-OOB-UAF types.
While fine-grained patch classification approaches exist, they exhibit limitations in both coverage and accuracy. In this work, we identify previously unexplored opportunities to significantly improve fine-grained patch classification. Specifically, by leveraging cues from commit titles/messages and diffs alongside appropriate code context, we develop DUALLM, a dual-method pipeline that integrates two approaches based on a Large Language Model (LLM) and a fine-tuned small language model. DUALLM achieves 87.4% accuracy and an F1-score of 0.875, significantly outperforming prior solutions. Notably, DUALLM successfully identified 111 of 5,140 recent Linux kernel patches as addressing OOB or UAF vulnerabilities, with 90 true positives confirmed by manual verification (many do not have clear indications in patch descriptions). Moreover, we constructed proof-of-concepts for two identified bugs (one UAF and one OOB), including one developed to conduct a previously unknown control-flow hijack as further evidence of the correctness of the classification.
CSI: JPEG Unit
-
Ahmad ALBarqawi (New Jersey Institute of Technology), Mahmoud Nazzal (Old Dominion University), Issa Khalil (Qatar Computing Research Institute (QCRI), HBKU), Abdallah Khreishah (New Jersey Institute of Technology), NhatHai Phan (New Jersey Institute of Technology)
The rapid rise of deepfake technology, which produces realistic but fraudulent digital content, threatens the authenticity of media. Deepfakes manipulate videos, images, and audio, spread misinformation, blur the line between real and fake, and highlight the need for effective detection approaches. Traditional deepfake detection approaches often struggle with sophisticated, customized deepfakes, especially in terms of generalization and robustness against malicious attacks. This paper introduces ViGText, a novel approach that integrates images with Vision Large Language Model (VLLM) Text explanations within a Graph-based framework to improve deepfake detection. The novelty of ViGText lies in its integration of detailed explanations with visual data, as it provides a more context-aware analysis than captions, which often lack specificity and fail to reveal subtle inconsistencies. ViGText systematically divides images into patches, constructs image and text graphs, and integrates them for analysis using Graph Neural Networks (GNNs) to identify deepfakes. Through the use of multi-level feature extraction across spatial and frequency domains, ViGText captures details that enhance its robustness and accuracy in detecting sophisticated deepfakes. Extensive experiments demonstrate that ViGText significantly enhances generalization and achieves a notable performance boost when it detects user-customized deepfakes. Specifically, average F1 scores rise from 72.45% to 98.32% under generalization evaluation, reflecting the model's superior ability to generalize to unseen, fine-tuned variations of stable diffusion models. As for robustness, ViGText achieves an increase of 11.1% in recall compared to other deepfake detection approaches against state-of-the-art foundation-model-based adversarial attacks. ViGText limits classification performance degradation to less than 4% when it faces targeted attacks that exploit its graph-based architecture, while only marginally increasing execution cost. ViGText combines granular visual analysis with textual interpretation, establishes a new benchmark for deepfake detection, and provides a more reliable framework to preserve media authenticity and information integrity.
-
Kavita Kumari (Technical University of Darmstadt), Sasha Behrouzi (Technical University of Darmstadt), Alessandro Pegoraro (Technical University of Darmstadt), Ahmad-Reza Sadeghi (Technical University of Darmstadt)
The rapid advancement of generative models such as GANs and diffusion-based architectures has led to the widespread creation of hyperrealistic synthetic images. Although these technologies drive innovation in the media and data generation, they also raise significant ethical, social, and security concerns. In response, numerous detection methods have been developed, including frequency domain analysis and deep learning classifiers. However, these approaches often struggle to generalize across unseen generative models and typically lack physical grounding, leaving them vulnerable to adaptive attacks and limited in interpretability.
We propose Light2Lie, a physics-augmented deepfake detection framework that leverages principles of specular reflection, specifically the Fresnel reflectance model, to reveal inconsistencies in light–surface interactions that generative models struggle to reproduce effectively. Our method first employs a neural network to estimate the surface base reflectance and then derives a microfacet-inspired specular response map that encodes subtle geometric and optical discrepancies between real and synthetic images. This signal is integrated as feature maps into a secondary classifier that learns to distinguish the two classes based on reflectance-driven patterns. To further enhance robustness, we introduce a feedback refinement mechanism that updates the base reflectance model output using classification errors, tightly coupling physical modeling with the learning objective. Extensive experiments on multiple deepfake datasets demonstrate that our approach generalizes better to samples from unseen generative models, achieving up to 74% precision across diverse deepfake domains and outperforming state-of-the-art baselines while providing robust, physics-grounded decisions.
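For orientation, the specular term referenced above is often written via Schlick's approximation to Fresnel reflectance (a standard rendering form; the paper's exact estimator may differ):

    R(\theta) \approx R_0 + (1 - R_0)\,(1 - \cos\theta)^5,
    \qquad R_0 = \left(\frac{n_1 - n_2}{n_1 + n_2}\right)^{2}

where \theta is the angle of incidence and n_1, n_2 are the refractive indices of the two media. The sharp growth of R(\theta) at grazing angles is exactly the kind of physically constrained behavior that generative models tend to violate and that a specular response map is designed to expose.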
-
Zihao Liu (Iowa State University), Aobo Chen (Iowa State University), Yan Zhang (Iowa State University), Wensheng Zhang (Iowa State University), Chenglin Miao (Iowa State University)
Speech synthesis technologies, driven by advances in deep learning, have achieved remarkable realism, enabling diverse applications across various domains. However, these technologies can also be exploited to generate fake speech, introducing significant risks. While existing fake speech detection methods have shown effectiveness in controlled settings, they often struggle to generalize to unseen scenarios, including new synthesis models, languages, and recording conditions. Moreover, many existing approaches rely on specific assumptions and lack comprehensive insights into the common artifacts inherent in fake speech. In this paper, we rethink the task of fake speech detection by proposing a new perspective focused on analyzing the spectrogram magnitude. Through extensive analysis, we uncover that synthetic speech consistently exhibits artifacts in the magnitude representation of the spectrogram, such as reduced texture detail and inconsistencies across magnitude ranges. Leveraging these insights, we introduce a novel assumption-free and generalized fake speech detection framework. The framework partitions spectrograms into layered representations based on magnitude and detects artifacts across both spatial and discrete cosine transform (DCT) domains using 2D and 3D representations. This design enables the framework to effectively capture fine-grained artifacts and synthesis inconsistencies inherent in fake speech. Extensive experiments demonstrate that the proposed framework achieves state-of-the-art performance on several widely used public audio deepfake datasets. Furthermore, evaluations in real-world scenarios involving black-box Web voice-cloning APIs highlight the framework's robustness and practical applicability, consistently outperforming baseline methods.
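A minimal sketch of the magnitude-layering idea: split a magnitude spectrogram into quantile-based layers and summarize each layer's texture with low-order 2-D DCT coefficients. The layer count, thresholds, and feature sizes are illustrative choices, not the paper's configuration.

    # Sketch: layered magnitude representation + DCT texture features.
    import numpy as np
    from scipy.signal import stft
    from scipy.fft import dctn

    def layered_dct_features(audio, fs=16_000, n_layers=3):
        _, _, Z = stft(audio, fs=fs, nperseg=512)
        mag = np.abs(Z)                                   # magnitude spectrogram
        edges = np.quantile(mag, np.linspace(0, 1, n_layers + 1))
        feats = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            layer = np.where((mag >= lo) & (mag < hi), mag, 0.0)
            feats.append(dctn(layer, norm="ortho")[:8, :8].ravel())  # coarse texture
        return np.concatenate(feats)

    print(layered_dct_features(np.random.randn(16_000)).shape)

Reduced texture detail in a given magnitude band then shows up directly as anomalous energy in that layer's DCT coefficients.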
-
Jie Wang (Xidian University), Zheng Yan (Xidian University), Jiahe Lan (Xidian University), Xuyan Li (Xidian University), Elisa Bertino (Purdue University)
Trust prediction provides valuable support for decision-making, risk mitigation, and system security enhancement. Recently, Graph Neural Networks (GNNs) have emerged as a promising approach for trust prediction, owing to their ability to learn expressive node representations that capture intricate trust relationships within a network. However, current GNN-based trust prediction models face several limitations: (i) Most of them fail to capture trust dynamicity, leading to questionable inferences. (ii) They rarely consider the heterogeneous nature of real-world networks, resulting in a loss of rich semantics. (iii) None of them support context-awareness, a basic property of trust, making prediction results coarse-grained.
To this end, we propose CAT, the first Context-Aware GNN-based Trust prediction model that supports trust dynamicity and accurately represents real-world heterogeneity. CAT consists of a graph construction layer, an embedding layer, a heterogeneous attention layer, and a prediction layer. It handles dynamic graphs using continuous-time representations and captures temporal information through a time encoding function. To model graph heterogeneity and leverage semantic information, CAT employs a dual attention mechanism that identifies the importance of different node types and nodes within each type. For context-awareness, we introduce a new notion of meta-paths to extract contextual features. By constructing context embeddings and integrating a context-aware aggregator, CAT can predict both context-aware trust and overall trust. Extensive experiments on three real-world datasets demonstrate that CAT outperforms five groups of baselines in trust prediction, while exhibiting strong scalability to large-scale graphs and robustness against both trust-oriented and GNN-oriented attacks.
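The continuous-time representation can be pictured with a small functional time encoding of the kind used by dynamic GNNs; the dimensions and frequency range below are illustrative, as the abstract does not specify CAT's exact encoder.

    # Sketch: map a time delta to a fixed-size vector of sinusoids, so edges
    # observed at different times get distinguishable temporal embeddings.
    import numpy as np

    class TimeEncoder:
        def __init__(self, dim=16, seed=0):
            g = np.random.default_rng(seed)
            self.omega = 10.0 ** -g.uniform(0, 6, size=dim)  # frequencies
            self.phase = g.uniform(0, 2 * np.pi, size=dim)

        def __call__(self, dt):
            return np.cos(self.omega * dt + self.phase)

    enc = TimeEncoder()
    print(enc(3600.0)[:4])   # embedding of "one hour since last interaction"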
Cache Me If You Can
-
Guanlong Wu (Southern University of Science and Technology), Taojie Wang (Southern University of Science and Technology), Yao Zhang (ByteDance Inc.), Zheng Zhang (Southern University of Science and Technology), Jianyu Niu (Southern University of Science and Technology), Ye Wu (ByteDance Inc.), Yinqian Zhang (SUSTech)
The emergence of large language models (LLMs) has enabled a wide range of applications, including code generation, chatbots, and AI agents. However, deploying these applications faces substantial challenges in terms of cost and efficiency. One notable optimization to address these challenges is semantic caching, which reuses query-response pairs across users based on semantic similarity. This mechanism has gained significant traction in both academia and industry and has been integrated into the LLM serving infrastructure of cloud providers such as Azure, AWS, and Alibaba.
This paper is the first to show that semantic caching is vulnerable to cache poisoning attacks, where an attacker injects crafted cache entries to cause others to receive attacker-defined responses. We demonstrate the semantic cache poisoning attack in diverse scenarios and confirm its practicality across all three major public clouds. Building on the attack, we evaluate existing adversarial prompting defenses and find they are ineffective against semantic cache poisoning, leading us to propose a new defense mechanism that demonstrates improved protection compared to existing approaches, though complete mitigation remains challenging. Our study reveals that cache poisoning, a long-standing security concern, has re-emerged in LLM systems. While our analysis focuses on semantic cache, the underlying risks may extend to other types of caching mechanisms used in LLM systems.
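The mechanics of the attack are easy to see in a toy semantic cache, where one poisoned entry is served to every user whose query embeds nearby. The embedding function and the 0.9 similarity threshold are placeholders, not any provider's implementation.

    # Sketch: a similarity-keyed response cache and how one write poisons it.
    import numpy as np

    def embed(text):                       # stand-in for an embedding model
        rng = np.random.default_rng(abs(hash(text.lower().split()[0])) % 2**32)
        v = rng.standard_normal(64)
        return v / np.linalg.norm(v)

    cache = []                             # list of (embedding, response) pairs

    def lookup(query, threshold=0.9):
        for vec, resp in cache:
            if float(embed(query) @ vec) >= threshold:
                return resp                # cache hit: response shared across users
        return None

    # Attacker seeds the cache with a crafted query-response pair...
    cache.append((embed("what is the bank's support phone number?"),
                  "Call the attacker at +1-555-0100"))
    # ...and a later victim with a semantically similar query receives it.
    print(lookup("what is the bank's support phone line?"))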
-
Zhifan Luo (Zhejiang University), Shuo Shao (Zhejiang University), Su Zhang (Huawei Technology), Lijing Zhou (Huawei Technology), Yuke Hu (Zhejiang University), Chenxu Zhao (Zhejiang University), Zhihao Liu (Zhejiang University), Zhan Qin (Zhejiang University)
The Key-Value (KV) cache, which stores intermediate attention computations (Key and Value pairs) to avoid redundant calculations, is a fundamental mechanism for accelerating Large Language Model (LLM) inference. However, this efficiency optimization introduces significant yet underexplored privacy risks. This paper provides the first comprehensive analysis of these vulnerabilities, demonstrating that an attacker can reconstruct sensitive user inputs directly from the KV-cache. We design and implement three distinct attack vectors: a direct Inversion Attack, a more broadly applicable and potent Collision Attack, and a semantic-based Injection Attack. These methods demonstrate the practicality and severity of KV-cache privacy leakage issues. To mitigate this, we propose KV-Cloak, a novel, lightweight, and efficient defense mechanism. KV-Cloak uses a reversible matrix-based obfuscation scheme, combined with operator fusion, to secure the KV-cache. Our extensive experiments show that KV-Cloak effectively thwarts all proposed attacks, reducing reconstruction quality to random noise. Crucially, it achieves this robust security with virtually no degradation in model accuracy and minimal performance overhead, offering a practical solution for trustworthy LLM deployment.
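The reversibility at the heart of the defense can be shown in a few lines: cached keys are stored pre-multiplied by a secret invertible matrix and un-mixed only inside the fused attention operator. This sketch demonstrates the lossless round trip only, not KV-Cloak's full scheme.

    # Sketch: reversible matrix-based obfuscation of cached keys.
    import numpy as np

    d = 64
    rng = np.random.default_rng(0)
    M = rng.standard_normal((d, d))          # secret mixing matrix
    M_inv = np.linalg.inv(M)

    K = rng.standard_normal((128, d))        # plaintext keys (never cached)
    K_obf = K @ M                            # what actually lands in the KV-cache
    K_rec = K_obf @ M_inv                    # recovery inside the fused operator

    print(np.allclose(K, K_rec))             # True: exact de-obfuscation

An attacker who reads the cache sees only K_obf, which is useless without M; operator fusion keeps the un-mixed keys out of memory that an attacker can snapshot.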
-
Yuxiao Wu (Institute for Network Sciences and Cyberspace, BNRist, Tsinghua University), Yunyi Zhang (Tsinghua University), Chaoyi Lu (Zhongguancun Laboratory), Baojun Liu (Tsinghua University; Zhongguancun Laboratory)
DNS cache poisoning attacks covertly hijack domain access by injecting forged resource records into resolvers. To counter this, resolvers employ bailiwick checking, a critical defense mechanism designed to filter potentially malicious records from DNS responses. However, in the context of third-party services, a misalignment between domain ownership and the traditional, top-down zone delegation model has emerged, posing significant challenges to the effectiveness of bailiwick checks.
In this paper, we present a systematic analysis of the design and implementation of bailiwick checking. We demonstrate that mainstream resolvers generally adopt a conservatism principle: they will cache any resource record that satisfies minimal constraints, regardless of its direct relevance to the originating query. Building on this finding, we propose a novel cache poisoning attack (termed Cuckoo Domain): by controlling one single subdomain, attackers can compromise its parent domain or its sibling domains. Our testing revealed that seven major DNS resolver implementations, including BIND9 and Microsoft DNS, are vulnerable. Through a large-scale measurement study, we confirmed that 44.64% of open resolvers and 21 major public DNS providers are also at risk. In addition, we found that over a million subdomains provided by 7 providers (including No-IP, ClouDNS, and Akamai) are potentially vulnerable to hijacking through this attack. We have conducted a responsible disclosure, notifying the affected software vendors and service providers. BIND9, Unbound, PowerDNS and Technitium have acknowledged our reports and assigned 3 CVEs. We call upon the community and software vendors to address the new challenges that modern service ecosystems pose to the effectiveness of bailiwick checking.
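The conservatism principle is easy to reproduce in a toy resolver cache: any record whose name falls under the query's zone is accepted, so an attacker answering for one subdomain can plant a record for a sibling. Domain names and the check itself are illustrative.

    # Sketch: a minimal bailiwick check and the sibling-poisoning it permits.
    cache = {}

    def in_bailiwick(record_name, zone):
        return record_name == zone or record_name.endswith("." + zone)

    def accept_response(query_zone, records):
        for name, rdata in records.items():
            if in_bailiwick(name, query_zone):   # minimal constraint only
                cache[name] = rdata

    # Attacker controls sub.example.com and answers a query for it with an
    # extra record naming the sibling www.example.com:
    accept_response("example.com", {
        "sub.example.com": "203.0.113.7",
        "www.example.com": "203.0.113.7",        # poisoned sibling record
    })
    print(cache["www.example.com"])              # later lookups are hijacked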
-
Claudio Migliorelli (IBM Research Europe - Zurich), Andrea Mambretti (IBM Research Europe - Zurich), Alessandro Sorniotti (IBM Research Europe - Zurich), Vittorio Zaccaria (Politecnico di Milano), Anil Kurmus (IBM Research Europe - Zurich)
Kernel memory allocators remain a critical attack surface, despite decades of research into memory corruption defenses. While recent mitigation strategies have diminished the effectiveness of conventional attack techniques, we show that robust cross-cache attacks are still feasible and pose a significant threat. In this paper, we introduce PCPLost, a cross-cache memory massaging technique that bypasses mainline mitigations by carefully using side channels to infer the kernel allocator's internal state. We demonstrate that vulnerabilities such as out-of-bounds (OOB) accesses (and, via pivoting, use-after-free (UAF) and double-free (DF) bugs) can be exploited reliably through a cross-cache attack, across all generic caches, even in the presence of noise. We validate the generality and robustness of our approach by exploiting 6 publicly disclosed CVEs using PCPLost, and discuss possible mitigations. The significant reliability (over 90% in most cases) of our approach in obtaining a cross-cache layout suggests that current mitigation strategies fail to offer comprehensive protection against such attacks within the Linux kernel.
The Sound of (Meta)Data
-
Eden Luzon (Ben Gurion University of the Negev), Guy Amit (Ben-Gurion University & IBM Research), Roy Weiss (Ben Gurion University of the Negev), Torsten Krauß (University of Würzburg), Alexandra Dmitrienko (University of Würzburg), Yisroel Mirsky (Ben Gurion University of the Negev)
Neural networks are often trained on proprietary datasets, making them attractive attack targets. We present a novel dataset extraction method leveraging an innovative training-time backdoor attack, allowing a malicious federated learning (FL) server to systematically and deterministically extract complete client training samples through a simple indexing process. Unlike prior techniques, our approach guarantees exact data recovery rather than probabilistic reconstructions or hallucinations, provides precise control over which samples are memorized and how many, and shows high capacity and robustness. Infected models output data samples when they receive a pattern-based index trigger, enabling systematic extraction of meaningful patches from each client’s local data without disrupting global model utility. To address small model output sizes, we extract patches and then recombine them.
The attack requires only a minor modification to the training code that can easily evade detection during client-side verification. Hence, this vulnerability represents a realistic FL supply-chain threat, where a malicious server can distribute modified training code to clients and later recover private data from their updates. Evaluations across classifiers, segmentation models, and large language models demonstrate that thousands of sensitive training samples can be recovered from client models with minimal impact on task performance, and a client's entire dataset can be stolen after multiple FL rounds. For instance, a medical segmentation dataset can be extracted with only a 3% utility drop. These findings expose a critical privacy vulnerability in FL systems, emphasizing the need for stronger integrity and transparency in distributed training pipelines.
-
Xin'an Zhou (University of California, Riverside), Juefei Pu (University of California, Riverside), Zhutian Liu (University of California, Riverside), Zhiyun Qian (University of California, Riverside), Zhaowei Tan (University of California, Riverside), Srikanth V. Krishnamurthy (University of California, Riverside), Mathy Vanhoef (DistriNet, KU Leuven)
To prevent malicious Wi-Fi clients from attacking other clients on the same network, vendors have introduced client isolation, a combination of mechanisms that block direct communication between clients. However, client isolation is not a standardized feature, making its security guarantees unclear.
In this paper, we undertake a structured security analysis of Wi-Fi client isolation and uncover new classes of attacks that bypass this protection. We identify several root causes behind these weaknesses. First, Wi-Fi keys that protect broadcast frames are improperly managed and can be abused to bypass client isolation. Second, isolation is often only enforced at the MAC or IP layer, but not both. Third, weak synchronization of a client's identity across the network stack allows one to bypass Wi-Fi client isolation at the network layer instead, enabling the interception of uplink and downlink traffic of other clients as well as internal backend devices. Every tested router and network was vulnerable to at least one attack. More broadly, the lack of standardization leads to inconsistent, ad hoc, and often incomplete implementations of isolation across vendors.
Building on these insights, we design and evaluate end-to-end attacks that enable full machine-in-the-middle capabilities in modern Wi-Fi networks. Although client isolation effectively mitigates legacy attacks like ARP spoofing, which has long been considered the only universal method for achieving machine-in-the-middle positioning in local area networks, our attack introduces a general and practical alternative that restores this capability, even in the presence of client isolation.
-
Jiacen Xu (Microsoft), Chenang Li (University of California, Irvine), Yu Zheng (University of California, Irvine), Zhou Li (University of California, Irvine)
Graph-based Network Intrusion Detection Systems (GNIDS) have gained significant momentum in detecting sophisticated cyber-attacks, such as Advanced Persistent Threats (APTs), within and across organizational boundaries. Though they achieve satisfactory detection accuracy and demonstrate adaptability to ever-changing attacks and normal patterns, existing GNIDS predominantly assume a centralized data setting. However, flexible data collection is not always realistic or achievable due to increasing constraints from privacy regulations and operational limitations.
We argue that the practical development of GNIDS requires accounting for distributed collection settings and we leverage Federated Learning (FL) as a viable paradigm to address this prominent challenge. We observe that naively applying FL to GNIDS is unlikely to be effective, due to issues like graph heterogeneity over clients and the diverse design choices taken by different GNIDS. We address these issues with a set of novel techniques tailored to the graph datasets, including reference graph synthesis, graph sketching and adaptive contribution scaling, eventually developing a new system ENTENTE. By leveraging the domain knowledge, ENTENTE can achieve effectiveness, scalability and robustness simultaneously. Empirical evaluation on the large-scale LANL, OpTC and Pivoting datasets shows that ENTENTE outperforms the SOTA FL baselines. We also evaluate ENTENTE under FL poisoning attacks tailored to the GNIDS setting, showing the robustness by bounding the attack success rate to low values. Overall, our study suggests a promising direction to build cross-silo GNIDS.
-
Chenxiang Luo (City University of Hong Kong), David Yau (Singapore University of Technology and Design), Qun Song (City University of Hong Kong)
Federated learning (FL) enables collaborative model training without sharing raw data but is vulnerable to gradient inversion attacks (GIAs), where adversaries reconstruct private data from shared gradients. Existing defenses either incur impractical computational overhead for embedded platforms or fail to achieve privacy protection and good model utility at the same time. Moreover, many defenses can be easily bypassed by adaptive adversaries who have obtained the defense details. To address these limitations, we propose SVDefense, a novel defense framework against GIAs that leverages the truncated Singular Value Decomposition (SVD) to obfuscate gradient updates. SVDefense introduces three key innovations: a Self-Adaptive Energy Threshold that adapts to client vulnerability, a Channel-Wise Weighted Approximation that selectively preserves essential gradient information for effective model training while enhancing privacy protection, and a Layer-Wise Weighted Aggregation for effective model aggregation under class imbalance. Our extensive evaluation shows that SVDefense outperforms existing defenses across multiple applications, including image classification, human activity recognition, and keyword spotting, by offering robust privacy protection with minimal impact on model accuracy. Furthermore, SVDefense is practical for deployment on various resource-constrained embedded platforms. We will make our code publicly available upon paper acceptance.
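The core obfuscation step is compact enough to sketch directly: keep only the leading singular components carrying a target fraction of a gradient's energy, discarding the fine-grained residual an inversion attack would need. The fixed 0.9 threshold stands in for the paper's self-adaptive, per-client choice, and the channel- and layer-wise weighting is omitted.

    # Sketch: truncated-SVD gradient obfuscation at a fixed energy threshold.
    import numpy as np

    def svd_truncate(grad, energy=0.9):
        U, s, Vt = np.linalg.svd(grad, full_matrices=False)
        ratio = np.cumsum(s**2) / np.sum(s**2)
        k = int(np.searchsorted(ratio, energy)) + 1   # smallest rank reaching it
        return (U[:, :k] * s[:k]) @ Vt[:k]

    g = np.random.randn(256, 512)                     # toy layer gradient
    g_safe = svd_truncate(g)
    print(np.linalg.norm(g - g_safe) / np.linalg.norm(g))  # discarded residual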
The Permission: Impossible
-
Simeon Hoffmann (CISPA Helmholtz Center for Information Security), Nils Ole Tippenhauer (CISPA Helmholtz Center for Information Security)
In embedded systems, the integration of multiple CPUs into one system on a chip (SoC) allows greater performance and the separation of tasks into independent firmwares and optimized architectures. For example, an ARM Cortex-M4 core could run the main firmware, and a Cortex-M0 core could run a real-time operating system (RTOS). The security implications of such integrations are still unclear, e.g., whether an attacker with code execution on one CPU can fully compromise the second CPU or leak protected data.
In this work, we systematically identify security issues resulting from this integration, in particular related to memory and peripheral access control. These issues stem from the re-use of single-CPU security mechanisms, such as memory protection units (MPUs), in the new multi-CPU system. We identify four major attack vectors that can be present in such systems, and find that a significant number of systems on the market appear to be vulnerable. The attack vectors can lead to arbitrary read and write in protected memory of the other CPU, and even to code execution. In addition, we find that the communication mechanism of a popular open source RTOS, FreeRTOS [17], which is suggested as a communication mechanism among firmwares on a multi-CPU system, introduces code execution vulnerabilities in the multi-CPU scenario. We then verify our theoretical predictions by implementing the four attack vectors and demonstrating their practical efficacy. In addition, we find that in one case, the discovered attack surface may lead to the compromise of a custom trusted execution environment (TEE) implementation. We responsibly disclosed our findings to the vendors, resulting in a security advisory and a fix to a proprietary network stack implementation.
-
Manuel Andreas (Technical University of Munich), Fabian Specht (Technical University of Munich), Marius Momeu (Technical University of Munich)
Hypervisors are crucial for the security and availability of modern cloud infrastructures, yet they must expose a large virtualization interface to guest VMs---an attack surface that adversaries can exploit. Among the most intricate and security-sensitive components of hypervisors is their virtual CPU implementation, typically implemented at the highest privilege level. Although previous fuzzing research made promising steps towards scrutinizing the virtual CPU component of HVs, existing techniques fail at covering it in depth, as its convoluted nature requires laborious manual setup for accessing individual interfaces, all the while employing sub-optimal techniques that lower fuzzing throughput.
We address these shortcomings via HyperMirage, a novel hypervisor fuzzer that automatically and efficiently explores the large space of architectural states emulated by virtual CPU implementations. HyperMirage spares security analysts from manually crafting fuzzing seeds in the form of architecturally valid VM states by employing a novel Direct State Manipulation approach, which directly and automatically mutates the HV's view of a VM's state that is consumed during fuzzing. Additionally, we extend a state-of-the-art compiler-based symbolic execution engine, making it the first one available for bare-metal targets, and integrate it into an efficient coverage-guided HV fuzzer, enabling HyperMirage to drastically improve fuzzing throughput when compared to existing techniques.
We provide a case study of HyperMirage by fuzzing the production-grade Xen and KVM hypervisors on the Intel x86 architecture. Our evaluation shows that HyperMirage is capable of covering 200% more virtual CPU interfaces than prior work and achieves drastically more coverage on the entire virtual CPU space when compared to available HV fuzzers. Moreover, HyperMirage discovered 9 new bugs in Xen and 2 in KVM, all of which have been confirmed by the respective project maintainers.
-
Yingjie Cao (The Hong Kong Polytechnic University), Xiaogang Zhu (The University of Adelaide), Dean Sullivan (University of New Hampshire), Haowei Yang (360 Security Technology Inc.), Lei Xue (Sun Yat-sen University), Xian Li (Swinburne University of Technology), Chenxiong Qian (University of Hong Kong), Minrui Yan (Swinburne University of Technology), Xiapu Luo (The Hong Kong Polytechnic University)
Double-fetch vulnerabilities arise when the kernel repeatedly retrieves data from user-space memory without ensuring consistency between the successive data fetches. This issue is especially severe in Real-Time Operating Systems (RTOS), where strict timing requirements limit the use of synchronization mechanisms like mutexes, thus favoring low-latency memory access at the cost of security. Most current detection techniques use static source code analysis, which cannot be applied to commercial off-the-shelf (COTS) RTOS with proprietary kernels. Dynamic methods that employ heuristic time-window thresholds to detect repeated cross-boundary memory accesses are used instead. However, these methods often produce a high number of false positives due to overly broad pattern recognition and lead to significant emulation overhead.
We introduce IsolatOS, a hardware-supported detection method that utilizes kernel isolation features to spot cross-boundary memory accesses that indicate double-fetch vulnerabilities. The main difficulty lies in enforcing isolation boundaries transparently and efficiently, without causing crashes in RTOS systems. IsolatOS overcomes this by first implementing dynamic instrumentation that intercepts privileged accesses to user memory, recording metadata about each access. Then, exception recovery techniques uphold system stability during fault handling. At the post-execution stage, a causal analysis examines violation traces to differentiate between legitimate dual accesses and exploitable double-fetches.
Evaluations across QNX, VxWorks, and seL4 demonstrate the efficiency of IsolatOS: a 70× runtime overhead reduction compared to an emulation-based approach, and the identification of 42 unique vulnerabilities (39 vendor-confirmed, 2 CVEs assigned). These results validate that hardware-assisted kernel isolation is a viable paradigm for double-fetch detection in COTS RTOS environments. We also demonstrate the real-world impact of our findings in automotive systems by exploiting them.
-
Jiayi Hu (Zhejiang University), Qi Tang (Jilin University), Xingkai Wang (Zhejiang University), Jinmeng Zhou (Zhejiang University), Rui Chang (Zhejiang University), Wenbo Shen (Zhejiang University)
Graphics Processing Units (GPUs) have become essential components in modern computing, driving high-performance rendering and parallel processing. Among them, Arm’s Mali GPU is the most widely deployed in mobile devices. In contrast to the mature and robust defenses on the CPU side, the GPU remains poorly protected. Consequently, GPUs have become a preferred target for attackers seeking to bypass CPU defenses. Notable incidents, such as Operation Triangulation, have demonstrated how GPU-side vulnerabilities can be exploited to compromise system security. Despite the rising threat, a comprehensive and in-depth security analysis of the Mali GPU has been missing.
To address this gap, we conduct the first in-depth security analysis of Mali GPU’s memory mapping mechanism and uncover two new security weaknesses: allocation–mapping decoupling and missing physical address validation. Exploiting these weaknesses, we introduce PhantomMap, a novel GPU-assisted exploitation technique that transforms limited heap vulnerabilities into powerful physical memory read/write primitives—bypassing mainstream kernel defenses without requiring privileged capabilities or information leaks. To assess its security impact, we develop a static analyzer that systematically identifies all vulnerable mapping paths, uncovering 15 exploit chains across two Mali driver architectures. We further demonstrate PhantomMap’s practicality by developing 15 end-to-end exploits based on real-world CVEs, including the first public exploit for CVE-2025-21836. Finally, we design and implement a lightweight in-driver mitigation that eliminates the root cause with minimal performance overhead on Pixel 6 and Pixel 7 devices.
AIs Wide Shut
-
Minkyung Park (University of Texas at Dallas), Zelun Kong (University of Texas at Dallas), Dave (Jing) Tian (Purdue University), Z. Berkay Celik (Purdue University), Chung Hwan Kim (University of Texas at Dallas)
Deep neural networks (DNNs) are integral to modern computing, powering applications such as image recognition, natural language processing, and audio analysis. The architectures of these models (e.g., the number and types of layers) are considered valuable intellectual property due to the significant expertise and computational effort required for their design. Although trusted execution environments (TEEs) like Intel SGX have been adopted to safeguard these models, recent studies on model extraction attacks have shown that side-channel attacks (SCAs) can still be leveraged to extract the architectures of DNN models. However, many existing model extraction attacks either do not account for TEE protections or are limited to specific model types, reducing their real-world applicability.
In this paper, we introduce DNN Latency Sequencing (DLS), a novel model extraction attack framework that targets DNN architectures running within Intel SGX enclaves. DLS employs SGX-Step to single-step model execution and collect fine-grained latency traces, which are then analyzed at both the function and basic block levels to reconstruct the model architecture. Our key insight is that DNN architectures inherently influence execution behavior, enabling accurate reconstruction from latency patterns. We evaluate DLS on models built with three widely used deep learning libraries, Darknet, TensorFlow Lite, and ONNX Runtime, and show that it achieves architecture recovery accuracies of 97.3%, 96.4%, and 93.6%, respectively. We further demonstrate that DLS enables advanced attacks, highlighting its practicality and effectiveness.
-
Rui Xiao (Zhejiang University), Sibo Feng (Zhejiang University), Soundarya Ramesh (National University of Singapore), Jun Han (KAIST), Jinsong Han (Zhejiang University)
As deep neural networks (DNNs) are increasingly adopted in safety-critical applications such as autonomous driving and face recognition, they have also become targets for adversarial attacks. However, confidential information about DNNs, including model architecture, is typically hidden from attackers. As a result, adversarial attacks are often launched in black-box settings, which limits their effectiveness. In this paper, we propose ModelSpy, a stealthy DNN architecture snooping attack based on GPU electromagnetic (EM) leakage. ModelSpy is capable of extracting the complete architecture from several meters away, even through walls. ModelSpy is based on the key observation that the GPU emanates far-field EM signals that exhibit architecture-specific amplitude modulation during DNN inference. We develop a hierarchical reconstruction model to recover fine-grained architectural details from the noisy EM signals. To enhance scalability across diverse and evolving architectures, we design a transfer-learning scheme by exploiting the correlation between external EM leakage and internal GPU activity. We design and implement a proof-of-concept system to demonstrate ModelSpy's feasibility. Our evaluation on five high-end consumer GPUs shows ModelSpy's high accuracy in architecture reconstruction, including 97.6% in layer segmentation and 94.0% in hyperparameter estimation, with a working distance of up to 6 m. Furthermore, ModelSpy's reconstructed DNN shows performance comparable to the victim architecture, and can effectively enhance black-box adversarial attacks.
-
David Oygenblik (Georgia Institute of Technology), Dinko Dermendzhiev (Georgia Institute of Technology), Filippos Sofias (Georgia Institute of Technology), Mingxuan Yao (Georgia Institute of Technology), Haichuan Xu (Georgia Institute of Technology), Runze Zhang (Georgia Institute of Technology), Jeman Park (Kyung Hee University), Amit Kumar Sikder (Iowa State University), Brendan Saltaformaggio (Georgia Institute of Technology)
Prior work has developed techniques capable of extracting deep learning (DL) models in universal formats from system memory or program binaries for security analysis. Unfortunately, such techniques ignore the recovery of the DL model's programmatic representation required for model reuse and any white-box analysis techniques. Addressing this, we propose a novel recovery methodology, and prototype ZEN, that automatically recovers the DL model programmatic representation complementing the recovery of the mathematical representation by prior work. ZEN identifies novel code in an unknown DL system relative to a base model and generates patches such that the recovered DL model can be reused. We evaluated ZEN on 21 SOTA DL models, including models across the language and vision domains, such as Llama 3 and YoloV10. ZEN successfully attributed custom models to their base models with 100% accuracy, enabling model reuse.
Key Largo
-
Christopher Vattheuer (UCLA), Justin Feng (UCLA), Hossein Khalili (UCLA), Nader Sehatbakhsh (UCLA), Omid Abari (UCLA)
As Extended Reality (XR) technology continues to integrate into diverse fields, various security vulnerabilities—such as keystroke inference (keylogging)—have become a growing concern. Several keylogging attacks demonstrate the feasibility of exploiting this vulnerability using different modalities including voice and vision. These attacks, however, are often constrained by the need for line of sight (LoS) and/or close proximity (<10 meters). We propose a novel keylogging attack on XR devices leveraging WiFi wireless sensing. Unlike prior methods, our attack does not require LoS and is effective across various scenarios, including long-distance, cross-building settings (up to 30 meters). Our attack requires only a single, cheap, pocket-sized receiving setup to collect the victim's WiFi packets. Compared to previous keylogging attacks leveraging WiFi, our approach is the first to eliminate the need for a separate transmitter and receiver or a fake hotspot. As a result, unlike prior methods, our attack is effective even at large distances. The core idea hinges on exploiting a security vulnerability in WiFi chipsets. This vulnerability allows an attacker to send a fake, unencrypted packet to the victim's device where, in response, the victim's device involuntarily and automatically transmits an acknowledgment ("ACK") packet. By leveraging this mechanism, we can continuously force the headset's WiFi chipset to transmit packets and therefore harvest large volumes of Channel State Information (CSI) data from the victim's headset. We then develop a novel unsupervised signal processing algorithm to exploit CSI data to perform pose estimation and locate the victim's hands and fingers, ultimately enabling keystroke inference. We evaluate our attack on Meta Quest 2 and Meta Quest 3 headsets under diverse conditions, including distances ranging from 1 meter to 30 meters, angles spanning from -90° to +90°, multiple users, and through-wall scenarios, demonstrating its robustness and effectiveness across a wide range of environments. Our attack achieves 78.6% top-25 accuracy across a building on passwords up to 15 characters long.
-
Kunlin Cai (University of California, Los Angeles), Jinghuai Zhang (University of California, Los Angeles), Ying Li (University of California, Los Angeles), Zhiyuan Wang (University of Virginia), Xun Chen (Independent Researcher), Tianshi Li (Northeastern University), Yuan Tian (University of California, Los Angeles)
The immersive nature of XR introduces a fundamentally different set of security and privacy (S&P) challenges due to the unprecedented user interactions and data collection that traditional paradigms struggle to mitigate. As the primary architects of XR applications, developers play a critical role in addressing novel threats. However, to effectively support developers, we must first understand how they perceive and respond to different threats. Despite the growing importance of this issue, there is a lack of in-depth, threat-aware studies that examine XR S&P from the developers’ perspective. To fill this gap, we interviewed 23 professional XR developers with a focus on emerging threats in XR. Our study addresses two research questions aiming to uncover existing problems in XR development and identify actionable paths forward.
By examining developers' perceptions of S&P threats, we found that: (1) XR development decisions (e.g., rich sensor data collection, user-generated content interfaces) are closely tied to and can amplify S&P threats, yet developers are often unaware of these risks, resulting in cognitive biases in threat perception; and (2) limitations in existing mitigation methods, combined with insufficient strategic, technical, and communication support, undermine developers' motivation, awareness, and ability to effectively address these threats.
Based on these findings, we propose actionable and stakeholder-aware recommendations to improve XR S&P throughout the XR development process. This work represents the first effort to undertake a threat-aware, developer-centered study in the XR domain—an area where the immersive, data-rich nature of XR technology introduces distinctive challenges.
-
Yan He (University of Oklahoma), Guanchong Huang (University of Oklahoma), Song Fang (University of Oklahoma)
Wireless security surveillance systems are widely deployed due to their increased affordability. Motion detection is often integrated into them as the linchpin of the security they provide, detecting when someone is present in range and then triggering the system to start recording or notify the property owner. In this paper, we present PhantomMotion, a new attack framework to fool the motion detection function of those security systems. It can stealthily create fake motion stimuli by aiming laser beams into the motion detection range, and it confirms a response to the stimuli by sniffing wireless traffic. PhantomMotion requires neither professional equipment nor physical motion within the monitored area. It consists of a novel hardware platform integrating laser control and WiFi sniffing, and a new generative mechanism for motion injection. We develop a smartphone app to implement PhantomMotion, validating its efficacy against 18 popular wireless motion-activated security systems. Experimental results show that PhantomMotion always generates fake motion that successfully triggers the systems, within an average of 12.8 seconds and by moving the laser spot a mean distance of 1.1 m. Notably, we verify that PhantomMotion works from a distance of up to 120 meters.
-
Giacomo Longo (University School of Advanced Defense Studies), Giacomo Ratto (University School of Advanced Defense Studies), Alessio Merlo (University School of Advanced Defense Studies), Enrico Russo (University of Genova)
The Traffic alert and Collision Avoidance System (TCAS) is a mandatory last-resort safeguard against mid-air collisions. Despite its critical safety role, the system's unauthenticated and unencrypted communication protocols present a long-identified security risk. Although researchers have previously demonstrated practical injection attacks, official advisories have assessed these vulnerabilities as confined to laboratory environments, also stating that no mitigation is currently available. In this paper, we challenge both assertions. We present compelling evidence suggesting that an in-flight cyber-attack targeting TCAS has already occurred. Through a detailed analysis of public flight and communications data from a series of anomalous events involving multiple aircraft, we identify a distinct signature consistent with a ghost plane injection attack. We detail how this novel attack exploits legacy protocol features and describe three strategies of increasing sophistication; the most aggressive of these can reduce a target's perceived range by over 3.5 kilometers, sufficient to trigger collision avoidance advisories on victim aircraft from a significant standoff distance. We implement and experimentally evaluate the attack strategy most consistent with the observed incident, achieving a spoofed range reduction of 1.9 km, confirming its feasibility. Furthermore, to provide a basis for responding to such threats, we propose a novel, backward-compatible methodology to geographically localize the source of such attacks by repurposing the TCAS alert data broadcast by victims. In simulated scenarios of the most plausible attack variant, our approach achieves a median localization accuracy of 855 meters. Applying this technique to real-world incident data, we were able to identify the anomaly and the likely origin of the observed ghost plane injection attack.
Lord of the Pings
-
Saisai Xia (Institute of Information Engineering, CAS), Wenhao Wang (Institute of Information Engineering, CAS), Zihao Wang (Nanyang Technological University (NTU)), Yuhui Zhang (Institute of Information Engineering, CAS), Yier Jin (University of Science and Technology of China), Dan Meng (Institute of Information Engineering, CAS), Rui Hou (Institute of Information Engineering, CAS)
Publicly available large pretrained models (i.e., backbones) and lightweight adapters for parameter-efficient fine-tuning (PEFT) have become standard components in modern machine learning pipelines. However, preserving the privacy of both user inputs and fine-tuned adapters---often trained on sensitive data---during inference remains a significant challenge. Applying cryptographic techniques, such as multi-party computation (MPC), to PEFT settings still incurs substantial encrypted computation across both the backbone and adapter, mainly due to the inherent two-way communication between them. To address this limitation, we propose CryptPEFT, the first PEFT solution specifically designed for private inference scenarios. CryptPEFT introduces a novel one-way communication (OWC) architecture that confines encrypted computation solely to the adapter, significantly reducing both computational and communication overhead. To maintain strong model utility under this constraint, we explore the design space of OWC-compatible adapters and employ an automated architecture search algorithm to optimize the trade-off between private inference efficiency and model utility. We evaluated CryptPEFT using Vision Transformer backbones across widely used image classification datasets. Our results show that CryptPEFT significantly outperforms existing baselines, delivering speedups ranging from 20.62× to 291.48× in simulated wide-area network (WAN) and local-area network (LAN) settings. On CIFAR-100, CryptPEFT attains 85.47% accuracy with just 2.26 seconds of inference latency. These findings demonstrate that CryptPEFT offers an efficient and privacy-preserving solution for modern PEFT-based inference.
-
Wei Xu (Xidian University), Hui Zhu (Xidian University), Yandong Zheng (Xidian University), Song Bian (Beihang University), Ning Sun (Xidian University), Yuan Hao (Xidian University), Dengguo Feng (School of Cyber Science and Technology), Hui Li (Xidian University)
With the rapid adoption of Models-as-a-Service, concerns about data and model privacy have become increasingly critical. To solve these problems, various privacy-preserving inference schemes have been proposed. In particular, due to the efficiency and interpretability of decision trees, private decision tree evaluation (PDTE) has garnered significant attention. However, existing PDTE schemes suffer from significant limitations: their communication and computation costs scale with the number of trees, the number of nodes, or the tree depth, which makes them inefficient for large-scale models, especially over WANs. To address these issues, we propose Kangaroo, a private and amortized decision tree inference framework built upon packed homomorphic encryption. Specifically, we design a novel model hiding and encoding scheme, together with secure feature selection, oblivious comparison, and secure path evaluation protocols, enabling full amortization of the overhead as the number of nodes or trees scales. Furthermore, we enhance the performance and functionality of the framework through optimizations, including same-sharing-for-same-model, latency-aware, and adaptive encoding adjustment strategies. Kangaroo achieves a 14× to 59× performance improvement over state-of-the-art (SOTA) one-round interactive schemes in WAN environments. For large-scale decision tree inference tasks, it delivers a 3× to 44× speedup compared to existing schemes. Notably, Kangaroo enables the evaluation of a random forest with 969 trees and 411,825 nodes in approximately 60 ms per tree (amortized) under WAN environments.
-
Hexuan Yu (Virginia Tech), Chaoyu Zhang (Virginia Tech), Yang Xiao (University of Kentucky), Angelos D. Keromytis (Georgia Institute of Technology), Y. Thomas Hou (Virginia Polytechnic Institute and State University), Wenjing Lou (Virginia Tech)
Mobile Network Operators (MNOs) are known to leak or sell subscribers’ sensitive information, including geolocation and communication histories. Anonymous mobile user authentication methods, such as those presented at USENIX Sec'21, NDSS'24, and CCS'24, enable users to access mobile networks without revealing long-term identifiers like phone numbers or Subscription Permanent Identifiers (SUPI). However, the absence of identity transparency and location awareness poses significant challenges to implementing anonymous access in real-world mobile networks, particularly for essential functions such as call routing, usage measurement, and charging. To address these limitations, we propose ANONYCALL, a privacy-preserving call management architecture that supports anonymous mobile network access while enabling two essential functions: anonymous callee discovery and usage-based charging. ANONYCALL incorporates an out-of-band authentication mechanism to securely share temporary call identifiers, allowing seamless call routing without revealing permanent user information. Additionally, it introduces an anonymous but accountable balance credential that enables accurate charging and prevents double-spending while preserving mobile user anonymity. Fully compatible with existing mobile networks, ANONYCALL introduces minimal overhead, adding less than 200 ms to call establishment. Evaluations with smartphones and standard calling systems demonstrate its practicality, offering a viable solution for privacy-preserving yet functional mobile communication.
-
Rob Jansen (U.S. Naval Research Laboratory)
Website fingerprinting (WF) is a privacy attack in which an adversary applies machine learning to predict the website a user visits through Tor. Recent work proposes evaluating WF attacks using the "genuine" patterns or traces of Tor users' natural interactions that can be measured by Tor exit relays, but these traces do not accurately reflect the patterns that an entry-side WF attacker would observe. In this paper, we present new methods for transducing exit traces into entry traces that we can use to more accurately estimate the risk WF poses to real Tor users. Our methods leverage trace timestamps and metadata to extract multiple round-trip time estimates and use them to "shift" traces to the perspective of a target vantage point. We show through extensive evaluation that our methods outperform the state of the art across multiple synthetic and genuine datasets and are considerably more efficient; they enable researchers to more accurately represent the real-world challenge facing an entry-side WF adversary, and produce augmented datasets that allow an adversary to boost the performance of existing WF attacks.
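As an illustration only, and not the paper's code, the core "shift" can be sketched in a few lines of Python; the trace format and the single-RTT handling here are simplifying assumptions:

    # A trace is a list of (timestamp_seconds, direction) cells, where
    # direction is +1 (toward the client) or -1 (toward the server).
    def shift_trace(exit_trace, rtt_estimate):
        """Approximate the entry-side view of an exit-side trace: cells
        heading to the client reach the entry roughly half an RTT after
        the exit saw them; cells heading to the server passed the entry
        roughly half an RTT earlier."""
        half_rtt = rtt_estimate / 2.0
        shifted = [(ts + (half_rtt if d == +1 else -half_rtt), d)
                   for ts, d in exit_trace]
        shifted.sort(key=lambda cell: cell[0])  # restore temporal order
        return shifted

    trace = [(0.000, -1), (0.120, +1), (0.121, +1)]
    print(shift_trace(trace, rtt_estimate=0.080))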
Password? I Hardly Know Her!
-
Junkyu Kang (KAIST), Soyoung Lee (KAIST), Yonghwi Kwon (University of Maryland), Sooel Son (KAIST)
Mobile messaging apps have become an integral part of daily communication with massive user bases (e.g., over 950 million on Telegram and 48.7 million on KakaoTalk). To boost user engagement and grow their user bases, messaging apps offer diverse context-rich and platform-specific features, such as nearby user search, contact discovery, and single sign-on (SSO)-based account linking. While these features enable users to adopt multiple messaging apps on a single mobile device, they also introduce the risk of linking private user information across multiple messaging apps, which remains understudied.
This paper presents an in-depth analysis of privacy threats in widely used messaging apps in South Korea, including KakaoTalk, Telegram, WhatsApp, Signal, and Tinder, demonstrating concrete attacks exploiting their contact discovery, SSO-based account linking, and nearby user search features to compromise user privacy. More importantly, we chain the attacks to conduct the first cross-platform linking attack, which enables adversaries to deanonymize user names and infer users’ physical locations with an average error margin of 324 meters for a large number of untargeted and targeted users. Our findings highlight that securing contact discovery is crucial as permissive contact discovery policies allow adversaries to exploit phone numbers and profile images as linking keys to connect private user information across multiple messaging apps. We discuss and propose mitigation strategies to alleviate the presented threats.
-
Gabriel Karl Gegenhuber (University of Vienna), Philipp Frenzel (SBA Research), Maximilian Günther (University of Vienna), Johanna Ullrich (University of Vienna), Aljosha Judmayer (University of Vienna)
WhatsApp, with 3.5 billion active accounts as of early 2025, is the world's largest instant messaging platform. Given its massive user base, WhatsApp plays a critical role in global communication.
To initiate conversations, users must first discover whether their contacts are registered on the platform. This is achieved by querying WhatsApp's servers with mobile phone numbers extracted from the user’s address book (if the user has granted access). This architecture inherently enables phone number enumeration, as the service must allow legitimate users to query contact availability. While rate limiting is a standard defense against abuse, we revisit the problem and show that WhatsApp remains highly vulnerable to enumeration at scale.
In our study, we were able to probe over a hundred million phone numbers per hour without encountering blocking or effective rate limiting. Our findings demonstrate not only the persistence but also the severity of this vulnerability. We further show that nearly half of the phone numbers disclosed in the 2021 Facebook data leak are still active on WhatsApp, underlining the enduring risks associated with such exposures. Moreover, we were able to perform a census of WhatsApp users, providing a glimpse of the macroscopic insights a large messaging service can generate even though the messages themselves are end-to-end encrypted. Using the gathered data, we also discovered the re-use of certain X25519 keys across different devices and phone numbers, indicating either insecure (custom) implementations or fraudulent activity.
-
Xin Zhang (Fudan University), Xiaohan Zhang (Fudan University), Huijun Zhou (Fudan University), Bo Zhao (Fudan University)
Cross-device authentication (XDAuth) has become an essential mechanism for seamless account access across multiple devices. In this paradigm, a user can sign in on one device (the target device) by completing authentication on another trusted device (the authentication device) that holds an active session or stored credentials, improving user experience. However, the decoupling of the authentication device and target device introduces new risks: the physical and contextual separation disrupts the usual authentication flow, creates information asymmetry, and makes it hard for users to assess the legitimacy of an authentication request. Consequently, users may inadvertently approve malicious logins and face account compromise, especially when key contextual details, explicit confirmation, or revocation mechanisms are missing.
To address these risks, we start from a user-centric perspective grounded in three fundamental user rights: the right to know, the right to consent, and the right to control, to safeguard the security and usability of XDAuth systems. We investigate how these rights are supported in practice by examining 27 major services spanning three typical XDAuth schemes. Our findings are concerning: over half of the services do not provide any information about the target device during authentication, not all services enforce explicit user confirmation, and six lack a way to revoke suspicious authorizations. We responsibly disclosed these issues to the affected vendors, several of whom acknowledged the problems and responded positively. We further conduct a user study with 100 participants, uncovering that the vast majority consider these rights essential and expect them to be upheld in XDAuth. Our study reveals a clear gap between current implementations and user expectations, underscoring the need for stronger user rights support to develop more secure, user-centered XDAuth.
-
Hongyu Lin (Zhejiang University), Yicheng Hu (Zhejiang University), Haitao Xu (Zhejiang University), Yanchen Lu (Zhejiang University), Mengxia Ren (Zhejiang University), Shuai Hao (Old Dominion University), Chuan Yue (Colorado School of Mines), Zhao Li (Hangzhou Yugu Technology), Fan Zhang (Zhejiang University), Yixin Jiang (Electric Power Research Institute)
Chameleon apps evade iOS App Store review by presenting legitimate functionality during submission while transforming into illicit variants post-installation. While prevalent, their underlying transformation methods and developer-user collusion dynamics remain poorly understood. Existing detection approaches, constrained by static analysis or metadata dependencies, prove ineffective against hybrid implementations, novel variants, or metadata-scarce instances. To address these limitations, we establish a curated dataset of 500 iOS Chameleon apps collected through covert distribution channels, enabling systematic identification of 10 categories of distinct transformation patterns (including 4 previously undocumented variants). Building upon these findings, we present ChameleoScan, the first LLM-driven automated UI exploration framework for reliable Chameleon app verification. The system maintains local decision interpretability while ensuring global detection consistency through its core innovations: predictive metadata analytics, semantic interface comprehension, and human-comparable interaction strategies. Comprehensive evaluation on 1,644 iOS apps demonstrates operational efficacy (9.85% detection rate, 92.59% precision), with findings formally acknowledged by Apple. Implementation and datasets are available at https://github.com/ChameleoScan.
Trust Without Disclosure
-
Jonas Hofmann (Technische Universität Darmstadt), Philipp-Florens Lehwalder (Technische Universität Darmstadt), Shahriar Ebrahimi (Alan Turing Institute), Parisa Hassanizadeh (IPPT PAN / University of Warwick), Sebastian Faust (Technische Universität Darmstadt)
Remote attestation is a fundamental security mechanism for assessing the integrity of remote devices. In practice, widespread adoption of attestation schemes is hindered by a lack of public verifiability and the requirement for interaction in existing protocols. A recent work by Ebrahimi et al. (NDSS'24) constructs publicly verifiable, non-interactive remote attestation, disregarding another important requirement for attesting sensitive systems: privacy protection. Similar needs arise in IoT swarms, where many devices, potentially processing sensitive data, should produce a single attestation.
In this paper, we take on both challenges. We present PIRANHAS, a publicly verifiable, asynchronous, and anonymous attestation scheme for individual devices and swarms. We leverage zk-SNARKs to transform any classical, symmetric remote attestation scheme into a non-interactive, publicly verifiable, and anonymous one. Verifiers only ascertain the validity of the attestation, without learning any identifying information about the involved devices.
For IoT swarms, PIRANHAS aggregates attestation proofs for the entire swarm using recursive zk-SNARKs. Our system supports arbitrary network topologies and allows nodes to dynamically join and leave the network. We provide formal security proofs for the single-device and swarm settings, showing that our construction meets the desired security guarantees. Further, we provide an open-source implementation of our scheme using the Noir and Plonky2 frameworks, achieving an aggregation runtime of just 356 ms.
-
Andrija Novakovic (Bain Capital Crypto), Alireza Kavousi (University College London), Kobi Gurkan (Bain Capital Crypto), Philipp Jovanovic (University College London)
This work introduces Cryptobazaar, a scalable, private, and decentralized sealed-bid auction protocol. In particular, our protocol protects the privacy of losing bidders by preserving the confidentiality of their bids while ensuring public verifiability of the outcome and relying only on a single untrusted auctioneer for coordination. At its core, Cryptobazaar combines an efficient distributed protocol to compute the logical-OR for a list of unary-encoded bids with various novel zero-knowledge succinct arguments of knowledge that may be of independent interest. We present protocol variants that can be used for efficient first-, second-, and more generally (p+1)st-price as well as sequential first-price auctions. Finally, the performance evaluation of our Cryptobazaar implementation shows that it is highly practical. For example, a single auction run with 128 bidders and a price range of 1024 values terminates in less than 0.5 seconds and requires each bidder to send and receive only about 32 KB of data.
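To make the unary-OR core concrete, here is a cleartext toy in Python; the actual protocol computes this OR distributedly over masked values with zero-knowledge proofs, and the second- and (p+1)st-price variants need extra rounds, since a plain OR loses bid multiplicity:

    PRICE_RANGE = 8  # toy price domain 1..8 (the paper evaluates up to 1024)

    def encode(bid: int, n: int) -> list[int]:
        """Unary-style encoding: a 1 at the bidder's price position."""
        return [1 if p == bid else 0 for p in range(1, n + 1)]

    def logical_or(vectors):
        # Column-wise OR over all bidders' encoded vectors.
        return [int(any(col)) for col in zip(*vectors)]

    bids = [3, 5, 5, 2]
    ored = logical_or(encode(b, PRICE_RANGE) for b in bids)
    winning_price = max(p for p, bit in enumerate(ored, start=1) if bit)
    print(winning_price)  # 5: the highest price anyone bid (first price)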
-
Robin Vassantlal (LASIGE, Faculdade de Ciências, Universidade de Lisboa, Portugal), Hasan Heydari (LASIGE, Faculdade de Ciências, Universidade de Lisboa, Portugal), Bernardo Ferreira (LASIGE, Faculdade de Ciências, Universidade de Lisboa, Portugal), Alysson Bessani (LASIGE, Faculdade de Ciências, Universidade de Lisboa, Portugal)
It is well known that encryption alone is not enough to protect data privacy. Access patterns, revealed when operations are performed, can also be leveraged in inference attacks. Oblivious RAM (ORAM) hides access patterns by making client requests oblivious. However, existing protocols are still limited in supporting concurrent clients and Byzantine fault tolerance (BFT). We present MVP-ORAM, the first wait-free ORAM protocol that supports concurrent fail-prone clients. In contrast to previous works, MVP-ORAM avoids using trusted proxies, which necessitate additional security assumptions, and concurrency control mechanisms based on inter-client communication or distributed locks, which limit overall throughput and the capability to tolerate faulty clients. Instead, MVP-ORAM enables clients to perform concurrent requests and merge conflicting updates as they happen, satisfying wait-freedom, i.e., clients make progress independently of the performance or failures of other clients. Since wait and collision freedom are fundamentally contradictory goals that cannot be achieved simultaneously in an asynchronous concurrent ORAM service, we define a weaker notion of obliviousness that depends on the application workload and number of concurrent clients, and prove MVP-ORAM is secure in practical scenarios where clients perform skewed block accesses. By being wait-free, MVP-ORAM can be seamlessly integrated into existing confidential BFT data stores, creating the first BFT ORAM construction. We implement MVP-ORAM on top of a confidential BFT data store and show our prototype can process hundreds of 4 KB accesses per second in modern clouds.
-
Renata Vaderna (Independent Researcher), Dušan Nikolić (University of Novi Sad), Patrick Zielinski (New York University), David Greisen (Open Law Library), BJ Ard (University of Wisconsin–Madison), Justin Cappos (New York University)
The digital age has brought more and more services online. A key exception has been access to the law, which remains published on paper or on aging online platforms. Jurisdictions that have adopted digital law platforms often face difficulties ensuring the security of their laws online.
In this paper, we introduce TAF, a system designed to secure legal repositories against unauthorized changes and ensure the integrity of the law. Unlike prior archival or update frameworks, it is the first system designed for a threat model where an attacker fully controls the hosting repository. It also binds each signed repository state to publisher-defined legal dates, enabling verifiable as-of-date retrieval. First, TAF enables a repository of legal documents to remain accessible and authenticatable, no matter how much time has passed since publication. Second, TAF enables independent verification of changes to a legal repository by anyone with read access. Third, TAF remains usable by users without a technical background or knowledge of cybersecurity.
TAF builds on the software-update guarantees of TUF, the version-control structure of Git, and a strong notion of time, where time is treated as signed data bound to specific repository states. TAF transforms the entire evolution of legal documents into an authenticatable, timestamped sequence of states, ensuring that every version, past or present, can be cryptographically verified. This property is not provided by Git or TUF alone.
We demonstrate that TAF is secure, scalable, and performant, analyzing its behavior in various attack scenarios, its performance on large legal repositories, and its ease of use. As a testament to TAF's security properties and performance, it is in production use by 14 jurisdictions in the US, including the City of Baltimore, the State of Maryland, and Washington, D.C.
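A minimal sketch of the verification idea, assuming a toy state format; real TAF builds on TUF metadata, Git objects, and asymmetric keys, whereas this example substitutes HMAC purely to stay self-contained:

    import hashlib
    import hmac

    KEY = b"publisher-signing-key"  # hypothetical; TUF uses asymmetric keys

    def sign_state(prev_digest: bytes, tree_hash: bytes, legal_date: str) -> bytes:
        # Bind each state to its predecessor and a publisher-defined legal date.
        msg = prev_digest + tree_hash + legal_date.encode()
        return hmac.new(KEY, msg, hashlib.sha256).digest()

    def verify_chain(states) -> bool:
        """states: list of (tree_hash, legal_date, signature); dates must not regress."""
        prev_digest, prev_date = b"\x00" * 32, ""
        for tree_hash, legal_date, sig in states:
            expected = sign_state(prev_digest, tree_hash, legal_date)
            if not hmac.compare_digest(sig, expected) or legal_date < prev_date:
                return False
            prev_digest = hashlib.sha256(expected).digest()
            prev_date = legal_date
        return True

    h1 = hashlib.sha256(b"laws-v1").digest()
    s1 = sign_state(b"\x00" * 32, h1, "2024-01-01")
    h2 = hashlib.sha256(b"laws-v2").digest()
    s2 = sign_state(hashlib.sha256(s1).digest(), h2, "2024-06-15")
    print(verify_chain([(h1, "2024-01-01", s1), (h2, "2024-06-15", s2)]))  # True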
The Malware Ultimatum
-
Peijie Li (TU Delft), Huanhuan Chen (TU Delft), Evangelia Anna Markatou (TU Delft), Kaitai Liang (TU Delft)
Searchable Encryption (SE) has shown great promise for enabling secure and efficient queries over encrypted data. In order to achieve this efficiency, SE inevitably leaks some information, and a major open question is how dangerous this leakage is. While prior reconstruction attacks have demonstrated effectiveness in one-dimensional range query settings, extending them to high-dimensional datasets remains challenging. Existing methods either demand excessive query information (e.g., an attacker that has observed all possible responses) or produce low-quality reconstructions in sparse databases. In this work, we present REMIN, a new leakage-abuse attack against SE schemes in multi-dimensional settings, exploiting access and search pattern leakage from range queries. REMIN leverages unsupervised representation learning to transform query co-occurrence frequencies into geometric signals, enabling an attacker to infer relative spatial relationships among encrypted records. This approach allows accurate and scalable reconstruction of high-dimensional datasets under minimal leakage. Furthermore, we introduce REMIN-P, an active variant of the attack that incorporates a practical poisoning strategy. By injecting a small number of auxiliary anchor points, REMIN-P significantly improves reconstruction quality, particularly in sparse or boundary regions of the data space. We evaluate our attacks extensively on both synthetic and real-world datasets. Compared to state-of-the-art reconstruction attacks, our reconstruction attack achieves up to a 50% reduction in mean squared error (MSE), all while maintaining fast and scalable runtime. Our poisoning attack can further reduce MSE by an additional 50% on average, depending on the poisoning strategy.
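The geometric intuition can be demonstrated with a toy embedding; note that the paper uses unsupervised representation learning rather than the classical multidimensional scaling used below, and all counts here are invented:

    import numpy as np
    from sklearn.manifold import MDS

    # Toy co-occurrence counts for four encrypted records: records returned
    # together by many range queries should be close in value space.
    co = np.array([[9, 6, 2, 1],
                   [6, 9, 5, 2],
                   [2, 5, 9, 6],
                   [1, 2, 6, 9]], dtype=float)
    dissim = co.max() - co            # high co-occurrence -> small distance
    np.fill_diagonal(dissim, 0.0)
    emb = MDS(n_components=1, dissimilarity="precomputed",
              random_state=0).fit_transform(dissim)
    # Recovers the records' relative ordering, up to reflection and scaling.
    print(emb.ravel().argsort())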
-
Chengfeng Ye (The Hong Kong University of Science and Technology), Anshunkang Zhou (The Hong Kong University of Science and Technology), Charles Zhang (The Hong Kong University of Science and Technology)
Binary diffing, which detects differences between two pieces of binary code, is a fundamental technique in various security analysis tasks.
Existing work shows that a sufficient number of fine-grained alignments as anchor points can significantly improve the overall accuracy of binary diffing. However, existing methods still suffer from numerous limitations that hinder accurate and efficient anchor point identification. Syntax-based techniques are known to be vulnerable to aggressive compiler optimizations, while semantic-based methods are limited by high computation cost or low code coverage. In this paper, we revisit dynamic analysis to seek new insights to address the limitations of existing approaches. Our main insight is that not all dynamic semantics are necessary or equally effective for identifying valid instruction alignment. Therefore, we can prioritize dynamic execution resources to partially reveal the runtime values that can effectively derive instruction alignment. Based on this insight, we propose Barracuda, a high-confidence instruction alignment technique based on partial instruction semantics extracted from forced execution. We have implemented Barracuda and conducted extensive experiments to evaluate its effectiveness. The experimental results demonstrate that Barracuda detects 24.0% more instruction alignments as anchor points, with a high precision of 92.1%. The anchor points detected by Barracuda enhance the state-of-the-art binary diffing tools DeepBinDiff and SigmaDiff, with percentage-point increases in F1 score ranging from 12.3% to 42.7% and from 2.2% to 4.1%, respectively, across various binary diffing scenarios.
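A toy rendering of value-based anchoring, under the assumption that each instruction's observed runtime values are summarized as a tuple; the paper's forced-execution machinery and filtering are considerably richer:

    def anchors(trace_a, trace_b):
        """trace_*: dict of instruction address -> tuple of observed values.
        Instructions with identical, unambiguous value behavior are aligned."""
        by_values = {}
        for addr, values in trace_a.items():
            by_values.setdefault(values, []).append(addr)
        pairs = []
        for addr_b, values in trace_b.items():
            candidates = by_values.get(values, [])
            if len(candidates) == 1:      # keep only unambiguous matches
                pairs.append((candidates[0], addr_b))
        return pairs

    trace_a = {0x1000: (5, 9), 0x1004: (7,), 0x1008: (7,)}
    trace_b = {0x2000: (5, 9), 0x2004: (7,)}  # (7,) is ambiguous in trace_a
    print(anchors(trace_a, trace_b))          # [(4096, 8192)]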
-
Xinzhe Huang (Zhejiang university), Kedong Xiu (Zhejiang university), Tianhang Zheng (Zhejiang university), Churui Zeng (Zhejiang university), Wangze Ni (Zhejiang university), Zhan Qin (Zhejiang university), Kui Ren (Zhejiang university), Chun Chen (Zhejiang university)
Recent research has focused on exploring the vulnerabilities of Large Language Models (LLMs), aiming to elicit harmful and/or sensitive content from LLMs. However, because dual-jailbreaking, i.e., attacks targeting both LLMs and guardrails, remains insufficiently studied, the effectiveness of existing attacks is limited when attempting to bypass safety-aligned LLMs shielded by guardrails. Therefore, in this paper, we propose DualBreach, a target-driven framework for dual-jailbreaking. DualBreach employs a Target-driven Initialization (TDI) strategy to dynamically construct initial prompts, combined with a Multi-Target Optimization (MTO) method that utilizes approximate gradients to jointly adapt the prompts across guardrails and LLMs, which simultaneously reduces the number of queries and achieves a high dual-jailbreaking success rate. For black-box guardrails, DualBreach either employs a powerful open-source guardrail or imitates the target black-box guardrail by training a proxy model, to incorporate guardrails into the MTO process.
We demonstrate the effectiveness of DualBreach in dual-jailbreaking scenarios through extensive evaluation on several widely used datasets. Experimental results indicate that DualBreach outperforms state-of-the-art methods with fewer queries, achieving significantly higher success rates across all settings. More specifically, DualBreach achieves an average dual-jailbreaking success rate of 93.67% against GPT-4 with Llama-Guard-3 protection, whereas the best success rate achieved by other methods is 88.33%. Moreover, DualBreach uses an average of only 1.77 queries per successful dual-jailbreak, outperforming other state-of-the-art methods. For defense, we propose an XGBoost-based ensemble defensive mechanism named EGuard, which integrates the strengths of multiple guardrails, demonstrating superior performance compared with Llama-Guard-3.
-
Rong Wang (Southeast University), Zhen Ling (Southeast University), Guangchi Liu (Southeast University), Shaofeng Li (Southeast University), Junzhou Luo (Southeast University), Xinwen Fu (University of Massachusetts Lowell)
In response to growing online privacy threats, the Tor network offers essential protection against surveillance by routing traffic through a decentralized, encrypted infrastructure. However, Website Fingerprinting Attacks (WFA) present a formidable challenge to Tor's anonymity. This paper introduces FRUGAL, a traffic obfuscation method that leverages the mutual information (MI) reduction between website traffic and labels as an optimization goal, advancing a novel perspective for Website Fingerprinting Defense (WFD). By strategically injecting dummy packets at positions within website traffic that contribute most to cumulative MI reduction, FRUGAL achieves notable performance compared to state-of-the-art (SOTA) defense mechanisms. It effectively reduces attack success rates (ASR) across diverse attack models while maintaining minimal bandwidth overhead (BWO) and mitigating the impact of adversarial training. Extensive experiments validate the efficacy of FRUGAL across a comprehensive set of scenarios, including closed-world, open-world, and real-world simulation settings. For example, in the closed-world setting, FRUGAL reduces the ASR of the DF model to 2.68% with a 30% BWO, substantially outperforming previous SOTA defenses, such as Palette (11.54% with 87% BWO). When the BWO of FRUGAL is increased to a comparable level of 80%, the ASR further drops below 1%, and it remains as low as 9.42% even after adversarial training, compared to 20.27% for Palette. This work not only introduces a fresh perspective on WFD research but also establishes FRUGAL as a robust and universal defense framework against WFA.
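As a toy illustration of the selection criterion only (not FRUGAL's optimization), one can rank packet positions by their estimated mutual information with the site label and spend the dummy budget on the leakiest positions; the data below is synthetic, with one deliberately leaky position:

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(200, 50))   # toy per-position direction features
    y = rng.integers(0, 5, size=200)         # toy site labels
    X[:, 10] = y % 2                         # make position 10 leaky on purpose

    mi = mutual_info_classif(X, y, discrete_features=True, random_state=0)
    budget = 3
    targets = np.argsort(mi)[-budget:]       # positions contributing most MI
    print(sorted(targets.tolist()))           # position 10 should rank among them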
The Hitchhiker's Guide to the Binary
-
Yuhan Meng (Key Laboratory of High-Confidence Software Technologies (MOE), School of Computer Science, Peking University), Shaofei Li (Key Laboratory of High-Confidence Software Technologies (MOE), School of Computer Science, Peking University), Jiaping Gui (School of Computer Science, Shanghai Jiao Tong University), Peng Jiang (Southeast University), Ding Li (Key Laboratory of High-Confidence Software Technologies (MOE), School of Computer Science, Peking University)
High-level natural language knowledge in Cyber Threat Intelligence (CTI) reports, such as the ATT&CK framework, is beneficial to counter Advanced Persistent Threat (APT) attacks. However, how to automatically apply the high-level knowledge in CTI reports in realistic attack detection systems, such as provenance analysis systems, is still an open problem. The challenge stems from the semantic gap between the knowledge and the low-level security logs: while the knowledge in CTI reports is written in natural language, attack detection systems can only process low-level system events like file accesses or network IP manipulations. Manual approaches can be labor-intensive and error-prone.
In this paper, we propose KnowHow, a CTI-knowledge-driven online provenance analysis approach that can automatically apply high-level attack knowledge from CTI reports written in natural language to detect low-level system events. The core of KnowHow is a novel attack knowledge representation, general Indicator of Compromise (gIoC), that represents the subjects, objects, and actions of attacks. By lifting system identifiers, such as file paths, in system events to natural language terms, KnowHow can match system events to gIoCs and further match them to techniques described in natural language. Finally, based on the techniques matched to system events, KnowHow reasons about the temporal logic of attack steps and detects potential APT attacks in system events. Our evaluation shows that KnowHow accurately detects all 16 APT campaigns in the open-source and industrial datasets, while existing approaches all introduce large numbers of false positives. Meanwhile, our evaluation also shows that KnowHow reduces node-level false positives by up to 90% while achieving higher node-level recall, and that it is robust against several unknown attacks and mimicry attacks.
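A minimal sketch of the matching step, with an invented lifting table and one ATT&CK-flavored triple; KnowHow's actual lifting, matching, and temporal reasoning are considerably more involved:

    # Hypothetical lifting table: system identifier -> natural-language term.
    LIFT = {
        "/etc/passwd": "credential file",
        "powershell.exe": "scripting engine",
    }

    # Hypothetical gIoC-style entries: (subject, action, object) -> technique.
    GIOCS = [
        (("scripting engine", "read", "credential file"), "credential access"),
    ]

    def match(event):
        """Lift a low-level event's identifiers, then match against gIoCs."""
        subj = LIFT.get(event["process"], event["process"])
        obj = LIFT.get(event["target"], event["target"])
        for (s, a, o), technique in GIOCS:
            if (s, a, o) == (subj, event["action"], obj):
                return technique
        return None

    evt = {"process": "powershell.exe", "action": "read", "target": "/etc/passwd"}
    print(match(evt))  # credential access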
-
From Noise to Signal: Precisely Identify Affected Packages of Known Vulnerabilities in npm Ecosystem
Yingyuan Pu (QI-ANXIN Technology Research Institute), Lingyun Ying (QI-ANXIN Technology Research Institute), Yacong Gu (Tsinghua University; Tsinghua University-QI-ANXIN Group JCNS)
npm is the largest open-source software ecosystem with over 3 million packages. However, its complex dependencies between packages expose it to significant security threats as many packages directly or indirectly depend on other ones with known vulnerabilities.
Timely updating these vulnerable dependencies is a major challenge in software supply chain security, primarily due to the widespread effect of vulnerabilities and the huge cost of fixing them. Recent studies have shown that existing package-level vulnerability-propagation-analysis tools lead to high false positives, while function-level tools are not yet feasible for large-scale analysis in the npm ecosystem. In this paper, we propose a novel framework, VulTracer, which can precisely and efficiently perform vulnerability propagation analysis at the function level. By constructing a rich semantic graph for each package independently and then stitching these graphs together, VulTracer can locate vulnerability propagation paths and precisely identify truly affected packages. Through comparative evaluations, our framework achieves an F1 score of 0.905 in call graph construction and reduces false positives from npm audit by 94%. We conducted the largest-to-date function-level vulnerability impact measurement on the entire npm ecosystem, covering 34 million package versions. The results demonstrate that 68.28% of potential impacts identified by package-level analysis are merely noise, as the vulnerable code is unreachable. Furthermore, our findings also uncover that true vulnerability propagation (the signal) is shallow, with impact attenuating significantly within just a few dependency hops. VulTracer provides a practical path to mitigating alert fatigue and enables security efforts to focus on genuine, reachable threats.
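The noise-versus-signal distinction reduces to a reachability question over the stitched call graph; a minimal sketch over a hypothetical graph:

    from collections import deque

    # Hypothetical stitched call graph: caller -> callees, across packages.
    call_graph = {
        "app.main": ["lib_a.parse"],
        "lib_a.parse": ["lib_b.helper"],
        "lib_b.helper": [],
        "lib_b.vulnerable_fn": [],   # present in a dependency, never called
    }

    def reaches(entry: str, target: str) -> bool:
        """Breadth-first search: is `target` reachable from `entry`?"""
        seen, queue = {entry}, deque([entry])
        while queue:
            fn = queue.popleft()
            if fn == target:
                return True
            for callee in call_graph.get(fn, []):
                if callee not in seen:
                    seen.add(callee)
                    queue.append(callee)
        return False

    # Package-level analysis would flag app.main (it depends on lib_b), but
    # the vulnerable function is unreachable, i.e. noise rather than signal:
    print(reaches("app.main", "lib_b.vulnerable_fn"))  # False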
-
Yuqiao Yang (University of Electronic Science and Technology of China), Yongzhao Zhang (University of Electronic Science and Technology of China), Wenhao Liu (GoGoByte Technology), Jun Li (GoGoByte Technology), Pengtao Shi (GoGoByte Technology), DingYu Zhong (University of Electronic Science and Technology of China), Jie Yang (University of Electronic Science and Technology of China), Ting Chen (University of Electronic Science and Technology of China), Sheng Cao (University of Electronic Science and Technology of China), Yuntao Ren (University of Electronic Science and Technology of China), Yongyue Wu (University of Electronic Science and Technology of China), Xiaosong Zhang (University of Electronic Science and Technology of China)
As modern vehicles evolve into intelligent and connected systems, their growing complexity introduces significant cybersecurity risks. Threat Analysis and Risk Assessment (TARA) has therefore become essential for managing these risks under mandatory regulations. However, existing TARA automation methods rely on static threat libraries, limiting their utility in the detailed, function-level analyses demanded by industry. This paper introduces DefenseWeaver, the first system that automates function-level TARA using component-specific details and large language models (LLMs). DefenseWeaver dynamically generates attack trees and risk evaluations from system configurations described in an extended OpenXSAM++ format, then employs a multi-agent framework to coordinate specialized LLM roles for more robust analysis. To further adapt to evolving threats and diverse standards, DefenseWeaver incorporates Low-Rank Adaptation (LoRA) fine-tuning and Retrieval-Augmented Generation (RAG) with expert-curated TARA reports. We validated DefenseWeaver through deployment in four automotive security projects, where it identified 11 critical attack paths, which were verified through penetration testing and subsequently reported to and remediated by the relevant automakers and suppliers. Additionally, DefenseWeaver demonstrated cross-domain adaptability, successfully applying to unmanned aerial vehicles (UAVs) and marine navigation systems. In comparison to human experts, DefenseWeaver outperformed manual attack tree generation across six assessment scenarios. Integrated into commercial cybersecurity platforms such as UAES and Xiaomi, DefenseWeaver has generated over 8,200 attack trees. These results highlight its ability to significantly reduce processing time, as well as its scalability and transformative impact on cybersecurity across industries.
-
Luke Kurlandski (Rochester Institute of Technology), Harel Berger (Ariel University), Yin Pan (Rochester Institute of Technology), Matthew Wright (Rochester Institute of Technology)
Malware poses an increasing threat to critical computing infrastructure, driving demand for more advanced detection and analysis methods. Although raw-binary malware classifiers show promise, they are limited in their capabilities and struggle with the challenges of modeling long sequences. Meanwhile, the rise of large language models (LLMs) in natural language processing showcases the power of massive, self-supervised models trained on heterogeneous datasets, offering flexible representations for numerous downstream tasks. The success behind these models is rooted in the size and quality of their training data, the expressiveness and scalability of their neural architecture, and their ability to learn from unlabeled data in a self-supervised manner.
In this work, we take the first steps toward developing large malware language models (LMLMs), the malware analog of LLMs. We tackle the core aspects of this objective, namely, questions about data, models, pretraining, and finetuning. By pretraining a malware classification model with language modeling objectives, we were able to improve downstream performance on diverse practical malware classification tasks by 1.1% on average and up to 28.6%, indicating that such models could serve as successors to raw-binary malware classifiers.
Crash Bandicoot
-
Meng Wang (CISPA Helmholtz Center for Information Security), Philipp Görz (CISPA Helmholtz Center for Information Security), Joschua Schilling (CISPA Helmholtz Center for Information Security), Keno Hassler (CISPA Helmholtz Center for Information Security), Liwei Guo (University of Electronic Science and Technology), Thorsten Holz (Max Planck Institute for Security and Privacy), Ali Abbasi (CISPA Helmholtz Center for Information Security)
Detecting business logic vulnerabilities is a critical challenge in software security. These flaws come from mistakes in an application’s design or implementation and allow attackers to trigger unintended application behavior. Traditional fuzzing sanitizers for dynamic analysis excel at finding vulnerabilities related to memory safety violations but largely fail to detect business logic vulnerabilities, as these flaws require understanding application-specific semantic context. Recent attempts to infer this context, due to their reliance on heuristics and non-portable language features, are inherently brittle and incomplete. As business logic vulnerabilities constitute a majority (27 of the CWE Top 40) of the most dangerous software weaknesses in practice, this is a worrying blind spot of existing tools.
In this paper, we tackle this challenge with ANOTA, a novel human-in-the-loop sanitizer framework. ANOTA introduces a user-friendly annotation system that enables users to directly encode their domain-specific knowledge as lightweight annotations that define an application’s intended behavior. A runtime execution monitor then observes program behavior, comparing it against the policies defined by the annotations, thereby identifying deviations that indicate vulnerabilities. To evaluate the effectiveness of ANOTA, we combine ANOTA with a state-of-the-art fuzzer and compare it against other popular bug-finding methods compatible with the same targets. The results show that ANOTA+FUZZER outperforms them in terms of effectiveness. More specifically, ANOTA+FUZZER successfully reproduced 43 known vulnerabilities and discovered 22 previously unknown vulnerabilities (17 CVEs assigned) during the evaluation. These results demonstrate that ANOTA provides a practical and effective approach for uncovering complex business logic flaws often missed by traditional security testing techniques.
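As a rough analog only (ANOTA targets fuzzed native code, not Python), an annotation-plus-monitor pair might look like the following; the policy, function, and bug are all invented:

    def expects(policy, message):
        """Annotation: `policy` must hold on the function's (args, result)."""
        def wrap(fn):
            def monitored(*args, **kwargs):
                result = fn(*args, **kwargs)
                if not policy(args, result):
                    raise AssertionError(
                        f"policy violation in {fn.__name__}: {message}")
                return result
            return monitored
        return wrap

    @expects(lambda args, result: 0 <= result <= args[0],
             "withdrawal may not exceed the balance or go negative")
    def withdraw(balance, amount):
        return balance - amount   # business-logic bug: no bounds check

    withdraw(100, 30)             # fine
    try:
        withdraw(100, 500)        # the monitor flags the negative balance
    except AssertionError as e:
        print(e)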
-
Fangzhou Dong (Arizona State University), Arvind S Raj (Arizona State University), Efrén López-Morales (New Mexico State University), Siyu Liu (Arizona State University), Yan Shoshitaishvili (Arizona State University), Tiffany Bao (Arizona State University), Adam Doupé (Arizona State University), Muslum Ozgur Ozmen (Arizona State University), Ruoyu Wang (Arizona State University)
Programmable Logic Controllers (PLCs) are industrial computers that control devices with real-world physical effects, and safety vulnerabilities in these systems can lead to catastrophic consequences. While prior research has proposed techniques to detect safety issues in PLC state machines, most approaches require access to design specifications or source code—resources often unavailable to analysts or end users.
This paper targets a prevalent class of vulnerabilities, which we name Blind-Trust Vulnerabilities, caused by missing or incomplete safety checks on peripheral inputs. We introduce Ta’veren, a novel static analysis-based framework that identifies such vulnerabilities directly from PLC binaries without relying on firmware rehosting, which remains an open research problem in firmware analysis. Ta’veren recovers the finite state machines (FSMs) of PLC binaries, enabling repeated safety analyses under various policy specifications. To abstract low-level program states into logic-related states, we leverage our insight that PLCs consistently use specific variables to represent internal states, allowing for aggressive state deduplication without compromising soundness. We develop a prototype of Ta’veren and evaluate it on real-world PLC binaries. Our experiments show that Ta’veren efficiently recovers meaningful FSMs and uncovers critical safety violations with high effectiveness.
-
Rujia Li (Tsinghua University), Mingfei Zhang (Shandong University), Xueqian Lu (Independent Researcher), Wenbo Xu (AntChain Platform Division, Ant Group), Ying Yan (Blockchain Platform Division, Ant Group), Sisi Duan (Tsinghua University)
Ethereum, a leading blockchain platform, relies on incentive mechanisms to improve its stability. Recently, several attacks targeting the incentive mechanisms have been proposed. Examples include the so-called reorganization attacks that cause blocks proposed by honest validators to be discarded. In reorganization attacks, honest validators suffer from lower rewards than their fair share. Finding these attacks, however, heavily relies on expert knowledge and may involve substantial manual effort.
We present proto, a framework for finding incentive flaws in Ethereum with little manual effort. proto is inspired by failure injection, a technique commonly used in software testing for finding implementation vulnerabilities. Instead of finding implementation vulnerabilities, we aim to find design flaws. Our main technical contributions involve a carefully designed "strategy generator" that generates a large pool of attack instances, an automatic workflow that launches attacks and analyzes the results, and a workflow that integrates reinforcement learning to fine-tune the attack parameters and identify the most profitable attacks. We simulate a total of 7,991 attack instances using our framework and find the following results. First, our framework reproduces five known incentive attacks that were previously found manually. Second, we find three new attacks that can be identified as incentive flaws. Finally and surprisingly, one of our experiments also identified two implementation flaws.
-
Chen Chen (Texas A&M University), Zaiyan Xu (Texas A&M University), Mohamadreza Rostami (Technical University of Darmstadt), David Liu (Texas A&M University), Dileep Kalathil (Texas A&M University), Ahmad-Reza Sadeghi (Technical University of Darmstadt), Jeyavijayan Rajendran (Texas A&M University)
Processor design relies on iterative modification and the reuse of well-established prior designs. However, this reuse also propagates similar vulnerabilities across multiple processors. As processors grow increasingly complex through iterative modifications, efficiently detecting vulnerabilities in modern processors is critical. Inspired by software fuzzing, hardware fuzzing has recently demonstrated its effectiveness in detecting processor vulnerabilities. Yet, to the best of our knowledge, existing processor fuzzers fuzz each design individually, lacking the capability to leverage known vulnerabilities in prior processors to fine-tune fuzzing toward similar or new variants of those vulnerabilities.
To address this gap, we present ReFuzz, an adaptive fuzzing framework that leverages a contextual bandit to reuse highly effective tests from prior processors when fuzzing a processor-under-test (PUT) within a given ISA. By intelligently mutating tests that trigger vulnerabilities in prior processors, ReFuzz detects similar and new variants of vulnerabilities in PUTs. ReFuzz uncovered three new security vulnerabilities and two new functional bugs. It detected one vulnerability by reusing a test that triggers a known vulnerability in a prior processor. One functional bug exists across three processors that share design modules; the second bug has two variants. Additionally, ReFuzz reuses highly effective tests to enhance coverage efficiency, achieving an average 511.23× coverage speedup and up to 9.33% more total coverage compared to existing fuzzers.
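Stripped of the context features, bandit-style test reuse can be sketched with epsilon-greedy arm selection; the seeds, rewards, and the coverage stand-in below are all synthetic:

    import random

    random.seed(0)
    seeds = ["seed_A", "seed_B", "seed_C"]    # tests from prior processors
    value = {s: 0.0 for s in seeds}           # running mean reward per arm
    pulls = {s: 0 for s in seeds}

    def fuzz_once(seed: str) -> float:
        # Stand-in for mutating `seed` and measuring new coverage on the PUT.
        base = {"seed_A": 0.1, "seed_B": 0.7, "seed_C": 0.3}[seed]
        return base + random.gauss(0, 0.05)

    EPS = 0.1
    for _ in range(200):
        arm = (random.choice(seeds) if random.random() < EPS
               else max(seeds, key=value.get))     # explore vs. exploit
        reward = fuzz_once(arm)
        pulls[arm] += 1
        value[arm] += (reward - value[arm]) / pulls[arm]  # incremental mean

    print(max(seeds, key=value.get))  # converges to the most effective seed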
Guardians of the Gradient
-
Xue Tan (Fudan University), Hao Luan (Fudan University), Mingyu Luo (Fudan University), Zhuyang Yu (Fudan University), Jun Dai (Worcester Polytechnic Institute), Xiaoyan Sun (Worcester Polytechnic Institute), Ping Chen (Fudan University)
With the rapid development of Large Language Models (LLMs), their applications have expanded across various aspects of daily life. Open-source LLMs, in particular, have gained popularity due to their accessibility, resulting in widespread downloading and redistribution. The impressive capabilities of LLMs result from training on massive and often undisclosed datasets. This raises the question of whether sensitive content, such as copyrighted or personal data, is included, which is known as the membership inference problem. Existing methods mainly rely on model outputs and overlook rich internal representations. This limited use of internal data leads to suboptimal results, revealing a research gap for membership inference in open-source white-box LLMs.
In this paper, we address the challenge of detecting the training data of open-source LLMs. To support this investigation, we introduce three dynamic benchmarks: WikiTection, NewsTection, and ArXivTection. We then propose a white-box approach for training data detection by analyzing the neural activations of LLMs. Our key insight is that the neuron activations across all layers of an LLM reflect its internal representation of knowledge related to the input data, which can effectively distinguish between the LLM's training and non-training data. Extensive experiments on these benchmarks demonstrate the strong effectiveness of our approach. For instance, on the WikiTection benchmark, our method achieves an AUC of around 0.98 across five LLMs: GPT2-xl, LLaMA2-7B, LLaMA3-8B, Mistral-7B, and LLaMA2-13B. Additionally, we conduct an in-depth analysis of factors such as model size, input length, and text paraphrasing, further validating the robustness and adaptability of our method.
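A minimal sketch of the white-box signal with synthetic activations; in practice, the features would come from forward hooks over the layers of an open-source LLM, not the Gaussian stand-ins used here:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    n, d = 400, 64                            # samples, flattened activations
    members = rng.normal(0.3, 1.0, (n, d))    # members: slightly shifted stats
    nonmembers = rng.normal(0.0, 1.0, (n, d))
    X = np.vstack([members, nonmembers])
    y = np.array([1] * n + [0] * n)

    idx = rng.permutation(len(y))
    split = len(y) // 2
    probe = LogisticRegression(max_iter=1000).fit(X[idx[:split]], y[idx[:split]])
    scores = probe.predict_proba(X[idx[split:]])[:, 1]
    # Prints an AUC well above the 0.5 chance level on this toy data.
    print(round(roc_auc_score(y[idx[split:]], scores), 3))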
-
Yuntao Du (Purdue University), Jiacheng Li (Purdue University), Yuetian Chen (Purdue University), Kaiyuan Zhang (Purdue University), Zhizhen Yuan (Purdue University), Hanshen Xiao (Purdue University), Bruno Ribeiro (Purdue University), Ninghui Li (Purdue University)
A Membership Inference Attack (MIA) assesses how much a trained machine learning model reveals about its training data by determining whether specific query instances were included in the dataset. We classify existing MIAs as adaptive or non-adaptive, depending on whether the adversary is allowed to train shadow models on membership queries. In the adaptive setting, where the adversary can train shadow models after accessing query instances, we highlight the importance of exploiting membership dependencies between instances and propose an attack-agnostic framework called Cascading Membership Inference Attack (CMIA), which incorporates membership dependencies via conditional shadow training to boost membership inference performance.
In the non-adaptive setting, where the adversary is restricted to training shadow models before obtaining membership queries, we introduce Proxy Membership Inference Attack (PMIA). PMIA employs a proxy selection strategy that identifies samples with similar behaviors to the query instance and uses their behaviors in shadow models to perform a membership posterior odds test for membership inference. We provide theoretical analyses for both attacks, and extensive experimental results demonstrate that CMIA and PMIA substantially outperform existing MIAs in both settings, particularly in the low false-positive regime, which is crucial for evaluating privacy risks.
-
Ruixuan Liu (Emory University), Toan Tran (Emory University), Tianhao Wang (University of Virginia), Hongsheng Hu (Shanghai Jiao Tong University), Shuo Wang (Shanghai Jiao Tong University), Li Xiong (Emory University)
As large language models increasingly memorize web-scraped training content, they risk exposing copyrighted or private information. Existing protections require compliance from crawlers or model developers, fundamentally limiting their effectiveness. We propose ExpShield, a proactive self-guard that mitigates memorization while maintaining readability via invisible perturbations, and we formulate it as a constrained optimization problem. Due to the lack of an individual-level risk metric for natural text, we first propose instance exploitation, a metric that measures how much training on a specific text increases the chance of guessing that text from a set of candidates, with zero indicating a perfect defense. Directly solving the problem is infeasible for defenders without sufficient knowledge, so we develop two effective proxy solutions: single-level optimization and synthetic perturbation. To enhance the defense, we reveal and verify the memorization trigger hypothesis, which helps identify the key tokens driving memorization. Leveraging this insight, we design targeted perturbations that (i) neutralize inherent trigger tokens to reduce memorization and (ii) introduce artificial trigger tokens to misdirect model memorization. Experiments validate our defense across attacks, model scales, and tasks in language and vision-to-language modeling. Even with a privacy backdoor, the Membership Inference Attack (MIA) AUC drops from 0.95 to 0.55 under our defense, and instance exploitation approaches zero. This suggests that, compared to the ideal no-misuse scenario, the risk of exposing a text instance remains nearly unchanged despite its inclusion in the training data.
-
Reachal Wang (Duke University), Yuqi Jia (Duke University), Neil Gong (Duke University)
Prompt injection attacks aim to contaminate the input data of an LLM to mislead it into completing an attacker-chosen task instead of the intended task. In many applications and agents, the input data originates from multiple sources, with each source contributing a segment of the overall input. In these multi-source scenarios, an attacker may control only a subset of the sources and contaminate the corresponding segments, but typically does not know the order in which the segments are arranged within the input. Existing prompt injection attacks either assume that the entire input data comes from a single source under the attacker's control or ignore the uncertainty in the ordering of segments from different sources. As a result, their success is limited in domains involving multi-source data.
In this work, we propose ObliInjection, the first prompt injection attack targeting LLM applications and agents with multi-source input data. ObliInjection introduces two key technical innovations: the order-oblivious loss, which quantifies the likelihood that the LLM will complete the attacker-chosen task regardless of how the clean and contaminated segments are ordered; and the orderGCG algorithm, which is tailored to minimize the order-oblivious loss and optimize the contaminated segments. Comprehensive experiments across three datasets spanning diverse application domains and twelve LLMs demonstrate that ObliInjection is highly effective, even when only one out of 6-100 segments in the input data is contaminated. Our code and data are available at: https://github.com/ReachalWang/ObliInjection.
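A bare-bones rendering of an order-oblivious objective (our sketch; the paper's loss and the orderGCG optimizer operate on model logits rather than a placeholder scorer) might average an attacker-task loss over every ordering of the clean and contaminated segments; note the factorial blowup, which a practical implementation would avoid by sampling orderings.

```python
# Illustrative order-oblivious loss. `task_loss` is a hypothetical stand-in
# for a model call that scores how well the LLM completes the attacker task
# for one assembled input string.
from itertools import permutations
from statistics import mean

def task_loss(prompt: str) -> float:
    return 0.0  # placeholder; a real scorer would query the target LLM

def order_oblivious_loss(clean_segments: list[str], injected: str) -> float:
    """Average the attacker-task loss over every possible interleaving of
    the contaminated segment with the clean ones (exhaustive for clarity;
    sampling would be used in practice, since n segments give n! orders)."""
    losses = [task_loss("\n".join(order))
              for order in permutations(clean_segments + [injected])]
    return mean(losses)
```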
Packet to the Future
-
Marc Wyss (ETH Zurich), Yih-Chun Hu (University of Illinois at Urbana-Champaign), Vincent Lenders (University of Luxembourg), Roland Meier (armasuisse), Adrian Perrig (ETH Zurich)
Ensuring fair bandwidth allocations on the public Internet is challenging. Congestion control algorithms (CCAs) often fail to achieve fairness, especially when different CCAs operate simultaneously. This challenge becomes even more pronounced during volumetric distributed denial-of-service (DDoS) attacks, where legitimate traffic can be starved entirely. One approach to address this challenge is to enforce fairness by allocating bandwidth directly at routers. However, existing solutions generally fall into two categories: those that are easy to deploy but fail to provide secure in-network bandwidth isolation, and those that offer strong isolation guarantees but rely on complex assumptions that hinder real-world deployment.
To bridge the gap between these two categories, we introduce a new fairness model based on the notion of a per-stream Fractional Fair Share (FFS). At each on-path node, a stream's FFS, represented as packet labels and updated along the forwarding path, conveys its current fair share of egress bandwidth. The combination of packet-carried FFS labels and probabilistic forwarding enables effective and scalable isolation of streams with minimal overhead. FFS is the first system to combine low implementation and deployment overhead with effective bandwidth isolation, while remaining robust against source address spoofing and volumetric DDoS attacks, and delivering high performance and scalability as well as minimal latency and jitter.
We show that FFS effectively isolates bandwidth across 15 different CCAs while keeping latency and jitter minimal. Our high-speed implementation sustains a 160 Gbps line rate on commodity hardware. Evaluated on realistic Internet topologies, FFS outperforms several of the most recent and secure bandwidth isolation systems in both median and total bandwidth allocation. In our security analysis, we prove that FFS guarantees a non-zero lower bound on bandwidth allocation for every traffic stream, ensuring that volumetric DDoS attacks, even when combined with source address spoofing, cannot prevent legitimate communication. Finally, we present an extension of FFS that provides accurate and secure rate feedback to the sender, allowing rapid rate adaptation with minimal packet loss.
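To convey the flavor of label-driven probabilistic forwarding (a toy model of the general idea, not the FFS algorithm itself), a node can forward each packet with a probability that caps the stream near its labeled share of egress capacity; the capacity and rate figures below are made up.

```python
# Toy label-driven probabilistic forwarding: the packet label is the stream's
# fractional fair share; the forwarding probability caps the stream's
# throughput near label * egress capacity. Illustrative only.
import random

def forward_probability(ffs_label: float, egress_bps: float, offered_bps: float) -> float:
    allowed = ffs_label * egress_bps            # the stream's fair share (bps)
    return min(1.0, allowed / max(offered_bps, 1.0))

# A stream offering 500 Mbps with a 2% share of a 10 Gbps link:
p = forward_probability(0.02, 10e9, 500e6)      # -> 0.4, i.e., ~200 Mbps pass
forwarded = random.random() < p
print(p, forwarded)
```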
-
Radu Anghel (TU Delft), Carlos Gañán (ICANN), Qasim Lone (RIPE NCC), Matthew Luckie (CAIDA), Yury Zhauniarovich (TU Delft)
Spoofed traffic remains a major network hygiene concern, as it enables Distributed Denial-of-Service (DDoS) attacks by obscuring attack origins and hindering forensic analysis. A key indicator of poor hygiene is the presence of Bogon traffic (packets carrying invalid or non-routable source addresses) in the public Internet, arising from misconfigurations or insufficient filtering. Despite long-standing Source Address Validation (SAV) recommendations such as BCP 38 and BCP 84, Bogon filtering remains inconsistently deployed. In this work, we analyze eight years (2017-2024) of traceroute measurements from the CAIDA Ark platform, enriched with historical BGP data from RIPE RIS and RouteViews, to quantify the prevalence and characteristics of Bogon addresses in the data plane. We observe widespread non-compliance with best practices: between 82.69% and 97.83% of Ark vantage points encounter traceroute paths containing Bogon IPs, predominantly RFC1918 addresses. Overall, 21.11% of traceroutes include RFC1918 addresses, with smaller fractions involving RFC6598 (1.68%) and RFC3927 (0.08%). We identify over 15,500 Autonomous Systems (ASes) that transit Bogon traffic, although only 11.88% do so in more than half of the measurements. Cross-referencing with the Spoofer project and MANRS reveals a significant gap between control- and data-plane assurances: 52.71% of ASes forwarding Bogon-sourced packets are classified as non-spoofable, indicating incomplete or ineffective SAV deployment.
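The Bogon classification itself is straightforward to reproduce; below is a sketch of the per-hop check over the three address ranges named above (our code, using Python's standard ipaddress module).

```python
# Sketch of Bogon classification for traceroute hops: flag addresses in
# non-routable ranges (RFC 1918 private, RFC 6598 shared CGN,
# RFC 3927 link-local).
import ipaddress

BOGON_RANGES = {
    "RFC1918": [ipaddress.ip_network(n) for n in
                ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")],
    "RFC6598": [ipaddress.ip_network("100.64.0.0/10")],
    "RFC3927": [ipaddress.ip_network("169.254.0.0/16")],
}

def classify_bogon(addr: str) -> str | None:
    """Return the matching RFC label, or None for a routable address."""
    ip = ipaddress.ip_address(addr)
    for label, nets in BOGON_RANGES.items():
        if any(ip in net for net in nets):
            return label
    return None

assert classify_bogon("192.168.1.1") == "RFC1918"
assert classify_bogon("8.8.8.8") is None
```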
-
Jan Drescher (TU Braunschweig), David Klein (TU Braunschweig), Martin Johns (TU Braunschweig)
Site Isolation is one of the core security mechanisms of a modern browser. By confining aspects such as the JavaScript Just-in-Time compiler or the HTML rendering to a sandboxed process, web browsers significantly reduce the impact of memory corruption errors. In addition, the mechanism protects against microarchitectural attacks such as Spectre. When using Site Isolation, the browser confines all processing related to a site to its own sandboxed process. All communication with the privileged browser process is done via exchanging IPC messages. This, however, requires the browser process to keep track of which renderer process belongs to which site, as otherwise, an attacker can abuse a memory corruption issue in the renderer to attack other sites by sending malicious IPC messages. This, in turn, would allow attackers to leak sensitive data, such as cookies, or even achieve Universal Cross-Site Scripting.
This work presents the first automatic approach to detect such vulnerabilities, called Site Isolation bypasses, in Firefox and Chrome. For this, we propose a novel oracle to detect the semantic bugs that cause Site Isolation bypass vulnerabilities by flagging cross-site data leaks on the process level. In addition, we design a fuzzer that simulates a compromised renderer process, trying to use the browser process as a confused deputy by hooking into the IPC communication. Our work uncovered four security vulnerabilities in Chrome and Firefox: three less severe bugs leak data cross-site while the fourth bug facilitates complete control over the victim site.
-
Haya Schulmann (Goethe-Universität Frankfurt), Niklas Vogel (Goethe-Universität Frankfurt)
Resource Public Key Infrastructure (RPKI) is a critical security mechanism for BGP, but the complexity of its architecture is a growing concern as its adoption scales. Current RPKI design heavily reuses legacy PKI components, such as X.509 EE-certificates, ASN.1 encoding, and XML-based repository protocols, which introduce excessive cryptographic validation, redundant metadata, and inefficiencies in both storage and processing. We show that these design choices, although based on established standards, create significant performance bottlenecks, increase the vulnerability surface, and hinder scalability for wide-scale Internet deployment.
In this paper, we perform the first systematic analysis of the root causes of complexity in RPKI's design and experimentally quantify their real-world impact. We show that over 70% of validation time in RPKI relying parties is spent on certificate parsing and signature verification, much of it unnecessary. Building on this insight, we introduce the improved RPKI (iRPKI), a backwards-compatible redesign that preserves all security guarantees while substantially reducing protocol overhead. iRPKI eliminates EE-certificates and ROA signatures, merges revocation and integrity objects, replaces verbose encodings with Protobuf, and restructures repository metadata for more efficient access.
We experimentally demonstrate that our implementation of iRPKI in the Routinator validator achieves a 20x speed-up in processing time, an 18x improvement in bandwidth requirements, and an 8x reduction in cache memory footprint, while also eliminating classes of weaknesses that have led to at least 10 vulnerabilities in RPKI software. iRPKI significantly increases the feasibility of deploying RPKI at scale in the Internet, especially in constrained environments. Our design may be deployed incrementally without impacting existing operations. We make our design, object templates, publication point software, and RP implementation open-source to facilitate integration of iRPKI into current RPKI deployments and to enable reproduction of our study. We further provide recommendations on how to derive a new RPKI specification from our proposed improvements to facilitate standardization.
Thursday, 26 February
Photos of award winners and their certificates
Shake the Silicon
-
Johannes Lenzen (Technical University of Darmstadt), Mohamadreza Rostami (Technical University of Darmstadt), Lichao Wu (TU Darmstadt), Ahmad-Reza Sadeghi (Technical University of Darmstadt)
Modern Central Processing Units (CPUs) are black boxes, proprietary, and increasingly characterized by sophisticated microarchitectural flaws that evade traditional analysis. While some of these critical vulnerabilities have been uncovered through cumbersome manual effort, building an automated and systematic vulnerability detection framework for real-world post-silicon processors remains a challenge.
In this paper, we present Fuzzilicon, the first post-silicon fuzzing framework for real-world x86 CPUs that brings deep introspection into the microcode and microarchitectural layers. Fuzzilicon automates the discovery of vulnerabilities that were previously only detectable through extensive manual reverse engineering, and bridges the visibility gap by introducing microcode-level instrumentation. At the core of Fuzzilicon is a novel technique for extracting feedback directly from the processor's microarchitecture, enabled by reverse-engineering Intel's proprietary microcode update interface. We develop a minimally intrusive instrumentation method and integrate it with a hypervisor-based fuzzing harness to enable precise, feedback-guided input generation, without access to Register Transfer Level (RTL) designs or vendor support.
Applied to Intel's Goldmont microarchitecture, Fuzzilicon produced five significant findings, including two previously unknown microcode-level speculative-execution vulnerabilities. In addition, Fuzzilicon automatically rediscovered the µSpectre class of vulnerabilities, which had previously been detected only through manual analysis. Fuzzilicon reduces coverage collection overhead by up to 31× compared to baseline techniques and achieves 16.27% unique microcode coverage of hookable locations, the first empirical baseline of its kind. As a practical, coverage-guided, and scalable approach to post-silicon fuzzing, Fuzzilicon establishes a new foundation for automating the discovery of complex CPU vulnerabilities.
-
Lichao Wu (Technical University of Darmstadt), Mohamadreza Rostami (Technical University of Darmstadt), Huimin Li (Technical University of Darmstadt), Nikhilesh Singh (Technical University of Darmstadt), Ahmad-Reza Sadeghi (Technical University of Darmstadt)
Modern hardware systems, driven by demands for high performance and application-specific functionality, have grown increasingly complex, introducing large surfaces for bugs and security-critical vulnerabilities. Fuzzing has emerged as a scalable solution for discovering such flaws. Yet, existing hardware fuzzers suffer from limited semantic awareness, inefficient test refinement, and high computational overhead due to reliance on slow device simulation.
In this paper, we present GoldenFuzz, a novel two-stage hardware fuzzing framework that partially decouples test case refinement from coverage and vulnerability exploration. GoldenFuzz leverages a fast, ISA-compliant Golden Reference Model (GRM) as a "digital twin" of the Device Under Test (DUT). It fuzzes the GRM first, enabling rapid, low-cost test case refinement and accelerating deep architectural exploration and vulnerability discovery on the DUT. During the fuzzing pipeline, GoldenFuzz iteratively constructs test cases by concatenating carefully chosen instruction blocks that balance subtle inter- and intra-instruction quality. A feedback-driven mechanism leveraging insights from both high- and low-coverage samples further enhances GoldenFuzz's capability in hardware state exploration. Our evaluation of three RISC-V processors (RocketChip, BOOM, and CVA6) demonstrates that GoldenFuzz significantly outperforms existing fuzzers, achieving the highest coverage with minimal test case length and computational overhead. GoldenFuzz uncovers all known vulnerabilities and discovers five new ones, four of which are classified as highly severe with CVSS v3 severity scores exceeding seven out of ten. It also identifies two previously unknown vulnerabilities in the commercial BA51-H core extension.
-
Yuncheng Wang (Institute of Information Engineering, CAS, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China), Yaowen Zheng (Institute of Information Engineering, CAS, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China), Puzhuo Liu (Ant Group; Tsinghua University), Dongliang Fang (Institute of Information Engineering, CAS, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China), Jiaxing Cheng (Institute of Information Engineering, CAS, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China), Dingyi Shi (Institute of Information Engineering, CAS, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China), Limin Sun (Institute of Information Engineering, CAS, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China)
Robotic vehicles (RVs) play an increasingly vital role in modern society, with widespread applications in both commercial and military contexts. RV control software is the core of RV systems, which maintains proper operation by continuously computing the vehicle's internal state, sensor readings, and external inputs to adjust the system's behavior accordingly. However, the vast combination space of configurable parameters, command inputs, and environment-sensed data in RV software introduces significant security risks to the system. Existing fuzzing techniques face substantial challenges in effectively exploring this vast input space while uncovering deep bugs.
To address these challenges, we propose ADGFuzz, a novel fuzzing framework specifically designed to detect assignment statement bugs in RV control software. ADGFuzz statically constructs an Assignment Dependency Graph (ADG) to capture inter-variable dependencies within the program. These dependencies are then propagated to the RV input space by leveraging naming similarities, resulting in a targeted set of inputs referred to as the matched input set (MIS). Building upon this, ADGFuzz performs entropy-aware fuzzing over the MISs, thereby enhancing the overall efficiency of bug discovery. In our evaluation, ADGFuzz uncovered 87 unique bugs across three RV types, 78 of which were previously unknown. All discovered bugs were responsibly disclosed to the developers, and 16 have been confirmed for fixing.
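The naming-similarity step that lifts program dependencies into the input space can be illustrated with a small sketch (our approximation; ADGFuzz's actual matching logic may differ), here using lexical similarity from Python's difflib with a hypothetical threshold.

```python
# Hedged sketch of mapping program variables to configurable inputs by
# name similarity, producing a matched input set (MIS). Illustrative only.
from difflib import SequenceMatcher

def match_inputs(variables: list[str], inputs: list[str],
                 threshold: float = 0.6) -> dict[str, list[str]]:
    """For each program variable, keep the configurable inputs whose
    names are lexically similar above the threshold."""
    mis: dict[str, list[str]] = {}
    for var in variables:
        hits = [inp for inp in inputs
                if SequenceMatcher(None, var.lower(), inp.lower()).ratio() >= threshold]
        if hits:
            mis[var] = hits
    return mis

# Hypothetical names: one flight-control variable, two tunable parameters.
print(match_inputs(["pitch_rate_limit"], ["PITCH_RATE_MAX", "YAW_SLEW"]))
```
-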
Eunkyu Lee (KAIST), Junyoung Park (KAIST), Insu Yun (KAIST)
Real-Time Operating Systems (RTOSes) are widely used in embedded systems and provide various subsystems such as Bluetooth and Wi-Fi. As their functionality grows, their attack surface also expands, exposing them to more security threats. To address this, dynamic testing techniques like fuzzing have been widely applied to embedded systems. However, for RTOSes, these techniques struggle to effectively test deeply located functions within the kernel due to their complexity.
In this paper, we present RTCon, a context-adaptive function-level fuzzer for RTOS kernels. RTCon performs function-level fuzzing on any target functions within the RTOS kernel by adaptively generating function contexts during fuzzing. Additionally, RTCon employs Multi-layer Classification to classify crashes by confidence levels, helping analysts focus on high-confidence crashes. We implemented the prototype of RTCon and evaluated it on four popular RTOS kernels: Zephyr, RIOT, FreeRTOS, and ThreadX. As a result, RTCon discovered 27 bugs, including 25 new bugs. We reported all of them to maintainers and received 14 CVEs. RTCon also demonstrated its effectiveness in crash classification, achieving a 92.7% precision for high-confidence crashes, compared to a 5.8% precision for low-confidence crashes.
The Rise of the Defenders
-
Yiran Zhu (The State Key Laboratory of Blockchain and Data Security, Zhejiang University), Tong Tang (The State Key Laboratory of Blockchain and Data Security, Zhejiang University), Jie Wan (The State Key Laboratory of Blockchain and Data Security, Zhejiang University), Ziqi Yang (The State Key Laboratory of Blockchain and Data Security, Zhejiang University; Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security), Zhenguang Liu (The State Key Laboratory of Blockchain and Data Security, Zhejiang University; Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security), Lorenzo Cavallaro (University College London)
Binary diffing aims to align portions of control flow graphs corresponding to the same source code snippets between two binaries for software security analyses, such as vulnerability and plagiarism detection tasks. Previous works have limited effectiveness and inflexible support for cross-compilation environment scenarios. The main reason is that they perform matching based on the similarity comparison of basic blocks. In our work, we propose a novel diffing approach BINALIGNER to alleviate the above limitations at the binary level. To reduce the likelihood of false and missed matches corresponding to the same source code snippets, we present conditional relaxation strategies to find candidate subgraph pairs. To support a more flexible binary diffing in cross-compilation environment scenarios, we use instruction-independent basic block features for subgraph embedding generation. We implement BINALIGNER and conduct experiments across four cross-compilation environment scenarios (i.e., cross-version, cross-compiler, cross-optimization level, and cross-architecture) to evaluate its effectiveness and support ability for different scenarios. Experimental results show that BINALIGNER significantly outperforms the state-of-the-art methods in most scenarios. Especially in the cross-architecture scenario and multiple combinations of cross-compilation environment scenarios, BINALIGNER exhibits F1-scores that are on average 65% higher than the baselines. Two case studies using real-world vulnerabilities and patches further demonstrate the utility of BINALIGNER.
-
Yue Huang (Tsinghua University), Xin Wang (Tsinghua University), Haibin Zhang (Yangtze Delta Region Institute of Tsinghua University, Zhejiang), Sisi Duan (Tsinghua University)
Conventional Byzantine fault-tolerant consensus protocols focus on the workflow within a single group of nodes. In recent years, many applications of consensus involve communication across groups. Examples include communication between replicated state machines across different infrastructures, sharding-based protocols where nodes from different shards communicate with each other, and cross-chain bridges. Unfortunately, little effort has been made to model the properties of communication across groups.
In this work, we propose a new primitive called cross-consensus reliable broadcast (XRBC). The XRBC primitive models the security properties of communication between two groups, where at least one group executes a consensus protocol. We provide three constructions of XRBC under different assumptions and present three different applications for our XRBC protocols: a cross-shard coordination protocol via a case study of Reticulum (NDSS 2024), a protocol for cross-shard transactions via a case study of Chainspace (NDSS 2018), and a solution for cross-chain bridge. Our evaluation results show that our protocols are highly efficient and benefit different applications. For example, in our case study on Reticulum, our approach achieves 61.16% lower latency than the vanilla approach.
-
Huaijin Wang (The Ohio State University), Zhiqiang Lin (The Ohio State University)
Binary Code Similarity Analysis (BCSA) plays a vital role in many security tasks, including malware analysis, vulnerability detection, and software supply chain security. While numerous BCSA techniques have been proposed over the past decade, few leverage the semantics of register and memory values for comparison, despite promising initial results. Existing value-based approaches often focus narrowly on values that remain invariant across compilation settings, thereby overlooking a broader spectrum of semantically rich information. In this paper, we identify three core challenges limiting the effectiveness of value-based BCSA: unscalable value extraction, lack of noise filtering, and inefficient value comparison. These shortcomings hinder both semantic coverage and scalability. To unlock the full potential of value-based BCSA, we propose vSim, a novel framework that systematically captures values from all register and memory operations, filters out semantically irrelevant values (e.g., global addresses), and normalizes and propagates the remaining values to enable robust and scalable similarity analysis. Extensive evaluation shows that vSim consistently outperforms state-of-the-art BCSA systems in accuracy, robustness, and scalability. It generalizes well across architectures and toolchains, producing reliable results on diverse datasets.
-
Omar Abusabha (Sungkyunkwan University), Jiyong Uhm (Sungkyunkwan University), Tamer Abuhmed (Sungkyunkwan University), Hyungjoon Koo (Sungkyunkwan University)
Function inlining is a widely used transformation in modern compilers that replaces a call site with the callee's body where deemed beneficial. While this transformation improves performance, it significantly impacts static features such as machine instructions and control flow graphs, which are crucial to binary analysis. Yet, despite its broad impact, the security implications of function inlining remain underexplored. In this paper, we present the first comprehensive study of function inlining through the lens of machine learning-based binary analysis. To this end, we dissect the inlining decision pipeline within LLVM's cost model and explore combinations of compiler options that aggressively promote the function inlining ratio beyond standard optimization levels, which we term extreme inlining. We focus on five ML-assisted binary analysis tasks for security, using 20 unique models to systematically evaluate their robustness under extreme inlining scenarios. Our extensive experiments reveal several significant findings: i) function inlining, though benign in intent, can directly or indirectly affect ML model behaviors and can potentially be exploited to evade discriminative or generative ML models; ii) ML models relying on static features can be highly sensitive to inlining; iii) subtle compiler settings can be leveraged to deliberately craft evasive binary variants; and iv) inlining ratios vary substantially across applications and build configurations, undermining assumptions of consistency in the training and evaluation of ML models.
Silence of the LANs
-
Vik Vanderlinden (DistriNet, KU Leuven), Tom Van Goethem (DistriNet, KU Leuven), Mathy Vanhoef (DistriNet, KU Leuven)
One of the most well-known side-channel attacks is to infer secret information from the time it takes to perform a certain operation. Many systems have been shown to be vulnerable to such attacks, ranging from cryptographic algorithms and web applications to micro-architectural implementations. Exploiting these side-channel leaks over a networked connection is known to be challenging due to variations in the round-trip time, i.e., network jitter. Timing attacks have become especially challenging as processors become faster, resulting in smaller timing differences; systems become more complex, making it more difficult to collect consistent measurements; and networks become more congested, amplifying the network jitter.
In this work we introduce novel remote timing attack methods that are completely unaffected by jitter on the network path, making them several times more efficient than timing attacks based on the round-trip time and allowing smaller timing differences to be detected. More specifically, the execution time is inferred from the TCP timestamp values that are generated by the server upon acknowledging the request and sending the response. Furthermore, we show how sequential processing of incoming requests can be leveraged to inflate the time of the secret-dependent operation, resulting in a more accurate attack. Finally, through extensive measurements and a real-world case study, we demonstrate that the techniques we introduce in this paper have various advantageous properties compared to other timing attack methods: few(er) prerequisites are required, any TCP-based protocol is subject to these attacks, and the attacks can be executed in a distributed manner.
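A minimal sketch of reading a server-side TCP timestamp follows (our illustration, using scapy; it requires raw-socket privileges, and the host and port are placeholders). The difference between TSval values on two server-sent packets reflects server-side elapsed time, independent of path jitter.

```python
# Illustrative probe: send a SYN carrying the TCP Timestamp option and
# read the server's TSval from the SYN-ACK. Requires root and scapy.
from scapy.all import IP, TCP, sr1  # type: ignore

def server_tsval(dst: str, dport: int = 80) -> int | None:
    syn = IP(dst=dst) / TCP(dport=dport, flags="S",
                            options=[("Timestamp", (1, 0)),
                                     ("NOP", None), ("NOP", None)])
    synack = sr1(syn, timeout=2, verbose=False)
    if synack is None or TCP not in synack:
        return None
    for name, val in synack[TCP].options:
        if name == "Timestamp":
            return val[0]  # the server's TSval at the moment it sent the packet
    return None
```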
-
Hannes Weissteiner (Graz University of Technology), Roland Czerny (Graz University of Technology), Simone Franza (Graz University of Technology), Stefan Gast (Graz University of Technology), Johanna Ullrich (University of Vienna), Daniel Gruss (Graz University of Technology)
The Domain Name System (DNS) is a core component of the Internet. Clients query DNS servers to translate domain names to IP addresses. Local DNS caches reduce the time it takes to query a DNS server, thereby reducing delays in connection attempts. Prior work showed that DNS caches can be exploited via timing attacks to test whether a user has visited a specific website recently, but it lacked eviction capabilities, i.e., it could not monitor precisely when a user accessed a website; other work focused on DNS caches in routers. All prior attacks required some form of code execution (e.g., native code, Java, or JavaScript) on the victim's system, which is not always possible.
We introduce DMT, a novel Evict+Reload attack to continuously monitor a victim’s Internet accesses through the local, system-wide DNS cache. The foundation of DMT is reliable DNS cache eviction: We present 4 DNS cache eviction techniques to evict the local DNS cache in unprivileged and sandboxed native attacks, virtualized cross-VM attacks, as well as browser-based attacks, i.e., a website with JavaScript and a scriptless attack exploiting the serial loading of fonts integrated in websites. Our attack works both in default settings and when using DNS-over-TLS, DNSSEC, or non-default DNS forwarders for security. We observe eviction times of 77.267 ms on average across all contexts, using our fastest eviction primitive and reload and measurement times of 685.86 ms on average in the best case (cross-VM attack) for 100 domains and 14.710 s on average in the worst case (JavaScript-based attack). Hence, the blind spot of our attack for a granularity of five minutes is smaller than 0.26 % in the best case, and 4.92 % in the worst case, resulting in a reliable attack. In an end-to-end cross-VM attack, we can detect website visits from a list of 103 websites (in an open-world scenario) reliably with an F1 score of 92.48 % within less than one second. In our JavaScript-based attack, we achieve F1 scores of 82.86 % and 78.89 % for detecting accesses to 10 websites, with and without DNSSEC, respectively. We argue that DMT leaks information valuable for extortion and scam campaigns, or to serve exploits tailored to the victim’s EDR solution.
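A toy Evict+Reload probe against the OS resolver conveys the principle (our simplification; the paper's eviction primitives are far more refined, and whether getaddrinfo exercises the system-wide cache depends on the platform's resolver).

```python
# Toy Evict+Reload over the system resolver: evict by resolving many unique
# names, then time the target lookup. A fast lookup suggests the name was
# re-cached, i.e., recently visited. Illustrative only.
import socket
import time
import uuid

def evict(n: int = 10_000) -> None:
    for _ in range(n):
        try:
            socket.getaddrinfo(f"{uuid.uuid4().hex}.example.com", 80)
        except socket.gaierror:
            pass  # NXDOMAIN is fine; the lookup still churns the cache

def lookup_ms(name: str) -> float:
    t0 = time.perf_counter()
    try:
        socket.getaddrinfo(name, 443)
    except socket.gaierror:
        pass
    return (time.perf_counter() - t0) * 1000
```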
-
Robert Beverly (San Diego State University), Erik Rye (Johns Hopkins University)
Internet services and applications depend critically on the availability and accuracy of network time. The Network Time Protocol (NTP) is one of the oldest core network protocols and remains the de facto mechanism for clock synchronization across the Internet today. While multiple NTP infrastructures exist, one, the "NTP Pool," presents an attractive attack target for two basic reasons: it is 1) administratively distributed and based on volunteer servers; and 2) heavily utilized, including by IoT and infrastructure devices worldwide. We gather the first direct, non-inferential, and comprehensive data on the NTP Pool, including: longitudinal server and account membership, server configurations, time quality, aliases, and global query traffic load.
We gather complete and granular data over a nine-month period to discover over 15k servers (both active and inactive) and shed new light on the NTP Pool's use, dynamics, and robustness. By analyzing address aliases, accounts, and network connectivity, we find that only 19.7% of the pool's active servers are fully independent. Finally, we show that an adversary informed with our data can mount "monopoly attacks" more effectively and precisely, capturing the preponderance of NTP Pool traffic in 90% of all countries with 10 or fewer malicious NTP servers. Our results suggest multiple avenues by which the robustness of the pool can be improved.
-
Wei Shao (University of California, Davis), Najmeh Nazari (University of California, Davis), Behnam Omidi (George Mason University), Setareh Rafatirad (University of California, Davis), Khaled N. Khasawneh (George Mason University), Houman Homayoun (University of California, Davis), Chongzhou Fang (Rochester Institute of Technology)
Serverless computing has revolutionized cloud computing by offering users an efficient, cost-effective way to develop and deploy applications without managing infrastructure details. However, serverless cloud users remain vulnerable to various types of attacks, including micro-architectural side-channel attacks. These attacks typically rely on the physical co-location of victim and attacker instances, and attackers need to exploit cloud schedulers to achieve co-location with victims. Therefore, it is crucial to study vulnerabilities in serverless cloud schedulers and assess the security of different serverless scheduling algorithms. This study addresses the gap in understanding and constructing co-location attacks in serverless clouds. We present a comprehensive methodology to uncover exploitable features in serverless scheduling algorithms and to devise strategies for constructing co-location attacks via normal user interfaces. In our experiments, we successfully reveal exploitable vulnerabilities and achieve instance co-location on prevalent open-source infrastructures and Microsoft Azure Functions. We also present a mitigation strategy, the Double-Dip scheduler, to defend against co-location attacks in serverless clouds. Our work highlights critical areas for security enhancements in current cloud schedulers, offering insights to fortify serverless computing environments against potential co-location attacks.
The XSS Files
-
Avinash Awasth (Malaviya National Institute of Technology Jaipur), Pritam Vediya (Malaviya National Institute of Technology Jaipur), Hemant Miranka (LNMIIT Jaipur), Ramesh Babu Battula (Malaviya National Institute of Technology Jaipur), Manoj Singh Gaur (IIT Jammu)
The rapid proliferation of resource-constrained Internet of Things (IoT) devices has significantly expanded the attack surface and exposed critical vulnerabilities in networks. As a result, traditional Intrusion Detection Systems (IDS), which rely on static, signature-based approaches, have become increasingly obsolete. Modern adversaries now employ sophisticated, automated, and often novel (zero-day) attacks that can easily bypass such conventional defenses. Moreover, existing machine learning-based IDS models often fail in real-world scenarios to handle challenges like concept drift and are unable to generalize to unseen threats. To address these gaps, we introduce PANDORA (Probabilistic Adversarial Network Defense Over Resource-constrained Architectures), a novel, end-to-end framework for detecting zero-day attacks on edge devices. PANDORA makes three key contributions: 1) it learns uncertainty-aware probabilistic embeddings to create robust representations of network traffic; 2) it introduces a novel Probabilistic Manifold Structuring and Distance (PMSD) loss function that enables effective zero-shot generalization; and 3) it utilizes an efficient Mamba-Mixture of Experts (MoE) architecture for on-device deployment. To validate our approach, we also introduce the TTDFIOTIDS2025 dataset, a new, high-fidelity benchmark featuring complex, programmatically generated attacks. Our extensive evaluations demonstrate that PANDORA significantly outperforms state-of-the-art models, achieving an F1-score of 0.971 with just 10-shot adaptation on CICIDS2017. Critically, it achieves up to 99% accuracy in zero-shot detection under domain shift and, when deployed on a Raspberry Pi, maintains a low memory footprint of ~24 MB and a throughput of up to 4.26 flows/sec, proving its practical viability for real-time edge security.
-
Francesco Da Dalt (ETH Zürich), Adrian Perrig (ETH Zurich)
Heavy-hitter detection underpins line-rate DDoS mitigation and rate-limiting, yet its resilience against adaptive adversaries is largely unexplored. We build an end-to-end evaluation framework that embeds heavy-hitter detection logic in a switch-level simulator and auto-tunes its parameters using reinforcement learning to rate-limit elephant flows in the network. We then confront the protection system with an adaptive adversary that learns to maximize throughput while evading detection, and we show that it manages to breach the configured bandwidth cap by up to 299%, exposing systematic blind spots. To harden the monitoring system, we apply a form of joint adversarial training: detector and adversary co-evolve and reach an attack-defense Nash equilibrium in which the attacker's ability to exploit network bandwidth is reduced by a factor of 2.2×. Lastly, we show that machine learning can be used to create smart packet synthesizers able to perform bandwidth exploits on 8 out of 9 tested systems without any prior knowledge of the targeted detection system. We refer to this as a zero-shot attack, as it requires no knowledge of the targeted heavy-hitter detection system to perform its function. Our open-source framework helps quantify under-illuminated attack surfaces and provides a constructive approach towards adversarially robust data-plane flow monitoring.
-
Sayak Saha Roy (The University of Texas at Arlington), Shirin Nilizadeh (The University of Texas at Arlington)
We present PhishLang, the first fully client-side anti-phishing framework, implemented as a Chromium-based browser extension. PhishLang enables real-time, on-device detection of phishing websites by utilizing a lightweight language model (MobileBERT). Unlike traditional heuristic or static feature-based models that struggle with evasive threats, and deep learning approaches that are too resource-intensive for client-side use, PhishLang analyzes the contextual structure of a page's source code, achieving detection performance on par with several state-of-the-art models while consuming up to 7 times less memory than comparable architectures. Over a 3.5-month period, we deployed the framework in real time, successfully identifying approximately 26k phishing URLs, many of which were undetected by popular anti-phishing blocklists, demonstrating PhishLang's potential to aid current detection measures. The browser extension also outperformed several anti-phishing tools, detecting over 91% of threats at zero-day. PhishLang further showed strong adversarial robustness, resisting 16 categories of realistic problem-space evasions through a combination of parser-level defenses and adversarial retraining. To aid both end-users and the research community, we have open-sourced both the PhishLang framework and the browser extension.
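For a sense of what client-side inference with a small language model looks like, here is a sketch (our illustration; the checkpoint below is the generic MobileBERT base used as a stand-in, not the authors' fine-tuned model, so the label names are placeholders).

```python
# Sketch of on-device page-source classification with a small LM.
# "google/mobilebert-uncased" is a base checkpoint used as a stand-in; a
# phishing-tuned checkpoint would define the real label mapping.
from transformers import pipeline

clf = pipeline("text-classification", model="google/mobilebert-uncased")

def looks_phishy(page_source: str, threshold: float = 0.5) -> bool:
    # MobileBERT's context window is short, so score the leading chunk only.
    result = clf(page_source[:2000], truncation=True)[0]
    return result["label"] == "LABEL_1" and result["score"] >= threshold  # hypothetical mapping
```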
-
Shuo Yang (The University of Hong Kong), Xinran Zheng (University College London), Jinze Li (The University of Hong Kong), Jinfeng Xu (The University of Hong Kong), Edith C. H. Ngai (The University of Hong Kong)
Label noise presents a significant challenge in network intrusion detection, leading to erroneous classifications and decreased detection accuracy. Existing methods for handling noisy labels often lack deep insight into network traffic and blindly reconstruct the label distribution to filter samples with noisy labels, resulting in sub-optimal performance. In this paper, we reveal the impact of noisy labels on intrusion detection models from the perspective of causal associations, attributing performance degradation to local consistency of features across categories in network traffic. Motivated by this, we propose CoLD, a Collaborative Label Denoising framework for network intrusion detection. CoLD partitions the original feature set into multiple subsets and employs Local Joint Learning to disrupt local consistency, compelling the encoder to learn fine-grained and robust representations. It further applies Causal Collaborative Denoising to detect and filter noisy labels by analyzing causal divergences between multiple representations and their potentially true label, yielding a purified dataset for training a noise-resilient classifier. Experiments on several benchmark datasets demonstrate that CoLD effectively improves classification performance and robustness to label noise, highlighting its potential for enhancing network intrusion detection systems in noisy environments.
Seeing Isn’t Believing
-
Pascal Zimmer (Ruhr University Bochum), Simon Lachnit (Ruhr University Bochum), Alexander Jan Zielinski (Ruhr University Bochum), Ghassan Karame (Ruhr University Bochum)
A number of attacks rely on infrared light sources or heat-absorbing material to imperceptibly fool systems into misinterpreting visual input in various image recognition applications. However, almost all existing approaches can only mount untargeted attacks and require heavy optimization due to use-case-specific constraints, such as location and shape.
In this paper, we propose a novel, stealthy, and cost-effective attack to generate both targeted and untargeted adversarial infrared perturbations. By projecting perturbations from a transparent film onto the target object with an off-the-shelf infrared flashlight, our approach is the first to reliably mount laser-free targeted attacks in the infrared domain. Extensive experiments on traffic signs in the digital and physical domains show that our approach is robust and yields higher attack success rates in various attack scenarios across bright lighting conditions, distances, and angles compared to prior work. Equally important, our attack is highly cost-effective, requiring less than $50 and a few tens of seconds for deployment. Finally, we propose a novel segmentation-based detection that thwarts our attack with an F1-score of up to 99%.
-
Shaoyuan Xie (University of California, Irvine), Mohamad Habib Fakih (University of California, Irvine), Junchi Lu (University of California, Irvine), Fayzah Alshammari (University of California, Irvine), Ningfei Wang (University of California, Irvine), Takami Sato (University of California, Irvine), Halima Bouzidi (University of California Irvine), Mohammad Abdullah Al Faruque (University of California, Irvine), Qi Alfred Chen (University of California, Irvine)
Autonomous Target Tracking (ATT) systems, especially ATT drones, are widely used in applications such as surveillance, border control, and law enforcement, while also being misused for stalking and destructive actions. Thus, the security of ATT is highly critical for real-world applications. Within this scope, we present a new type of attack, distance-pulling attacks (DPA), and a systematic study of it; DPA exploits vulnerabilities in ATT systems to dangerously reduce tracking distances, leading to drone capture, increased susceptibility to sensor attacks, or even physical collisions. To achieve these goals, we present FlyTrap, a novel physical-world attack framework that employs an adversarial umbrella as a deployable and domain-specific attack vector. FlyTrap is specifically designed to meet key objectives in attacking ATT drones: physical deployability, closed-loop effectiveness, and spatial-temporal consistency. Through a novel progressive distance-pulling strategy and controllable spatial-temporal consistency designs, FlyTrap manipulates ATT drones in real-world setups to achieve significant system-level impact. Our evaluations include new datasets, metrics, and closed-loop experiments on real-world white-box and even commercial ATT drones, including DJI and HoverAir. Results demonstrate FlyTrap's ability to reduce tracking distances to within the range where drones can be captured, sensor-attacked, or even directly crashed, highlighting urgent security risks and practical implications for the safe deployment of ATT systems.
-
Yihao Chen (DCST & BNRist & State Key Laboratory of Internet Architecture, Tsinghua University; Zhongguancun Laboratory), Qi Li (INSC & State Key Laboratory of Internet Architecture, Tsinghua University; Zhongguancun Laboratory), Ke Xu (DCST & State Key Laboratory of Internet Architecture, Tsinghua University; Zhongguancun Laboratory), Zhuotao Liu (INSC & State Key Laboratory of Internet Architecture, Tsinghua University; Zhongguancun Laboratory), Jianping Wu (INSC & State Key Laboratory of Internet Architecture, Tsinghua University; Zhongguancun Laboratory)
The partial deployment of Route Origin Validation (ROV) poses an unexpected security threat known as stealthy BGP hijacking, i.e., a particularly elusive form of BGP hijacking where malicious routes divert traffic without reaching (and thus alerting) the victims. This risk remains largely unexplored, with neither documented real-world incidents nor systematic characterization available. To bridge this gap, we formalize stealthy BGP hijacking and propose heuristics to discover potential instances through routing table discrepancies. We conduct the first empirical study to track and profile stealthy BGP hijacking in the wild, contributing a curated real-world incident dataset and a long-term monitoring service. Inspired by the empirical insights, we further conduct an analytical study to exhaustively assess the risk. This requires accurate ROV deployment data, complete Internet-wide routes, and tailored analytical models. To address these challenges, we develop SHAMAN, a BGP route inference framework dedicated to assessing stealthy BGP hijacking risk. SHAMAN consolidates multiple sources to construct an accurate view of ROV deployment, infers complete Internet-wide routes through a highly efficient matrix-based approach, and facilitates statistical risk analysis via a "victim-target-hijacker" 3-tuple model. By reducing the time for generating Internet-scale routes from over three months to just 5.22 hours, SHAMAN enables systematic risk assessment across 8.3 billion generated routes under real-world ROV deployment. Our findings reveal a 14.1% overall success probability for stealthy BGP hijacking, with targeted attacks reaching 99.5% success in specific cases. Validation against our real-world dataset shows up to 95.9% incident-level accuracy, demonstrating the fidelity of our analytical results.
Return of the Phish
-
Mohammad Majid Akhtar (University of New South Wales), Rahat Masood (University of New South Wales), Muhammad Ikram (Macquarie University), Salil S. Kanhere (University of New South Wales)
Malicious actors on online social networks (OSNs) use script-controlled social bots that engage users through replies or comments. These bots are programmed to activate only when specific trigger keywords appear in posts. We refer to such advanced context-aware campaigners as trigger bot (TB) agents, which aim to deceive users into making payments for illicit products or revealing sensitive financial credentials. This paper presents a systematic and data-driven study on the detection and characterization of TB agents. We introduce TBTrackerX, a novel framework designed to collect and analyze TB activity. Using this system, we captured 4,452 TB agent replies from 2,647 unique TB agents, targeting our honeytrap account, and uncovered interactions with over 84K users on X. Our results show that TB agents evade detection by using contextually similar replies (with similarity scores up to 0.97), exhibiting intermittent posting patterns (in bursts ranging from 15 seconds to 5 minutes), and adopting dormant behavior after peak campaign activity. Furthermore, we identify a coordinated TB ecosystem, characterized by fake TB followers and shared TB masters. This study underscores the pressing need for better moderation and detection mechanisms to combat these sophisticated forms of social media manipulation.
-
Ruixuan Li (Tsinghua University), Chaoyi Lu (Zhongguancun Laboratory), Baojun Liu (Tsinghua University), Yanzhong Lin (Coremail Technology Co. Ltd), Qingfeng Pan (Coremail Technology Co. Ltd), Jun Shao (Zhejiang Gongshang University; Zhejiang Key Laboratory of Big Data and Future E-Commerce Technology)
This paper introduces a novel and powerful email convergence amplification attack named COORDMAIL. Traditional email DoS attacks primarily send spam to targeted mailboxes, with little ability to affect email servers' operation. In contrast, COORDMAIL exploits inherent properties of the SMTP protocol, i.e., long session timeouts and client-controlled interactions, to cleverly coordinate reflected emails from various email middleware and eventually direct them to an incoming mail server simultaneously. As a result, the amplification capabilities of different email middleware are concentrated to form highly amplified attack traffic. From the SMTP session state machine and email reflection behaviors, we identify many real-world email middleware instances suitable for COORDMAIL, including 10,079 bounce servers, 584 open email relays, and 6 email forwarding providers. By building SMTP command sequences, COORDMAIL can maintain prolonged SMTP communications with these middleware at an extremely low rate and control them to reflect emails steadily at any given moment. We show that COORDMAIL is effective at low cost: 1,000 SMTP connections can achieve more than 30,000× bandwidth amplification. While most existing security mechanisms are ineffective against COORDMAIL, we propose feasible mitigations that reduce its convergence amplification power by a factor of tens. We have responsibly reported COORDMAIL to email middleware operators and popular email providers, some of which have accepted our recommendations.
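The protocol property COORDMAIL builds on, long tolerated idle times within an SMTP session, is easy to measure; here is a sketch using only the standard library (measurement only, and it should be run solely against servers you operate; the probe hostname is a placeholder).

```python
# Measure how long a mail server tolerates a near-idle SMTP session.
import smtplib
import time

def idle_tolerance_seconds(host: str, step: int = 60, limit: int = 900) -> int:
    s = smtplib.SMTP(host, 25, timeout=30)
    s.ehlo("probe.example.com")  # placeholder client identity
    held = 0
    try:
        while held < limit:
            time.sleep(step)
            held += step
            s.noop()  # minimal keep-alive; a client can stall far longer between commands
    except (smtplib.SMTPServerDisconnected, smtplib.SMTPResponseException, OSError):
        pass  # server closed the session; `held` is its tolerance so far
    return held
```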
-
Mengying Wu (Fudan University), Geng Hong (Fudan University), Jiatao Chen (Fudan University), Baojun Liu (Tsinghua University), Mingxuan Liu (Zhongguancun Laboratory), Min Yang (Fudan University)
Email addresses serve as a universal identifier for online account management; however, their aliasing mechanisms introduce significant identity confusion between email providers and external platforms. This paper presents the first systematic analysis of the inconsistencies arising from email aliasing, where providers view alias addresses (e.g., [email protected], [email protected]) as additional entrances to the base email ([email protected]), while platforms often treat them as distinct identities.
Through empirical evaluations of the alias mechanisms of 28 email providers and 18 online platforms, we reveal critical gaps: (1) only Gmail fully documents its aliasing rules, while 11 providers silently support undocumented alias behaviors; (2) due to the lack of standardized documentation and consistent de facto implementations, platforms either fail to distinguish alias addresses or over-aggressively exclude all addresses containing specific symbols. Real-world abuse cases demonstrate attackers exploiting aliases to create up to 139 accounts from a single base email on npm for spam campaigns. Our user study further highlights the security risks, showing that 31.65% of participants with alias knowledge mistake phishing emails for legitimate alias emails due to inconsistent provider implementations. Users who believe they understand email aliasing, especially highly educated, male, and technical participants, are more susceptible to being phished.
Our findings underscore the urgent need for standardization and transparency in email aliasing. We contribute the OriginMail tool to help platforms resolve alias confusion, and we have disclosed the vulnerabilities to affected stakeholders.
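A sketch of the kind of provider-aware canonicalization a tool in OriginMail's role must perform (our illustration; real provider rules vary and several are undocumented, which is precisely the paper's point):

```python
# Canonicalize an email address by collapsing known alias forms.
def canonical_email(address: str) -> str:
    local, _, domain = address.lower().partition("@")
    if domain in {"gmail.com", "googlemail.com"}:
        # Documented Gmail rules: "+tag" suffixes and dots are ignored.
        local = local.split("+", 1)[0].replace(".", "")
        domain = "gmail.com"
    else:
        # Plus-aliasing is common but NOT universal; treating it as such
        # is exactly the kind of assumption the paper shows can misfire.
        local = local.split("+", 1)[0]
    return f"{local}@{domain}"

assert canonical_email("[email protected]") == "[email protected]"
```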
Once Upon a Time in Memory
-
Yubo Du (University of Pittsburgh), Youtao Zhang (University of Pittsburgh), Jun Yang (University of Pittsburgh)
Low-level programming languages like C and C++ offer dynamic memory management capabilities but are vulnerable to Use-After-Free (UAF) vulnerabilities due to improper deallocation handling. These vulnerabilities, arising from accessing memory through dangling pointers, pose significant risks. While various defense mechanisms have been proposed, existing solutions often face challenges such as high performance overhead, excessive memory usage, or inadequate security guarantees, limiting their practicality. Pointer Nullification (PN) has gained attention as a promising UAF mitigation technique by tracking pointers and nullifying them upon buffer deallocation. However, existing PN techniques incur inefficiencies due to precisely associating each pointer with its target buffer, leading to expensive metadata lookups. Moreover, they overlook spatial locality in pointer storage, resulting in a larger number of registrations than necessary. This paper introduces Fast Pointer Nullification (FPN), a new PN-based defense that organizes metadata at the region level to eliminate costly search operations and uses block-based registration to efficiently capture pointer locality. Experimental results on SPEC CPU benchmarks and real-world applications show that FPN offers strong security guarantees while significantly reducing performance and memory overhead compared to prior PN-based techniques. FPN is also compatible with multithreaded environments and large-scale web applications.
-
Kyle Zeng (Arizona State University), Moritz Schloegel (CISPA Helmholtz Center for Information Security), Christopher Salls (UC Santa Barbara), Adam Doupé (Arizona State University), Ruoyu Wang (Arizona State University), Yan Shoshitaishvili (Arizona State University), Tiffany Bao (Arizona State University)
Code reuse attacks are one of the most crucial cornerstones of modern memory corruption-based attacks. However, the task of stitching gadgets together remains a time-consuming and manual process. Plenty of research aimed at automating this problem has been published over the past decade, but very little has been adopted in practice. Solutions are often impractical in terms of performance or supported architectures, or they fail to generate a valid chain. A systematic analysis reveals that they all use a generate-and-test approach, where they first enumerate all gadgets and then use symbolic execution or SMT solvers to reason about which gadgets to combine into a chain. Unfortunately, this approach scales exponentially with the number of available gadgets, thus limiting scalability on larger binaries.
In this work, we revisit this fundamental strategy and propose a new grouping of gadgets, which we call ROPBlocks, that exhibits one crucial difference from plain gadgets: ROPBlocks are guaranteed to be chainable. We combine this notion with a graph search algorithm and propose a gadget chaining approach that significantly improves performance compared to prior work. We successfully reduce the time complexity of setting registers to attacker-specified values from O(2^n) to O(n). This yields a two-to-three orders of magnitude speed-up in practice during chain generation. At the same time, ROPBlocks allow us to model complex gadgets (such as those involving ret2csu or containing conditional branches) that most other approaches fail to consider by design. As ROPBlocks are architecture-agnostic, our approach can be applied to diverse architectures.
Our prototype, ropbot, generates complex, real-world chains invoking dup-dup-execve within 2.5s on average for all 37 binaries in our evaluation; all other approaches but one fail to generate any chain for this scenario. For mmap chains, a difficult scenario that requires setting six register values, ropbot finds chains for 5x more targets than the second-best technique. To show its versatility, we evaluate ropbot on x64, MIPS, ARM, and AArch64; we added RISC-V support in less than two hours by adding twelve lines of code. Finally, we demonstrate that ropbot outperforms all existing tools on their respective datasets.
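The shift from generate-and-test to graph search can be illustrated with a toy (our sketch; ropbot's ROPBlock abstraction also captures side effects and chainability constraints that this ignores): breadth-first search over sets of already-controlled registers, with hypothetical gadget summaries as edges.

```python
# Toy gadget chaining as graph search. Each gadget summary records which
# registers it sets; BFS finds a shortest chain covering the targets.
from collections import deque

# Hypothetical gadget summaries, assumed chainable (each ends in ret).
GADGETS = {
    "pop rdi; ret": {"rdi"},
    "pop rsi; pop r15; ret": {"rsi", "r15"},
    "pop rdx; ret": {"rdx"},
}

def find_chain(targets: frozenset[str]) -> list[str] | None:
    """BFS over sets of controlled registers; each edge appends one gadget."""
    start: frozenset[str] = frozenset()
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        have, chain = queue.popleft()
        if targets <= have:
            return chain
        for gadget, sets in GADGETS.items():
            nxt = have | sets
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, chain + [gadget]))
    return None

print(find_chain(frozenset({"rdi", "rsi", "rdx"})))
```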
-
Jingcheng Yang (Tsinghua University), Enze Wang (National University of Defense Technology & Tsinghua University), Jianjun Chen (Tsinghua University), Qi Wang (Tsinghua University), Yuheng Zhang (Tsinghua University), Haixin Duan (Quancheng Lab, Tsinghua University), Wei Xie (College of Computer, National University of Defense Technology), Baosheng Wang (National University of Defense Technology)
JSON Web Tokens (JWTs) have become a widely adopted standard for secure information exchange in modern distributed web applications, particularly in authentication and authorization scenarios. However, JWT implementations have introduced various vulnerabilities, such as signature verification bypass, token spoofing, and denial-of-service attacks. While prior research has reported such vulnerabilities individually, a systematic study of JWT implementations is lacking.
In this paper, we propose JWTFuzz, a novel testing methodology to effectively discover vulnerabilities in JWT implementations. We evaluated JWTFuzz against 43 JWT implementations across 10 popular programming languages and discovered 31 previously unknown security vulnerabilities, 20 of which have been assigned CVE numbers. We demonstrated the security impact of these vulnerabilities, such as enabling authentication bypass in Kubernetes and denial-of-service attacks against Apache James. We further categorized these vulnerabilities into five types and proposed several mitigation strategies. We discussed our mitigation strategies with the IETF, which has acknowledged our findings and suggested that they would adopt our mitigations in a new RFC document. We have also reported the identified vulnerabilities to the affected providers and received acknowledgments and bug bounty rewards from Apache, Connect2id, Kubernetes, Let's Encrypt, and RedHat.
UX Men: Days of Future Auth
-
Andrea Infantino (University of Illinois Chicago), Mir Masood Ali (University of Illinois Chicago), Kostas Solomos (University of Illinois Chicago), Jason Polakis (University of Illinois Chicago)
Password managers significantly improve password-based authentication by generating strong and unique passwords, while also streamlining the actual authentication process through autofill functionality. Crucially, autofill provides additional security protections when employed within a traditional browsing environment, as it can trivially thwart phishing attacks due to the website's domain information being readily available. With the increasing trend of major web services deploying standalone native apps, password managers have also started offering universal autofill and other user-friendly capabilities for desktop environments. However, it is currently unknown how password managers' security protections operate in these environments. In this paper, we fill that gap by presenting the first systematic empirical analysis of the autofill-related functionalities made available by popular password managers (including 1Password and LastPass) in major desktop environments (macOS, Windows, Linux). We experimentally find that password managers adopt different strategies for interacting with desktop apps and employ widely different levels of safeguards against UI-based attacks. For instance, on macOS, we find that a high level of security can be achieved by leveraging OS-provided APIs and checks, while on Windows we identify a lack of proper security checks mainly due to OS limitations. In each scenario, we demonstrate proof-of-concept attacks that allow other apps to bypass the security checks in place and stealthily steal users' credentials, one-time passwords, and vault secret keys through unobservable simulated key presses. Accordingly, we propose a series of countermeasures that can mitigate our attacks. Due to the severity of our attacks, we disclosed our findings and proposed countermeasures to the analyzed password manager vendors, which has kickstarted the remediation process for certain vendors and also been awarded a bug bounty. Finally, we will share our code to facilitate additional research towards fortifying password managers.
-
Zhen Li (Nankai University), Ding Wang (Nankai University)
As the number of users' password accounts constantly increases, users are more and more inclined to reuse passwords. Recently, considerable efforts have been made to construct targeted password guessing models that characterize users' password reuse behaviors. However, existing studies mainly focus on characterizing slight modifications by training only on similar password pairs (e.g., Shark0301 → shark03). This leads to overfitting and causes existing models to overlook users' large modification behaviors (e.g., Shark0301 → Bear03). To fill this gap, this paper introduces a new non-parametric method named k-nearest-neighbors targeted password guessing (KNN-TPG). KNN-TPG builds a datastore that retains the context vector of all source passwords along with prefixes of the targeted passwords. During the generation of a new password, KNN-TPG retrieves the k nearest neighbor vectors from the datastore to ensure that the generated passwords align better with realistic password distributions. By creatively combining KNN-TPG with our proposed Transformer-based password model, we propose a new targeted password guessing model, namely KNNGuess. At each step of generating a new password, KNNGuess predicts and utilizes three distinct distributions, aiming to comprehensively model users' password reuse behaviors.
We demonstrate the effectiveness of our KNNGuess model and the KNN-TPG method through extensive experiments on 12 large-scale real-world password datasets containing 4.8 billion passwords. More specifically, when the victim's password at site A (namely pw_A) is compromised, within 100 guesses, the cracking success rate of KNNGuess for guessing her password at site B (namely pw_B, with pw_B ≠ pw_A) is 25.40% (for common users) and 10.26% (for security-savvy users), which is 8.52%-119.0% (avg. 55.33%) higher than its foremost counterparts. When compared with state-of-the-art password models (i.e., Pass2Edit and PointerGuess), this value is 8.52%-27.66% (avg. 18.09%) higher. Our results highlight that the threat of password tweaking attacks is greater than users expect.
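To illustrate the retrieval step, here is a minimal sketch of kNN-augmented generation, assuming a datastore of context vectors (keys) and next-character ids (values); the array shapes, softmax weighting, and interpolation weight lam are illustrative choices of ours, not the paper's exact formulation.

```python
import numpy as np

def knn_distribution(query_vec, keys, values, k, vocab_size, temp=1.0):
    """keys: (N, d) stored context vectors; values: (N,) next-character ids."""
    dists = np.linalg.norm(keys - query_vec, axis=1)  # distance to every stored context
    nn = np.argsort(dists)[:k]                        # indices of the k nearest neighbors
    weights = np.exp(-dists[nn] / temp)
    weights /= weights.sum()
    p = np.zeros(vocab_size)
    for idx, w in zip(nn, weights):
        p[values[idx]] += w                           # put mass on each neighbor's next char
    return p

def next_char_probs(model_probs, query_vec, keys, values, k=8, lam=0.5):
    # Interpolate the password model's prediction with the datastore distribution,
    # so generation is pulled toward patterns seen in real password pairs.
    return (1 - lam) * model_probs + lam * knn_distribution(
        query_vec, keys, values, k, len(model_probs))
```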
-
Vinny Adjibi (Georgia Institute of Technology), Athanasios Avgetidis (Georgia Institute of Technology), Manos Antonakakis (Georgia Institute of Technology), Alberto Dainotti (Georgia Institute of Technology), Michael Bailey (Georgia Institute of Technology), Fabian Monrose (Georgia Institute of Technology)
The Uniform Domain Name Dispute Resolution Policy (UDRP) seeks to balance two competing goals: empowering trademark holders to swiftly address domain name abuses—such as the sale of counterfeit goods that often bypass technical safeguards like blocklists—and protecting registrants from aggressive legal tactics by overreaching trademark claimants. Since its inception, the UDRP has become the de facto dispute resolution mechanism for over twelve hundred domain extensions, a substantial increase from the original three. However, despite its successes, critics argue that the policy enables practices that undermine trust and fairness. Unfortunately, meaningful reform efforts have stalled due to the absence of large-scale structured data, limiting empirical evaluations and leaving foundational questions unanswered for more than two decades.
To address this long-standing gap, we trained models to extract structured data from 90,153 UDRP dispute proceedings, enabling the most comprehensive empirical analysis of the policy to date. Our findings shed light on several issues, showing evidence of forum shopping in almost one-third of all the disputes, potential conflicts of interest in 43 cases, and delays (by many parties) that fall well outside the expected response times—all of which impact the perceived fairness and efficiency of UDRP. Beyond eroding trust, those issues create serious security challenges: 2,751 malicious domains remained under malicious actors' control for up to four months after a panel ordered their transfer. Overall, our findings underscore the need for policy reform to help restore trust and improve transparency in the Internet's de facto standard for countering trademark infringement. Based on our discoveries, we recommend introducing greater automation, strengthening oversight, and enforcing clearer compliance rules to ensure that the UDRP remains a reliable tool for trademark-based name disputes—especially as the Internet continues to expand with new generic top-level domains (in 2026) and an increasingly hostile digital environment.
The Walls Have Ears
-
Xiaomeng Chen (Shanghai Jiao Tong University), Jike Wang (Shanghai Jiao Tong University), Zhenyu Chen (Shanghai Jiao Tong University), Qi Alfred Chen (University of California, Irvine), Xinbing Wang (Shanghai Jiao Tong University), Dongyao Chen (Shanghai Jiao Tong University)
We discover that enabling both eavesdropping and non-invasive, per-key injection is viable on keyboards, in particular, the fast-emerging commodity Hall-effect keyboards. This paper introduces DualStrike, a new attack system that allows attackers to remotely listen to victim input and control any key on a Hall-effect keyboard. This capability opens doors to severe attacks (e.g., file deletion, private key theft, and tampering) based on the victim’s input and context, all without requiring hardware or software modifications to the victim’s computer. We present several key innovations in DualStrike, including a novel, compact electromagnet-based hardware design for high-frequency magnetic spoofing, a synchronization-free attack scheme, and a magnetometer-based listening mechanism using commercial off-the-shelf components. Our real-world experiments demonstrate that DualStrike can reliably compromise arbitrary keys across six recent Hall-effect keyboard models. Specifically, DualStrike achieves over 98.9% keystroke injection accuracy across all tested models. In an end-to-end test, the eavesdropping module achieves a high listening accuracy (i.e., above 99%). To improve the robustness of DualStrike, we implement a calibration algorithm to account for keyboard displacement, allowing it to maintain 98.5% injection accuracy even with offsets up to 4 cm. We also identified DualStrike’s immunity to existing magnetic shielding mechanisms and proposed a novel shielding approach for Hall-effect keyboards.
-
Youqian Zhang (The Hong Kong Polytechnic University), Zheng Fang (The Hong Kong Polytechnic University), Huan Wu (The Hong Kong Polytechnic University & Technological and Higher Education Institute of Hong Kong), Sze Yiu Chau (The Chinese University of Hong Kong), Chao Lu (The Hong Kong Polytechnic University), Xiapu Luo (The Hong Kong Polytechnic University)
Optical fibers are widely regarded as reliable communication channels due to their resistance to external interference and low signal loss.
This paper demonstrates a critical side channel within telecommunication optical fiber that allows for acoustic eavesdropping. By exploiting the sensitivity of optical fibers to acoustic vibrations, attackers can remotely monitor sound-induced deformations in the fiber structure and further recover information from the original sound waves. This issue becomes particularly concerning with the proliferation of Fiber-to-the-Home (FTTH) installations in modern buildings. Attackers with access to one end of an optical fiber can use commercially available Distributed Acoustic Sensing (DAS) systems to tap into the private environment surrounding the other end. However, because the optical fiber alone is not sensitive enough to airborne sound, we introduce a "Sensory Receptor" that improves acoustic capture. Our results demonstrate the ability to recover critical information, such as human activities, indoor locations, and conversation content, raising important privacy concerns for fiber-optic communication networks.
-
Ruizhe Wang (University of Waterloo), Roberta De Viti (MPI-SWS), Aarushi Dubey (University of Washington), Elissa Redmiles (Georgetown University)
The voluntary donation of private health information for altruistic purposes, such as supporting research advancements, is a common practice. However, concerns about data misuse and leakage may deter people from donating their information. Privacy Enhancing Technologies (PETs) aim to alleviate these concerns and, in turn, allow for safe and private data sharing. This study conducts a vignette survey (N = 494) with participants recruited from Prolific to examine the willingness of US-based people to donate medical data for developing new treatments under four general guarantees offered across PETs: data expiration, anonymization, purpose restriction, and access control. The study explores two mechanisms for verifying these guarantees: self-auditing and expert auditing, and controls for the impact of confounds including demographics and two types of data collectors: for-profit and non-profit institutions.
Our findings reveal that respondents hold such high expectations of privacy from non-profit entities a priori that explicitly outlining privacy protections has little impact on their overall perceptions. In contrast, offering privacy guarantees elevates respondents’ expectations of privacy for for-profit entities, bringing them nearly in line with those for non-profit organizations. Further, while the technical community has suggested audits as a mechanism to increase trust in PET guarantees, we observe limited effect from transparency about such audits. We emphasize the risks associated with these findings and underscore the critical need for future interdisciplinary research efforts to bridge the gap between the technical community’s and end-users’ perceptions regarding the effectiveness of auditing PETs.
-
Byeongdo Hong (The Affiliated Institute of ETRI), Gunwoo Yoon (The Affiliated Institute of ETRI)
LTE networks employ Globally Unique Temporary Identifiers (GUTIs) to shield subscribers from permanent International Mobile Subscriber Identity (IMSI) exposure, yet we show that these identifiers can be resolved and linked to specific devices through passive observation without prior knowledge of targets. We correlate time-stamped visual observations of device use with over-the-air control-plane messages captured using commodity Software-Defined Radios (SDRs). A Finite-State-Machine (FSM) algorithm processes the synchronized streams to resolve each device's GUTI within the camera's Field of View (FoV), requiring as few as three observed user interactions when the corresponding control-plane messages are captured.
Field experiments across multiple commercial LTE networks validate multi-target resolution: in some deployments, we observed GUTIs persisting for up to 33 days, with reassignment behaviors that were often linkable. Once linked, these long-lived identifiers enable hierarchical location tracking, from cell to paging-area scale, by passively monitoring paging and Radio Resource Control (RRC) messages. Unlike active IMSI catchers or prior GUTI attacks that require pre-existing identifiers (e.g., phone numbers) and active probing, our approach is listen-only and scales to multiple devices within view.
Catch Me If You Can, Cookie
-
Amrita Roy Chowdhury (University of Michigan, Ann Arbor), David Glukhov (University of Toronto), Divyam Anshumaan (University of Wisconsin), Prasad Chalasani (Langroid), Nicholas Papernot (University of Toronto), Somesh Jha (University of Wisconsin), Mihir Bellare (UCSD)
The rise of large language models (LLMs) has introduced new privacy challenges, particularly during inference, where sensitive information in prompts may be exposed to proprietary LLM APIs. In this paper, we address the problem of formally protecting the sensitive information contained in a prompt while maintaining response quality. To this end, first, we introduce a cryptographically inspired notion of a prompt sanitizer, which transforms an input prompt to protect its sensitive tokens. Second, we propose Prεεmpt, a novel system that implements a prompt sanitizer, focusing on the sensitive information that can be derived solely from the individual tokens. Prεεmpt categorizes sensitive tokens into two types: (1) those where the LLM's response depends solely on the format (such as SSNs and credit card numbers), for which we use format-preserving encryption (FPE); and (2) those where the response depends on specific values (such as age and salary), for which we apply metric differential privacy (mDP). Our evaluation demonstrates that Prεεmpt is a practical method to achieve meaningful privacy guarantees, while maintaining high utility compared to unsanitized prompts, and outperforming prior methods.
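A minimal sketch of the two sanitization modes follows, with hypothetical helper names of ours; the FPE stand-in is a toy, not a real cipher, and a real deployment would use a vetted scheme such as FF3-1.

```python
import math
import random

def fpe_stub(digits: str, seed: int) -> str:
    """Toy stand-in for format-preserving encryption: digits in, digits out.
    NOT secure; only illustrates that the output keeps the input's format."""
    rng = random.Random(f"{seed}:{digits}")
    return "".join(rng.choice("0123456789") for _ in digits)

def mdp_laplace(value: float, epsilon: float) -> float:
    """Metric-DP-style Laplace noise: indistinguishability degrades with |x - x'|."""
    u = random.random() - 0.5
    return value - (1.0 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def sanitize(token: str, kind: str, epsilon: float = 1.0, key: int = 42) -> str:
    if kind == "format_only":      # e.g., an SSN: only the format matters downstream
        return fpe_stub(token, key)
    if kind == "value_sensitive":  # e.g., age or salary: perturb the numeric value
        return str(round(mdp_laplace(float(token), epsilon)))
    return token

print(sanitize("123456789", "format_only"))                # same shape, new digits
print(sanitize("52000", "value_sensitive", epsilon=0.01))  # noisy salary
```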
-
Chen GONG (University of Virginia), Zheng Liu (University of Virginia), Kecen Li (University of Virginia), Tianhao Wang (University of Virginia)
Recently, offline reinforcement learning (RL) has become a popular RL paradigm. In offline RL, data providers share pre-collected datasets—either as individual transitions or sequences of transitions forming trajectories—to enable the training of RL models (also called agents) without direct interaction with the environments. Offline RL saves interactions with environments compared to traditional RL, and has been effective in critical areas, such as navigation tasks. Meanwhile, concerns about privacy leakage from offline RL datasets have emerged.
To safeguard private information in offline RL datasets, we propose the first differential privacy (DP) offline dataset synthesis method, PrivORL, which leverages a diffusion model and diffusion transformer to synthesize transitions and trajectories, respectively, under DP. The synthetic dataset can then be securely released for downstream analysis and research. PrivORL adopts the popular approach of pre-training a synthesizer on public datasets, and then fine-tuning on sensitive datasets using DP Stochastic Gradient Descent (DP-SGD).
Additionally, PrivORL introduces curiosity-driven pre-training, which uses feedback from a curiosity module to diversify the synthetic dataset, and can thus generate diverse synthetic transitions and trajectories that closely resemble the sensitive dataset.
Extensive experiments on five sensitive offline RL datasets show that our method achieves better utility and fidelity in both DP transition and trajectory synthesis compared to baselines. The replication package is available at the anonymous link.
-
Friedemann Lipphardt (MPI-INF), Moonis Ali (MPI-INF), Martin Banzer (MPI-INF), Anja Feldmann (MPI-INF), Devashish Gosain (IIT Bombay)
Large language models (LLMs) are widely used for information access, yet their content moderation behavior varies sharply across geographic and linguistic contexts. This paper presents the first comprehensive analysis of content moderation patterns detected in over 700,000 replies from 15 leading LLMs evaluated from 12 locations using 1,118 sensitive queries spanning five categories in 13 languages.
We find substantial geographic variation, with moderation rates showing relative differences up to 60% across locations—for instance, soft moderation (e.g., evasive replies) appears in 14.3% of German contexts versus 24.9% in Zulu contexts. Category-wise, misc. (generally unsafe), hate speech, and sexual content are more heavily moderated than political or religious content, with political content showing the most geographic variability. We also observe discrepancies between online and offline model versions, such as DeepSeek exhibiting 15.2% higher relative soft moderation rates when deployed locally than via API. The response length (and time) analysis reveals that moderated responses are, on average, about 50% shorter than the unmoderated ones.
These findings have important implications for AI fairness and digital equity, as users in different locations receive inconsistent access to information. We provide the first systematic evidence of geographic cross-language bias in LLM content moderation and showcase how model selection vastly impacts user experience.
-
Fangyuan Sun (Qingdao University), Yaxi Yang (Singapore University of Technology and Design), Jia Yu (Qingdao University), Jianying Zhou (Singapore University of Technology and Design)
In data-driven applications, attribute-driven community search, which aims to help users find high-quality subgraphs that meet specific requirements over attributed graphs, has attracted increasing attention. Nevertheless, few works consider data privacy when performing community search. One critical reason is that real-world graphs continue to grow in size, and attribute-driven community search involves computing complex metrics on encrypted graph data, including structural cohesiveness and attribute correlation, which are too time-consuming to be practical.
This paper is the first to propose a practical scheme for Privacy-preserving Attribute-driven Community Searches on the cloud, named PACS. PACS enables servers to efficiently respond to attribute-driven community searches in near-millisecond time, without accessing sensitive information about the attributed graph and search results. To achieve this, we design two structures, a secure community index and a secure edge table, for protecting the privacy of the original attributed graph. The secure community index enables cloud servers to efficiently identify the target community that meets structural cohesiveness and has the highest attribute score. In particular, we employ inner product encryption to evaluate the attribute-driven scores of communities based on encrypted attribute vectors. The secure edge table, constructed by BGN homomorphic encryption, allows cloud servers to securely retrieve the edge information of the target community without knowing its details. We perform a thorough security analysis that demonstrates PACS achieves CQA2-security. Experimental evaluations on real-world social network datasets show that PACS achieves near-millisecond efficiency in processing attribute-driven community searches.
Mission: Improbable Robustness
-
Jonathan Evertz (CISPA Helmholtz Center for Information Security), Niklas Risse (Max Planck Institute for Security and Privacy), Nicolai Neuer (Karlsruhe Institute of Technology), Andreas Müller (Ruhr University Bochum), Philipp Normann (TU Wien), Gaetano Sapia (Max Planck Institute for Security and Privacy), Srishti Gupta (Sapienza University of Rome), David Pape (CISPA Helmholtz Center for Information Security), Soumya Shaw (CISPA Helmholtz Center for Information Security), Devansh Srivastav (CISPA Helmholtz Center for Information Security), Christian Wressnegger (Karlsruhe Institute of Technology), Erwin Quiring (_fbeta), Thorsten Eisenhofer (CISPA Helmholtz Center for Information Security), Daniel Arp (TU Wien), Lea Schönherr (CISPA Helmholtz Center for Information Security)
Large language models (LLMs) are increasingly prevalent in security research. Their unique characteristics, however, introduce challenges that undermine established paradigms of reproducibility, rigor, and evaluation. Prior work has identified common pitfalls in traditional machine learning research, but these studies predate the advent of LLMs. In this paper, we identify nine common pitfalls that have become (more) relevant with the emergence of LLMs and that can compromise the validity of research involving them. These pitfalls span the entire computation process, from data collection, pre-training, and fine-tuning to prompting and evaluation.
We assess the prevalence of these pitfalls across all 72 peer-reviewed papers published at leading Security and Software Engineering venues between 2023 and 2024. We find that every paper contains at least one pitfall, and each pitfall appears in multiple papers. Yet only 15.7% of the pitfalls present were explicitly discussed, suggesting that the majority remain unrecognized. To understand their practical impact, we conduct four empirical case studies showing how individual pitfalls can mislead evaluation, inflate performance, or impair reproducibility. Based on our findings, we offer actionable guidelines to support the community in future work.
-
Zion Leonahenahe Basque (Arizona State University), Samuele Doria (University of Padua), Ananta Soneji (Arizona State University), Wil Gibbs (Arizona State University), Adam Doupe (Arizona State University), Yan Shoshitaishvili (Arizona State University), Eleonora Losiouk (University of Padua), Ruoyu “Fish” Wang (Arizona State University), Simone Aonzo (EURECOM)
Large Language Models (LLMs) are revolutionizing fields previously dominated by human effort. This work presents the first systematic investigation of how LLMs can team with analysts during software reverse engineering (SRE). To accomplish this, we first document the state of LLMs in SRE with an online survey of 153 practitioners, and then we design a fine-grained human study on two Capture-The-Flag-style binaries representative of real-world software.
In our human study, we instrumented the SRE workflow of 48 participants (split between 24 novices and 24 experts), observing over 109 hours of SRE. Through 18 findings, we found various benefits and harms of LLMs in SRE. Remarkably, we found that LLM assistance narrows the expertise gap: novices' comprehension rate rises by approximately 98%, matching that of experts, whereas experts gain little; however, LLMs also produced harmful hallucinations, unhelpful suggestions, and ineffective results. Known-algorithm functions are triaged up to 2.4x faster, and artifact recovery (names, comments, types) increases by at least 66%. Overall, our findings identify powerful synergies of humans and LLMs in SRE, but also emphasize the significant shortcomings of LLMs in their current integration.
-
Nuno Sabino (Carnegie Mellon University, Instituto Superior Técnico, Universidade de Lisboa, and Instituto de Telecomunicações), Darion Cassel (Carnegie Mellon University), Rui Abreu (Universidade do Porto, INESC-ID), Pedro Adão (Instituto Superior Técnico, Universidade de Lisboa, and Instituto de Telecomunicações), Lujo Bauer (Carnegie Mellon University), Limin Jia (Carnegie Mellon University)
DOM-based cross-site scripting (DOM-XSS) is a prevalent form of web vulnerability. Prior work on automated detection and confirmation of such vulnerabilities at scale has several limitations. First, prior work does not interact with the page and thus misses vulnerabilities in event handlers whose execution depends on user actions. Second, prior work does not find URL components, such as GET parameters and fragment values, that, when instantiated with specific keys/values, execute more code paths. To address this, we introduce SWIPE, a DOM-XSS analysis infrastructure that uses fuzzing to generate user interactions to trigger event handlers and leverages dynamic symbolic execution (DSE) to automatically synthesize URL parameters and fragments. We run SWIPE on 44,480 URLs found in pages from the Tranco top 30,000 popular domains. Compared to prior work, SWIPE's fuzzer finds 15% more vulnerabilities. Additionally, we find that a lack of parameters and fragments in URLs significantly hinders DOM-XSS detection, and show that SWIPE's DSE engine can synthesize previously unseen URL parameters and fragments that trigger 20 new vulnerabilities.
-
Yaru Yang (Tsinghua University), Yiming Zhang (Tsinghua University), Tao Wan (CableLabs & Carleton University), Haixin Duan (Tsinghua University & Quancheng Laboratory), Deliang Chang (QI-ANXIN Technology Research Institute), Yishen Li (Tsinghua University), Shujun Tang (Tsinghua University & QI-ANXIN Technology Research Institute)
Femtocells are small, operator-deployed base stations designed to extend mobile network coverage, but their integration into operator mobile infrastructure introduces significant new attack surfaces. While 5G femtocell standards were only recently finalized, 4G LTE femtocells have already been standardized and widely implemented. In this work, we conducted the first systematic security evaluation of 4G LTE femtocells based on both real-world commercial devices and large-scale Internet measurements. We systematically analyzed both the software and hardware of 4 commercial femtocell devices and identified 5 critical and common vulnerabilities that can lead to local or remote compromise. Our Internet-wide measurement identified 86,108 suspected femtocell deployments, many of which are exposed to remote attack. Further, we experimentally validated in a real operator network that a single compromised femtocell can serve as a powerful entry point for attacks on both the mobile core network and its subscribers. Our findings highlight that femtocell security in operational 4G LTE networks remains an urgent concern. We reported our results to the Global System for Mobile Communications Association (GSMA) and the 3rd Generation Partnership Project (3GPP) Service and System Aspects Working Group 3 (SA3). 3GPP SA3 has subsequently approved both a study item to further enhance the security of 5G femtocells and a work item to define the Security Assurance Specification (SCAS) for 5G femtocells.
The Phantom of the Opera tor
-
Quan Yuan (Zhejiang University), Zhikun Zhang (Zhejiang University), Linkang Du (Xi'an Jiaotong University), Min Chen (Vrije Universiteit Amsterdam), Mingyang Sun (Peking University), Yunjun Gao (Zhejiang University), Shibo He (Zhejiang University), Jiming Chen (Zhejiang University)
Video recognition systems are increasingly being deployed in daily life, such as content recommendation and security monitoring. To enhance video recognition development, many institutions have released high-quality public datasets with open-source licenses for training advanced models. At the same time, these datasets are also susceptible to misuse and infringement. Dataset copyright auditing is an effective solution to identify such unauthorized use. However, existing dataset copyright solutions primarily focus on the image domain; the complex nature of video data leaves dataset copyright auditing in the video domain unexplored. Specifically, video data introduces an additional temporal dimension, which poses significant challenges to the effectiveness and stealthiness of existing methods.
In this paper, we propose VICTOR, the first dataset copyright auditing approach for video recognition systems. We develop a general and stealthy sample modification strategy that enhances the output discrepancy of the target model. By modifying only a small proportion of samples (e.g., 1%), VICTOR amplifies the impact of published modified samples on the prediction behavior of the target models. Then, the difference in the model’s behavior for published modified and unpublished original samples can serve as a key basis for dataset auditing. Extensive experiments on multiple models and datasets highlight the superiority of VICTOR. Finally, we show that VICTOR is robust in the presence of several perturbation mechanisms to the training videos or the target models.
-
Alessandro Galeazzi (University of Padua), Pujan Paudel (Boston University), Mauro Conti (University of Padua), Emiliano De Cristofaro (UC Riverside), Gianluca Stringhini (Boston University)
In recent years, the opaque design and the limited public understanding of social networks' recommendation algorithms have raised concerns about potential manipulation of information exposure. Reducing content visibility, aka shadow banning, may help limit harmful content; however, it can also be used to suppress dissenting voices. This prompts the need for greater transparency and a better understanding of this practice.
In this paper, we investigate the presence of visibility alterations through a large-scale quantitative analysis of two Twitter/X datasets comprising over 40 million tweets from more than 9 million users, focused on discussions surrounding the Ukraine–Russia conflict and the 2024 US Presidential Elections. We use view counts to detect patterns of reduced or inflated visibility and examine how these correlate with user opinions, social roles, and narrative framings. Our analysis shows that the algorithm systematically penalizes tweets containing links to external resources, reducing their visibility by up to a factor of eight, regardless of the ideological stance or source reliability. Rather, content visibility may be penalized or favored depending on the specific accounts producing it, as observed when comparing tweets from the Kyiv Independent and RT.com or tweets by Donald Trump and Kamala Harris. Overall, our work highlights the importance of transparency in content moderation and recommendation systems to protect the integrity of public discourse and ensure equitable access to online platforms.
-
Jie Kong (Dept. of Computer Science and Engineering, University of Connecticut, Storrs, CT), Damon James (Dept. of Computer Science and Engineering, University of Connecticut, Storrs, CT), Hemi Leibowitz (Faculty of Computer Science, The College of Management Academic Studies, Rishon LeZion, Israel), Ewa Syta (Dept. of Computer Science, Trinity College, Hartford, CT), Amir Herzberg (Dept. of Computer Science and Engineering, University of Connecticut, Storrs, CT)
We present CTng, an evolutionary and practical PKI design that efficiently addresses multiple key challenges faced by deployed PKI systems. CTng ensures strong security properties, including guaranteed transparency of certificates and guaranteed, unequivocal revocation, achieved under NTTP-security, i.e., without requiring trust in any single CA, logger, or relying party. These guarantees hold even in the presence of arbitrary corruptions of these entities, assuming only a known bound (f) of corrupt monitors (e.g., f=8), with minimal performance impact. CTng also enables offline certificate validation and preserves relying-party privacy, while providing scalable and efficient distribution of revocation updates.
These properties significantly improve upon current PKI designs. In particular, while Certificate Transparency (CT) aims to eliminate single points of trust, the existing specification still assumes benign loggers. Addressing this through log redundancy is possible, but rather inefficient, limiting deployed configurations to f ≤ 2.
We present a security analysis and an evaluation of our open-source CTng prototype, showing that it is efficient and scalable under realistic deployment conditions.
-
Hongze Wang (Southeast University), Zhen Ling (Southeast University), Xiangyu Xu (Southeast University), Yumingzhi Pan (Southeast University), Guangchi Liu (Southeast University), Junzhou Luo (Southeast University), Xinwen Fu (University of Massachusetts Lowell)
I2P (Invisible Internet Project) is a popular anonymous communication network. While existing de-anonymization methods for I2P focus on identifying potential traffic patterns of target hidden services among extensive network traffic, they often fail to scale effectively across the large and diverse I2P network, which consists of numerous routers. In this paper, we introduce I2PERCEPTION, a low-cost approach that reveals the IP addresses of I2P hidden services. In I2PERCEPTION, attackers deploy floodfill routers to passively monitor I2P routers and collect their RouterInfo. We analyze the router information publication mechanism to accurately identify routers' join (i.e., on) and leave (i.e., off) behaviors, enabling fine-grained live behavior inference across the I2P network. Active probing is used to obtain the live behavior (i.e., on-off patterns) of a target hidden service hosted on one of the I2P routers. By correlating the live behaviors of the target hidden service and I2P routers over time, we narrow down the set of routers matching the hidden service's behavior, revealing the hidden service's true network identity for de-anonymization. Through the deployment of only 15 floodfill routers over the course of eight months, we validate the precision and effectiveness of our approach with extensive real-world experiments. Our results show that I2PERCEPTION successfully de-anonymizes all controlled hidden services.
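The correlation step can be pictured with a small sketch: encode each router's observed availability as a 0/1 vector per time slot and keep only routers whose uptime is consistent with the hidden service's probed on-off pattern. The encoding and consistency rule here are illustrative assumptions, not the authors' exact matching algorithm.

```python
def consistent(service_bits, router_bits):
    # A router hosting the service must have been up whenever the service responded.
    return all(r == 1 for s, r in zip(service_bits, router_bits) if s == 1)

def candidate_routers(service_bits, routers):
    """routers: dict mapping router id -> 0/1 availability vector per time slot."""
    return [rid for rid, bits in routers.items() if consistent(service_bits, bits)]

# Example: with more observation windows, the candidate set keeps shrinking.
routers = {"R1": [1, 1, 0, 1], "R2": [1, 1, 1, 1], "R3": [0, 1, 1, 1]}
print(candidate_routers([1, 1, 0, 1], routers))  # ['R1', 'R2']
```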
When Assumptions Fail
-
Qiguang Zhang (Southeast University), Junzhou Luo (Southeast University, Fuyao University of Science and Technology), Zhen Ling (Southeast University), Yue Zhang (Shandong University), Chongqing Lei (Southeast University), Christopher Morales (University of Massachusetts Lowell), Xinwen Fu (University of Massachusetts Lowell)
Building Automation Systems (BASs) are crucial for managing essential functions like heating, ventilation, air conditioning, and refrigeration (HVAC&R), as well as lighting and security in modern buildings. BACnet, a widely adopted open standard for BASs, enables integration and interoperability among heterogeneous devices. However, traditional BACnet implementations remain vulnerable to various security threats. While existing fuzzers have been applied to BACnet, their efficiency is limited, particularly due to the slow bus-based communication medium with low throughput. To address these challenges, we propose BACsFuzz, a behavior-driven fuzzer aimed at uncovering vulnerabilities in BACnet systems. Unlike traditional fuzzing approaches focused on input diversity and execution path coverage, BACsFuzz introduces the token-seize-assisted fuzzing technique, which leverages the token-passing mechanism of BACnet for improved fuzzing efficiency. The token-seize-assisted fuzzing technique proves highly effective in uncovering vulnerabilities caused by the misuse of implicitly reserved fields. We identify this issue as a common vulnerability affecting both BACnet and KNX, another major BAS protocol. Notably, the BACnet Association (ASHRAE) confirmed the presence of a protocol-level token-seize vulnerability, further validating the significance of this finding. We evaluated BACsFuzz on 15 BACnet and 5 KNX implementations from leading manufacturers, including Siemens, Honeywell, and Johnson Controls. BACsFuzz improves fuzzing throughput by 272.49% to 776.01% over state-of-the-art (SOTA) methods. In total, 26 vulnerabilities were uncovered (18 in BACnet and 8 in KNX), each related to implicitly reserved fields. Of these, 24 vulnerabilities were confirmed by manufacturers, with 9 assigned CVEs.
-
Mohsen Minaei (Visa Research), Ranjit Kumaresan (Visa Research), Andrew Beams (Visa Research), Pedro Moreno-Sanchez (IMDEA Software Institute, MPI-SP), Yibin Yang (Georgia Institute of Technology), Srinivasan Raghuraman (Visa Research and MIT), Panagiotis Chatzigiannis (Visa Research), Mahdi Zamani (Visa Research), Duc V. Le (Visa Research)
Blockchain auctions play an important role in the price discovery of digital assets (e.g., NFTs). However, despite their importance, implementing auctions directly on blockchains such as Ethereum incurs scalability issues. In particular, the on-chain transactions scale poorly with the number of bidders, leading to network congestion, increased transaction fees, and slower transaction confirmation time. This lack of scalability significantly hampers the ability of the system to handle large-scale, high-speed auctions that are common in today’s economy.
In this work, we build a protocol where an auctioneer can conduct sealed bid auctions that run entirely off-chain when parties behave honestly, and in the event that k bidders deviate (e.g., do not open their sealed bid) from an n-party auction protocol, the on-chain complexity is only O(k). This improves over existing solutions that require O(n) on-chain complexity, even if a single bidder deviates from the protocol. In the event of a malicious auctioneer, our protocol still guarantees that the auction will successfully terminate. We implement our protocol and show that it offers significant efficiency improvements compared to existing on-chain solutions. Our use of zero-knowledge Succinct Non-interactive ARguments of Knowledge over arithmetic circuits (zkSNARKs) to achieve scalability also ensures that the on-chain contract and other participants do not acquire any information about the bidders’ identities and their respective bids, except for the winner and the winning bid amount.
-
Pujan Paudel (Boston University), Gianluca Stringhini (Boston University)
Online e-commerce scams, ranging from shopping scams to pet scams, globally cause millions of dollars in financial damage every year.
In response, the security community has developed highly accurate detection systems able to determine if a website is fraudulent.
However, finding candidate scam websites that can be passed as input to these downstream detection systems is challenging: relying on user reports is inherently reactive and slow, and proactive systems issuing search engine queries to return candidate websites suffer from low coverage and do not generalize to new scam types. In this paper, we present LOKI, a system designed to identify search engine queries likely to return a high fraction of fraudulent websites. LOKI implements a keyword scoring model grounded in Learning Under Privileged Information (LUPI) and feature distillation from Search Engine Result Pages (SERPs). We rigorously validate LOKI across 10 major scam categories and demonstrate a 20.58 times improvement in discovery over both heuristic and data-driven baselines across all categories. Leveraging a small seed set of only 1,663 known scam sites, we use the keywords identified by our method to discover 52,493 previously unreported scams in the wild. Finally, we show that LOKI generalizes to previously-unseen scam categories, highlighting its utility in surfacing emerging threats.
-
Cheng Chu (Indiana University Bloomington), Qian Lou (University of Central Florida), Fan Chen (Indiana University Bloomington), Lei Jiang (Indiana University Bloomington)
Variational quantum algorithms (VQAs) have emerged as one of the most promising paradigms for achieving practical quantum advantage in the noisy intermediate-scale quantum (NISQ) era. To enhance the computational accuracy of VQAs on noisy hardware, zero noise extrapolation (ZNE) has become a widely adopted and effective error mitigation technique. However, the growing reliance on ZNE also increases the importance of identifying potential adversarial exploits. We examine existing backdoor attacks and highlight why they struggle to compromise ZNE. Specifically, quantum backdoor attacks that modify circuit structures merely shift the ideal output without affecting the noise-dependent extrapolation process, leaving ZNE intact. Likewise, parameter-level backdoors that are trained without accounting for device-specific noise exhibit inconsistent behavior across different hardware platforms, resulting in unreliable or ineffective attacks. Building on these observations, we uncover a new class of backdoor vulnerabilities that specifically target the unique properties of ZNE.
In this study, we propose QNBAD, a novel and stealthy backdoor attack targeting ZNE. QNBAD is carefully designed to preserve the correct functionality of variational quantum circuits on most devices. However, under a specific noise model, it leverages subtle interactions between quantum noise and circuit structure to systematically manipulate the sampled expectation values across different noise levels. This targeted perturbation corrupts the ZNE fitting process and leads to significantly biased final estimates. Compared to prior backdoor methods, QNBAD achieves substantially greater absolute error amplification, ranging from 1.68x to 11.7x across four platforms and six applications. Furthermore, it remains effective across a variety of fitting functions and ZNE variants.
Plug, Play, and Pray
-
Zilin Shen (Purdue University), Imtiaz Karim (The University of Texas at Dallas), Elisa Bertino (Purdue University)
The Wi-Fi Alliance has developed several device connectivity protocols—such as Wi-Fi Direct, Wi-Fi EasyConnect, and Wi-Fi EasyMesh—that are integral to billions of devices worldwide. Given their widespread adoption, ensuring the security and privacy of these protocols is critical. However, existing research has not comprehensively examined the security and privacy aspects of these protocols’ designs. To address this gap, we introduce WCDCAnalyzer (Wi-Fi Certified Device Connectivity Analyzer), a formal analysis framework designed to evaluate the security and privacy of these widely used Wi-Fi Certified Device Connectivity Protocols. One of the significant challenges in formally verifying the Wi-Fi Direct protocol is the scalability problem caused by the state explosion resulting from the protocol’s large scale and complexity, which leads to an exponential increase in memory usage. To address this challenge, we develop a systematic decomposition method following the compositional reasoning paradigm and integrate it into WCDCAnalyzer. This allows WCDCAnalyzer to automatically decompose a given protocol into several sub-protocols, verify each sub-protocol separately, and combine the results. Our design is a practical application of compositional reasoning based on rigorous foundations, and we provide detailed algorithms showing how this reasoning approach can be applied to cryptographic protocol verification. Using WCDCAnalyzer, we analyze these protocols and discover 10 vulnerabilities, including authentication bypass, privacy leakage, and DoS attacks. The vulnerabilities and associated practical attacks have been validated on commercial devices and acknowledged by the Wi-Fi Alliance.
-
Yang Yang (Singapore Management University), Guomin Yang (Singapore Management University), Yingjiu Li (University of Oregon, USA), Pengfei Wu (Singapore Management University), Rui Shi (Hainan University, China), Minming Huang (Singapore Management University), Jian Weng (Jinan University, Guangzhou, China), HweeHwa Pang (Singapore Management University), Robert H. Deng (Singapore Management University)
Service discovery is a fundamental process in wireless networks, enabling devices to find and communicate with services dynamically, and is critical for the seamless operation of modern systems like 5G and IoT. This paper introduces PriSrv+, an advanced privacy and usability-enhanced service discovery protocol for modern wireless networks and resource-constrained environments. PriSrv+ builds upon PriSrv (NDSS'24), by addressing critical limitations in expressiveness, privacy, scalability, and efficiency, while maintaining compatibility with widely-used wireless protocols such as mDNS, BLE, and Wi-Fi.
A key innovation in PriSrv+ is the development of Fast and Expressive Matchmaking Encryption (FEME), the first matchmaking encryption scheme capable of supporting expressive access control policies with an unbounded attribute universe, allowing any arbitrary string to be used as an attribute. FEME significantly enhances the flexibility of service discovery while ensuring robust message and attribute privacy. Compared to PriSrv, PriSrv+ optimizes cryptographic operations, achieving 7.62x faster encryption and 6.23x faster decryption, and dramatically reduces ciphertext sizes by 87.33%. In addition, PriSrv+ reduces communication costs by 87.33% for service broadcast and 86.64% for anonymous mutual authentication compared with PriSrv. Formal security proofs confirm the security of FEME and PriSrv+. Extensive evaluations on multiple platforms demonstrate that PriSrv+ achieves superior performance, scalability, and efficiency compared to existing state-of-the-art protocols.
-
Sumair Ijaz Hashmi (CISPA Helmholtz Center for Information Security, Saarland University), Shafay Kashif (The University of Auckland), Lea Gröber (Lahore University of Management Sciences), Katharina Krombholz (CISPA Helmholtz Center for Information Security), Mobin Javed (Lahore University of Management Sciences)
Misconfigurations in cloud services remain a leading cause of security and privacy incidents, often stemming from the complexity of configuring cloud platforms. To better understand these challenges, we analyzed approximately 251,900 security- and privacy-related Stack Overflow posts spanning from 2008 to 2024. Using topic modeling and qualitative analysis, we systematically mapped cloud use cases to their associated security and privacy configuration challenges, revealing a comprehensive landscape of the hurdles cloud operators faced. We identified both technical and human-centric issues, including problems related to insufficient documentation and the lack of context-aware tooling tailored to operators' environments. Notably, authentication and access control challenges appeared in all identified use cases, cutting across nearly every stage of cloud deployment, integration, and maintenance. Our findings underscore the need for usable, tailored, and context-sensitive support tools and resources to help developers securely configure cloud services.
-
Meenatchi Sundaram Muthu Selva Annamalai (University College London), Borja Balle (Google Deepmind), Jamie Hayes (Deepmind), Emiliano De Cristofaro (UC Riverside)
The Differentially Private Stochastic Gradient Descent (DP-SGD) algorithm supports the training of machine learning (ML) models with formal Differential Privacy (DP) guarantees. Traditionally, DP-SGD processes training data in batches using Poisson subsampling to select each batch at every iteration. More recently, shuffling has become a common alternative due to its better compatibility and lower computational overhead. However, computing tight theoretical DP guarantees under shuffling remains an open problem. As a result, models trained with shuffling are often evaluated as if Poisson subsampling were used, which might result in incorrect privacy guarantees.
This raises a compelling research question: can we verify whether there are gaps between the theoretical DP guarantees reported by state-of-the-art models using shuffling and their actual leakage? To do so, we define novel DP-auditing procedures to analyze DP-SGD with shuffling and measure their ability to tightly estimate privacy leakage vis-à-vis batch sizes, privacy budgets, and threat models. We demonstrate that DP models trained using this approach have considerably overestimated their privacy guarantees (by up to 4 times). However, we also find that the gap between the theoretical Poisson DP guarantees and the actual privacy leakage from shuffling is not uniform across all parameter settings and threat models. Finally, we study two common variations of the shuffling procedure that result in even further privacy leakage (up to 10 times). Overall, our work highlights the risk of using shuffling instead of Poisson subsampling in the absence of rigorous analysis methods.
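For readers unfamiliar with the distinction at the heart of this gap, a minimal sketch of the two batch-selection strategies follows (parameter names are illustrative):

```python
import random

def poisson_batches(n, q, steps, seed=0):
    """Poisson subsampling: each example joins each batch independently w.p. q."""
    rng = random.Random(seed)
    return [[i for i in range(n) if rng.random() < q] for _ in range(steps)]

def shuffled_batches(n, batch_size, seed=0):
    """Shuffling: permute once per epoch, then cut into fixed-size batches."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i:i + batch_size] for i in range(0, n, batch_size)]

# With shuffling, every example appears exactly once per epoch; with Poisson
# subsampling, an example may appear in zero or several batches. The commonly
# reported DP guarantees assume the latter sampling process.
```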
Dr. Strangekeylove
-
Xiaohai Dai (Huazhong University of Science and Technology), Yiming Yu (Huazhong University of Science and Technology), Sisi Duan (Tsinghua University), Rui Hao (Wuhan University of Technology), Jiang Xiao (Huazhong University of Science and Technology), Hai Jin (Huazhong University of Science and Technology)
The emergence of blockchain technology has revitalized research interest in Byzantine Fault Tolerant (BFT) consensus, particularly asynchronous BFT due to its resilience against network attacks. To improve the performance of traditional asynchronous BFT, recent studies propose the dual-path paradigm: an optimistic path for efficiency under favorable situations and a pessimistic path—typically implemented through a Multi-valued Validated Byzantine Agreement (MVBA) protocol—to guarantee liveness in unfavorable situations.
However, owing to the inherent complexity and inefficiency of the MVBA protocol, existing dual-path protocols exhibit high implementation complexity and poor performance in unfavorable situations. Moreover, the two constituent types within the dual-path paradigm, serial-path and parallel-path, each face additional limitations. Specifically, the serial-path type encounters difficulties in switching between the optimistic and pessimistic paths, whereas the parallel-path type discards blocks from one of the paths, resulting in bandwidth waste and reduced throughput. To address these limitations, we propose Icarus, a single-path asynchronous BFT protocol that exclusively leverages optimistic paths without pessimistic paths. The optimistic path ensures Icarus's efficiency under favorable situations. To guarantee liveness in unfavorable conditions, Icarus employs a rotating-chain mechanism: each node broadcasts a chain of blocks in parallel, and these chains take turns serving as the optimistic path in a round-robin fashion.
Since non-faulty nodes' chains continuously grow, once a chain that has accumulated enough blocks becomes the optimistic path, its blocks can be committed, ensuring liveness even in unfavorable conditions.
To maintain consistency during path transitions, Icarus introduces the Two-consecutive-validated-value Byzantine Agreement (tcv²-BA) protocol, which aligns heights of committed blocks on the previous path.
We have verified Icarus's correctness through theoretical analysis and validated its high performance through various experiments.
-
Harjasleen Malvai (University of Illinois, Urbana-Champaign), Francesca Falzon (ETH Zürich), Andrew Zitek-Estrada (EPFL), Sarah Meiklejohn (University College London), Joseph Bonneau (NYU)
We systematize the research on authenticated dictionaries (ADs): cryptographic data structures that enable applications such as key transparency, binary transparency, verifiable key-value stores, and integrity-preserving filesystems. First, we present a unified framework that captures the trust and threat assumptions behind five common deployment scenarios. Second, we distill and reconcile the diverse security definitions scattered across the literature, clarifying the guarantees they offer and when each is appropriate. Third, we develop a taxonomy of AD constructions and analyze their asymptotic costs, exposing a sharp dichotomy: every known scheme either incurs O(log n) time for both lookups and updates, or achieves O(1) for one operation only by paying O(n) for the other. Surprisingly, this barrier persists even when stronger trust assumptions are introduced, undermining the intuition that "more trust buys efficiency". We conclude with application-driven research questions, including realistic auditing models and incentives for adoption in systems that today provide no verifiable integrity at all.
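As a concrete instance of the O(log n) side of this dichotomy, consider a plain Merkle-tree sketch (assuming a power-of-two number of leaves; real AD constructions add keys, values, and update logic on top): a lookup proof carries exactly one sibling hash per level.

```python
import hashlib

def h(a, b):
    return hashlib.sha256(a + b).digest()

def build_tree(leaves):              # len(leaves) must be a power of two
    levels = [leaves]
    while len(levels[-1]) > 1:
        lvl = levels[-1]
        levels.append([h(lvl[i], lvl[i + 1]) for i in range(0, len(lvl), 2)])
    return levels                    # levels[-1][0] is the root commitment

def prove(levels, i):                # one sibling hash per level: O(log n)
    proof = []
    for lvl in levels[:-1]:
        proof.append(lvl[i ^ 1])     # sibling at this level
        i //= 2
    return proof

def verify(root, leaf, i, proof):
    acc = leaf
    for sib in proof:
        acc = h(acc, sib) if i % 2 == 0 else h(sib, acc)
        i //= 2
    return acc == root

leaves = [hashlib.sha256(bytes([x])).digest() for x in range(8)]
levels = build_tree(leaves)
assert verify(levels[-1][0], leaves[3], 3, prove(levels, 3))
```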
-
Yusuke Kubo (NTT DOCOMO BUSINESS, Inc. / Waseda University), Fumihiro Kanei (NTT DOCOMO BUSINESS, Inc.), Mitsuaki Akiyama (NTT, Inc.), Takuro Wakai (Waseda University), Tatsuya Mori (Waseda University / NICT / RIKEN AIP)
GitHub Actions has become a dominant Continuous Integration/Continuous Delivery (CI/CD) platform, yet recent supply chain attacks like SolarWinds and tj-actions/changed-files highlight critical security vulnerabilities in such systems. While GitHub provides official security practices to mitigate these risks, the extent of their real-world implementation remains unknown. We present a mixed-methods study analyzing 338,812 public repositories and surveying over 100 developers to understand security practice implementation in GitHub Actions. Our findings reveal alarmingly low implementation rates across five key security practices, ranging from 0.6% to 52.9%. We identify three primary barriers: lack of awareness (up to 71.6% of non-adopters were unaware of practices), misconceptions about applicability, and concerns about operational costs. Repository characteristics such as organization ownership and recent development activity significantly correlate with better security practice implementation. Based on these empirical insights, we derive actionable recommendations that align intervention strategies with appropriate levels of automation, improve notification design to support awareness, strengthen platform- and IDE-level assistance, and clarify documentation on risks and applicability.
-
Alan T. Sherman (University of Maryland, Baltimore County (UMBC)), Jeremy J. Romanik Romano (University of Maryland, Baltimore County (UMBC)), Edward Zieglar (University of Maryland, Baltimore County (UMBC)), Enis Golaszewski (University of Maryland, Baltimore County (UMBC)), Jonathan D. Fuchs (University of Maryland, Baltimore County (UMBC)), William E. Byrd (University of Alabama at Birmingham)
We analyze security aspects of the SecureDNA system with regard to its system design, engineering, and implementation. This system enables DNA synthesizers to screen order requests against a database of hazards. By applying novel cryptography involving distributed oblivious pseudorandom functions, the system aims to keep order requests and the database of hazards secret. Discerning the detailed operation of the system in part from source code (Version 1.0.8), our analysis examines key management, certificate infrastructure, authentication, and rate-limiting mechanisms. We also perform the first formal-methods analysis of the mutual authentication, basic request, and exemption-handling protocols.
Without breaking the cryptography, our main finding is that SecureDNA's custom mutual authentication protocol SCEP achieves only one-way authentication: the hazards database and keyservers never learn with whom they communicate. This structural weakness violates the principle of defense in depth and enables an adversary to circumvent rate limits that protect the secrecy of the hazards database, if the synthesizer connects with a malicious or corrupted keyserver or hashed database. We point out an additional structural weakness that also violates the principle of defense in depth: inadequate cryptographic bindings prevent the system from detecting if responses, within a TLS channel, from the hazards database were modified. Consequently, if a synthesizer were to reconnect with the database over the same TLS session, an adversary could replay and swap responses from the database without breaking TLS. Although the SecureDNA implementation does not allow such reconnections, it would be stronger security engineering to avoid the underlying structural weakness. We identify these vulnerabilities and suggest and verify mitigations, including adding strong bindings.
Our work illustrates that a secure system needs more than sound mathematical cryptography; it also requires formal specifications, sound key management, proper binding of protocol message components, and careful attention to engineering and implementation details.
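The missing binding is easiest to see generically: if each database response carries a MAC computed over a digest of the request it answers, a replayed or swapped response fails verification even inside an intact TLS channel. The sketch below is a minimal illustration of that principle, not SecureDNA's actual protocol; the key derivation and message layout are assumptions:

```python
import hashlib, hmac, os

SESSION_KEY = os.urandom(32)  # stand-in for a key derived from the session

def bind(request: bytes, response: bytes) -> bytes:
    # MAC over request digest || response ties each reply to the exact
    # query it answers, defeating replay/swap within a channel.
    req_digest = hashlib.sha256(request).digest()
    return hmac.new(SESSION_KEY, req_digest + response, hashlib.sha256).digest()

def verify(request: bytes, response: bytes, tag: bytes) -> bool:
    return hmac.compare_digest(bind(request, response), tag)

tag = bind(b"query:hash123", b"verdict:no-hazard")
assert verify(b"query:hash123", b"verdict:no-hazard", tag)
assert not verify(b"query:hash999", b"verdict:no-hazard", tag)  # swapped reply rejected
```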
Game of Flows
-
Yingqian Hao (Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Hui Zou (Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Lu Zhou (Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Yuxuan Chen (Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Yanbiao Li (Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences)
The Border Gateway Protocol (BGP) lacks inherent security, leaving the Internet vulnerable to severe threats like route leaks. Existing detection methods suffer from limitations such as rigid binary classification, high false positives, and sparse authoritative AS relationship data. To address these challenges, this paper proposes PathProb—a novel paradigm that flexibly identifies route leaks by calculating topology-aware probability distributions for AS links and computing legitimacy scores for AS paths. Our approach integrates Monte Carlo methods with an Integer Linear Programming formulation of routing policies to derive these solutions efficiently.
We comprehensively evaluate PathProb using real-world BGP routing traces and route leak incidents. Results show our inference model outperforms state-of-the-art approaches on a high-confidence validation dataset. PathProb detects real-world route leaks with 98.45% recall while simultaneously reducing false positives by 4.29 to 20.08 percentage points compared with state-of-the-art alternatives. Additionally, PathProb's path legitimacy scoring enables network administrators to dynamically adjust route leak detection thresholds, tailoring security posture to their specific false alarm tolerance and security needs. Finally, PathProb offers seamless compatibility with emerging route leak mitigation mechanisms, such as Autonomous System Provider Authorization (ASPA), enabling flexible integration to enhance leak detection capabilities.
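To make the path-scoring interface concrete: given inferred per-link probabilities, a path legitimacy score can be aggregated across the links of an AS path and compared against an operator-tuned threshold. The sketch below is a deliberately simplified illustration (a geometric mean of hypothetical link probabilities), not PathProb's actual Monte Carlo/ILP model:

```python
import math

# Hypothetical per-link probabilities that each link legitimately
# propagates a route in this direction (PathProb infers these per topology).
link_prob = {(65001, 65002): 0.98, (65002, 65003): 0.95, (65003, 65010): 0.10}

def path_legitimacy(as_path: list[int]) -> float:
    # Aggregate link probabilities in log-space for numerical stability,
    # then take the geometric mean per link.
    score = 0.0
    for a, b in zip(as_path, as_path[1:]):
        score += math.log(link_prob.get((a, b), 0.5))  # 0.5 = no evidence
    return math.exp(score / (len(as_path) - 1))

THRESHOLD = 0.6  # operators tune this to their false-alarm tolerance
path = [65001, 65002, 65003, 65010]
print(path_legitimacy(path) < THRESHOLD)  # True -> flag as potential leak
```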
-
Weitong Li (Virginia Tech), Tao Wan (CableLabs), Tijay Chung (Virginia Tech)
The Resource Public Key Infrastructure (RPKI) enhances Internet routing security by utilizing Route Origin Authorization (ROA) objects to link IP prefixes with their rightful origin ASNs. Despite the rapid deployment of RPKI (over 51.3% of Internet routes are now covered by ROAs), there are still 6,802 RPKI-invalid prefixes as of today. This work provides the first comprehensive study to understand and classify the hidden causes of RPKI-invalid prefixes, revealing that ROA misconfigurations often occur during IP leasing and IP transit services. We identify scenarios explaining these misconfigurations and attribute 96.9% of the RPKI-invalid prefixes to such misconfigurations.
We further show their cascading impacts on the data plane, noting that while most prefixes exhibit negligible effects, 3.1% result in full connectivity loss and 7.1% degrade routing by adding latency and extra hops, and in some cases bypassing intended security mechanisms. Additionally, we find that such misconfigurations have been triggering false alarms in hijack detection systems. To validate our findings, we build a ground-truth dataset of 294 misconfigured prefixes through direct engagement with 174 network operators. We also interviewed 16 large ISPs and major leasing brokers about their ROA management practices, and we propose suggestions to avert ROA misconfigurations.
Taken together, this study not only fills gaps left by previous research but also offers actionable recommendations to network operators for improving ROA management and minimizing the occurrence of RPKI-invalid announcements.
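For readers unfamiliar with why leased prefixes commonly turn RPKI-invalid, the sketch below reproduces simplified route origin validation in the spirit of RFC 6811: an announcement is invalid when a covering ROA exists but the origin ASN or prefix length does not match, which is exactly what happens when a lessee originates space whose ROA still names the lessor's ASN. The ROA record here is fabricated for illustration:

```python
from ipaddress import ip_network

# Each ROA authorizes an origin AS for a prefix up to maxLength.
ROAS = [("203.0.113.0/24", 24, 64500)]  # (prefix, maxLength, origin ASN)

def rov(announced: str, origin_as: int) -> str:
    net = ip_network(announced)
    covered = False
    for roa_prefix, max_len, roa_as in ROAS:
        if net.subnet_of(ip_network(roa_prefix)):
            covered = True
            if origin_as == roa_as and net.prefixlen <= max_len:
                return "valid"
    return "invalid" if covered else "not-found"

# A lessee announcing the leased prefix from its own ASN, without an
# updated ROA, is judged invalid even though the announcement is benign.
print(rov("203.0.113.0/24", 64501))  # invalid
```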
-
Wei Shao (University of California, Davis), Zequan Liang (University of California, Davis), Ruoyu Zhang (University of California, Davis), Ruijie Fang (University of California, Davis), Ning Miao (University of California, Davis), Ehsan Kourkchi (University of California, Davis), Setareh Rafatirad (University of California, Davis), Houman Homayoun (University of California, Davis), Chongzhou Fang (Rochester Institute of Technology)
Biometric authentication using physiological signals offers a promising path toward secure and user-friendly access control in wearable devices. While electrocardiogram (ECG) signals have shown high discriminability, their intrusive sensing requirements and discontinuous acquisition limit practicality. Photoplethysmography (PPG), on the other hand, enables continuous, non-intrusive authentication with seamless integration into wrist-worn wearable devices. However, most prior work relies on high-frequency PPG (e.g., 75 to 500 Hz) and complex deep models, which incur significant energy and computational overhead, impeding deployment in power-constrained real-world systems.
In this paper, we present the first real-world implementation and evaluation of a continuous authentication system on a smartwatch, We-Be Band, using low-frequency (25 Hz) multi-channel PPG signals. Our method employs a Bi-LSTM with an attention mechanism to extract identity-specific features from short (4 s) windows of 4-channel PPG. Through extensive evaluations on both public datasets (PTTPPG) and our We-Be Dataset (26 subjects), we demonstrate strong classification performance with an average test accuracy of 88.11%, macro F1-score of 0.88, False Acceptance Rate (FAR) of 0.48%, False Rejection Rate (FRR) of 11.77%, and Equal Error Rate (EER) of 2.76%. Our 25 Hz system reduces sensor power consumption by 53% compared to 512 Hz and 19% compared to 128 Hz setups without compromising performance. We find that sampling at 25 Hz preserves authentication accuracy, whereas performance drops sharply at 20 Hz while offering only trivial additional power savings, underscoring 25 Hz as the practical lower bound. Additionally, we find that models trained exclusively on resting data fail under motion, while activity-diverse training improves robustness across physiological states.
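As a rough sketch of the model family described above, the following PyTorch snippet wires a Bi-LSTM with additive attention over 4 s windows of 4-channel, 25 Hz PPG (100 samples per window); the layer sizes are illustrative assumptions, not the authors' exact architecture:

```python
import torch
import torch.nn as nn

class PPGAuthNet(nn.Module):
    """Bi-LSTM with additive attention over 4 s, 4-channel, 25 Hz PPG windows."""
    def __init__(self, channels=4, hidden=64, subjects=26):
        super().__init__()
        self.lstm = nn.LSTM(channels, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)   # scores each time step
        self.head = nn.Linear(2 * hidden, subjects)

    def forward(self, x):                      # x: (batch, 100, 4)
        h, _ = self.lstm(x)                    # (batch, 100, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1) # attention weights over time
        ctx = (w * h).sum(dim=1)               # weighted summary vector
        return self.head(ctx)                  # per-subject logits

model = PPGAuthNet()
window = torch.randn(8, 100, 4)               # batch of 4 s windows at 25 Hz
print(model(window).shape)                    # torch.Size([8, 26])
```
-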
Sri Hrushikesh Varma Bhupathiraju (University of Florida), Shaoyuan Xie (University of California, Irvine), Michael Clifford (Toyota InfoTech Labs), Qi Alfred Chen (University of California, Irvine), Takeshi Sugawara (The University of Electro-Communications), Sara Rampazzi (University of Florida)
Thermal cameras are increasingly considered a viable solution in autonomous systems to ensure perception in low-visibility conditions. Specialized optics and advanced signal processing are integrated into thermal-based perception pipelines of self-driving cars, robots, and drones to capture relative temperature changes and allow the detection of living beings and objects where conventional visible-light cameras struggle, such as during nighttime, fog, or heavy rain. However, it remains unclear whether the security and trustworthiness of thermal-based perception systems are comparable to those of conventional cameras. Our research exposes and mitigates three novel vulnerabilities in thermal image processing, specifically within equalization, calibration, and lensing mechanisms, that are inherent to thermal cameras. These vulnerabilities can be triggered by heat sources naturally present or maliciously placed in the environment, altering the perceived relative temperature, or generating time-controlled artifacts that can undermine the correct functioning of obstacle avoidance.
We systematically analyze vulnerabilities across three thermal cameras used in autonomous systems (FLIR Boson, InfiRay T2S, FPV XK-C130), assessing their impact on three fine-tuned thermal object detectors and two visible-thermal fusion models for autonomous driving.
Our results show a mean average precision drop of 50% in pedestrian detection and 45% in fusion models, caused by flaws in the equalization process. Real-world driving tests at speeds up to 40 km/h show pedestrian misdetection rates up to 100% and the creation of false obstacles with a 91% success rate, persisting minutes after the attack ends. To address these issues, we propose and evaluate three novel threat-aware signal processing algorithms that dynamically detect and suppress attacker-induced artifacts. Our findings shed light on the reliability of thermal-based perception processes and raise awareness of the limitations of such technology when used for obstacle avoidance.
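The equalization weakness can be reproduced with a toy automatic gain control: because the display mapping is normalized by the frame's global extrema, a tiny but intense heat source compresses the contrast of everything else, washing out pedestrian-sized signals. The percentile-clamping variant below gestures at the paper's threat-aware direction; all values and parameters are illustrative assumptions:

```python
import numpy as np

def agc_minmax(frame: np.ndarray) -> np.ndarray:
    # Naive equalization: map the frame's [min, max] onto [0, 255].
    lo, hi = frame.min(), frame.max()
    return ((frame - lo) / (hi - lo) * 255).astype(np.uint8)

def agc_clamped(frame: np.ndarray, p=(1, 99)) -> np.ndarray:
    # Threat-aware variant: clamp to robust percentiles so a tiny
    # attacker-heated region cannot dominate the global mapping.
    lo, hi = np.percentile(frame, p)
    return (np.clip((frame - lo) / (hi - lo), 0, 1) * 255).astype(np.uint8)

scene = np.full((240, 320), 290.0)   # uniform background (arbitrary units)
scene[100:140, 150:170] = 305.0      # pedestrian-like warm region
attacked = scene.copy()
attacked[0:5, 0:5] = 900.0           # small, intense attacker heat source

print(agc_minmax(scene).std(), agc_minmax(attacked).std())  # contrast collapses
print(agc_clamped(attacked).std())                          # contrast preserved
```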
Death by a Thousand Abstractions
-
The Dark Side of Flexibility: Detecting Risky Permission Chaining Attacks in Serverless Applications
Xunqi Liu (State Key Laboratory of Integrated Services Networks, School of Cyber Engineering, Xidian University), Nanzi Yang (University of Minnesota), Chang Li (State Key Laboratory of Integrated Services Networks, School of Cyber Engineering, Xidian University), Jinku Li (State Key Laboratory of Integrated Services Networks, School of Cyber Engineering, Xidian University), Jianfeng Ma (State Key Laboratory of Integrated Services Networks, School of Cyber Engineering, Xidian University), Kangjie Lu (University of Minnesota)
Modern serverless platforms enable rapid application evolution by decoupling infrastructure from function-level development. However, this flexibility introduces a fundamental mismatch between the decentralized, function-level privilege configurations of serverless applications and the centralized cloud access control systems. We observe that this mismatch commonly leaves functions in serverless applications with risky permissions, and an attacker can chain multiple risky-permissioned functions to escalate privileges, take over the account, and even move laterally to compromise other accounts. We term such an attack a risky permission chaining attack.
In this work, we propose an automated reasoning system that can detect risky permissions that are exploitable for chaining attacks. Our system is rooted in an attacker-centric modality abstraction, which explicitly captures how independent permissions from different functions and accounts can be merged into real attack chains. Based on this abstraction, we build a modality-guided detection tool that uncovers exploitable privilege chains in real-world serverless applications. We evaluate our approach across two major cloud platforms, AWS and Alibaba Cloud, by analyzing serverless applications sourced from their official, production-grade application repositories. Our analysis uncovers 28 vulnerable applications, yielding five confirmed CVEs, six vendor acknowledgments through responsible disclosure, and one security bounty. These findings underscore that the risky permission chaining attack is not only a theoretical risk but also a structural and exploitable threat already present in commercial serverless deployments, rooted in the fundamental mismatch between decentralized serverless applications and centralized access control models.
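The chaining idea can be pictured as reachability over a graph whose edges say "holding this identity plus this permission lets you act as that identity". The toy search below uses hypothetical functions and permission names and is not the paper's reasoning system:

```python
from collections import deque

# Edges: holding the source identity plus the listed permission lets an
# attacker assume the destination identity (invoke it, pass its role, ...).
EDGES = {
    "fn-upload":    [("lambda:InvokeFunction", "fn-thumbnail")],
    "fn-thumbnail": [("iam:PassRole", "role-admin")],
    "role-admin":   [("sts:AssumeRole", "account-b/role-ops")],  # lateral move
}

def attack_chains(start: str, goal: str):
    queue = deque([(start, [start], [])])
    while queue:
        node, nodes, steps = queue.popleft()
        if node == goal:
            yield steps
            continue
        for perm, nxt in EDGES.get(node, []):
            if nxt not in nodes:  # avoid cycles
                queue.append((nxt, nodes + [nxt],
                              steps + [f"{node} --{perm}--> {nxt}"]))

for steps in attack_chains("fn-upload", "account-b/role-ops"):
    print("; ".join(steps))  # prints the full escalation chain
```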
-
Andong Chen (Zhejiang University), Ziyi Guo (Northwestern University), Zhaoxuan Jin (Northwestern University), Zhenyuan Li (Zhejiang University), Yan Chen (Northwestern University)
Kubernetes Operators, automated tools designed to manage application lifecycles within Kubernetes clusters, extend the functionalities of Kubernetes and reduce the operational burden on human engineers. While Operators significantly simplify DevOps workflows, they introduce new security risks. In particular, Kubernetes enforces namespace isolation to separate workloads and limit user access, ensuring that users can only interact with resources within their authorized namespaces. However, Kubernetes Operators often demand elevated privileges and may interact with resources across multiple namespaces. This introduces a new class of vulnerabilities, the Cross-Namespace Reference Vulnerability. The root cause lies in the mismatch between the declared scope of resources and the implemented scope of the Operator's logic, resulting in Kubernetes being unable to properly isolate the namespace. Leveraging such a vulnerability, an adversary with limited access to a single authorized namespace may exploit the Operator to perform operations affecting other unauthorized namespaces, causing Privilege Escalation and further impacts.
To the best of our knowledge, this paper is the first to systematically investigate Kubernetes Operator attacks. We present the Cross-Namespace Reference Vulnerability together with two exploitation strategies, demonstrating how an attacker can bypass namespace isolation. Through large-scale measurements, we found that over 14% of Operators in the wild are potentially vulnerable.
Our findings have been reported to the relevant developers, resulting in 8 confirmations and 7 CVEs by the time of submission, affecting vendors including Google, the inventor of Kubernetes, and Red Hat, the inventor of the Operator pattern, and highlighting the critical need for enhanced security practices in Kubernetes Operators. To mitigate this threat, we open-source our static analysis suite and propose concrete mitigations to benefit the ecosystem.
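The root-cause pattern is easy to see in miniature: a reconciler for a namespaced custom resource that trusts a user-supplied namespace field will act outside the namespace its requester is authorized for. The sketch below, in plain Python with a stand-in client and hypothetical field names, contrasts the vulnerable and guarded versions:

```python
def reconcile_vulnerable(cr: dict, client) -> None:
    # VULNERABLE: the target namespace comes from the user-editable spec,
    # so a tenant confined to "team-a" can make the privileged Operator
    # read or write objects in any other namespace.
    ns = cr["spec"].get("targetNamespace", cr["metadata"]["namespace"])
    client.copy_secret(name=cr["spec"]["secretName"], namespace=ns)

def reconcile_guarded(cr: dict, client) -> None:
    # GUARDED: a namespaced resource may only reference objects in its
    # own namespace, matching the scope the requester was authorized for.
    own_ns = cr["metadata"]["namespace"]
    if cr["spec"].get("targetNamespace", own_ns) != own_ns:
        raise PermissionError("cross-namespace reference denied")
    client.copy_secret(name=cr["spec"]["secretName"], namespace=own_ns)

class DummyClient:
    def copy_secret(self, name, namespace):
        print(f"copying secret {name!r} in namespace {namespace!r}")

cr = {"metadata": {"namespace": "team-a"},
      "spec": {"secretName": "db-creds", "targetNamespace": "kube-system"}}
reconcile_vulnerable(cr, DummyClient())  # acts in kube-system: escalation
# reconcile_guarded(cr, DummyClient())   # would raise PermissionError
```
-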
Qi Wang (Tsinghua University), Jianjun Chen (Tsinghua University), Jingcheng Yang (Tsinghua University), Jiahe Zhang (Tsinghua University), Yaru Yang (Tsinghua University), Haixin Duan (Tsinghua University)
Session Initiation Protocol (SIP) is a cornerstone of modern real-time communication systems, powering voice calls, text messaging, and multimedia sessions across services such as VoIP, VoLTE, and RCS. While SIP provides mechanisms for authentication and identity assertion, its inherent flexibility creates semantic ambiguities among implementations that attackers can exploit.
In this paper, we present SIPChimera, a novel black-box fuzzing framework designed to systematically identify ambiguity-based identity spoofing vulnerabilities across SIP implementations. We evaluated SIPChimera against six widely used open-source SIP servers, including Asterisk and OpenSIPS, and nine popular user agents, uncovering that attackers could spoof their identity by manipulating identity headers and circumventing authentication. We demonstrate the real-world impact of these vulnerabilities by evaluating five VoIP devices, seven commercial SIP deployments, and three carrier-grade RCS-based SMS platforms. Our experiments show that attackers can exploit these vulnerabilities to perform caller ID spoofing in VoIP calls and send spoofed SMS messages over RCS, impersonating arbitrary users or services. We have responsibly disclosed our findings to affected vendors and received positive acknowledgments. Finally, we propose remedies to mitigate these issues.
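On the defensive side, one concrete member of this ambiguity class is a request whose identity assertions conflict (for example, duplicated From headers, or a From that disagrees with P-Asserted-Identity), which different implementations resolve differently. The minimal checker below is illustrative, not SIPChimera itself:

```python
import re

def identity_headers(raw_msg: str) -> dict[str, list[str]]:
    headers: dict[str, list[str]] = {}
    for line in raw_msg.split("\r\n"):
        m = re.match(r"(From|P-Asserted-Identity)\s*:\s*(.+)", line, re.I)
        if m:
            headers.setdefault(m.group(1).lower(), []).append(m.group(2).strip())
    return headers

def is_ambiguous(raw_msg: str) -> bool:
    h = identity_headers(raw_msg)
    # Duplicated From headers, or disagreeing identity assertions, leave
    # the effective caller identity implementation-defined.
    if len(h.get("from", [])) > 1:
        return True
    idents = {re.sub(r".*<|>.*", "", v) for vs in h.values() for v in vs}
    return len(idents) > 1

msg = ("INVITE sip:bob@example.com SIP/2.0\r\n"
       "From: <sip:alice@example.com>;tag=1\r\n"
       "P-Asserted-Identity: <sip:admin@example.com>\r\n\r\n")
print(is_ambiguous(msg))  # True -> From and asserted identity disagree
```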
-
Xinshu Ma (University of Edinburgh), Michio Honda (University of Edinburgh)
Quantum computers threaten to break the cryptographic foundations of classical TLS, prompting a shift to post-quantum cryptography. However, post-quantum authentication imposes significant performance overheads, particularly for mutual TLS in cloud environments with high handshake rates. We present Looma, a fast post-quantum authentication architecture that splits authentication into a fast, on-path sign/verify operation and slow, off-path pre-computations performed asynchronously, reducing handshake latency without sacrificing security. Integrated into TLS 1.3, Looma lowers PQTLS handshake latency by up to 44% compared to a Dilithium-2-based baseline. Our results demonstrate the practicality of Looma for scaling post-quantum secure communications in cloud environments.
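The on-path/off-path split can be mimicked with a generic producer/consumer pattern: a background worker performs the expensive per-handshake precomputation ahead of time, and the handshake path only consumes a ready token. The sketch below uses an HMAC stand-in purely to stay self-contained; Looma's actual post-quantum construction differs:

```python
import hashlib, hmac, os, queue, threading, time

KEY = os.urandom(32)
pre_q: queue.Queue = queue.Queue(maxsize=1024)

def offline_worker():
    # Off-path: do the expensive per-handshake precomputation ahead of time
    # (here just a nonce + MAC; a PQ scheme would precompute the costly part
    # of signing) and park the result in a queue.
    while True:
        nonce = os.urandom(16)
        token = hmac.new(KEY, nonce, hashlib.sha256).digest()
        pre_q.put((nonce, token))

threading.Thread(target=offline_worker, daemon=True).start()
time.sleep(0.05)  # let the precomputation pool fill

def handshake_sign(transcript: bytes) -> bytes:
    # On-path: only a cheap finalization touches handshake latency.
    nonce, token = pre_q.get()
    return hmac.new(token, transcript, hashlib.sha256).digest()

print(handshake_sign(b"client_hello||server_hello").hex()[:16])
```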
Reverse, Reverse
-
Andrea Monzani (University of Milan), Antonio Parata (University of Milan), Andrea Oliveri (EURECOM), Simone Aonzo (EURECOM), Davide Balzarotti (EURECOM), Andrea Lanzi (University of Milan)
Bring Your Own Vulnerable Driver (BYOVD) attacks abuse legitimate, digitally signed Windows drivers that contain hidden flaws, allowing adversaries to slip into kernel space, disable security controls, and sustain stealthy campaigns ranging from ransomware to state-sponsored espionage. Because most public sandboxes inspect only user-mode activity, this kernel-level abuse typically flies under the radar. In this work, we introduce the first dynamic taxonomy of BYOVD behavior. Synthesized from manual investigation of real-world incidents and fine-grained kernel-trace analysis, it maps every attack to sequential stages and enumerates the key APIs abused at each step. Then, we propose a virtualization-based sandbox that follows every step of a driver's execution path, from the originating user-mode request down to the lowest-level kernel instructions, without requiring driver re-signing or host modifications. Finally, the sandbox automatically annotates every observed action with its corresponding taxonomy stage, producing a stage-by-stage report that highlights where and how a sample exhibits suspicious behavior. To test it against the current landscape of BYOVD techniques, we analyzed 8,779 malware samples that load 773 distinct signed drivers. The sandbox flagged suspicious behavior in 48 drivers, and subsequent manual verification led to the responsible disclosure of seven previously unknown vulnerable drivers to Microsoft, their vendors, and public threat-intelligence platforms. Our results demonstrate that deep, transparent tracing of kernel control flow can expose BYOVD abuse that eludes traditional analysis pipelines, enriching the community's knowledge of driver exploitation and enabling proactive hardening of Windows defenses.
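The taxonomy-driven annotation step can be pictured as matching an observed kernel API trace against per-stage API sets. The toy mapping below uses a few illustrative stages and APIs, far sparser than the paper's taxonomy:

```python
# Illustrative stage -> abused-API mapping, in the spirit of the paper's
# taxonomy; real traces and API sets are far richer.
STAGES = [
    ("driver-load",      {"NtLoadDriver", "ZwLoadDriver"}),
    ("handle-to-kernel", {"DeviceIoControl", "NtDeviceIoControlFile"}),
    ("tamper-security",  {"ZwOpenProcess", "ZwTerminateProcess"}),
]

def annotate(trace: list[str]) -> list[tuple[str, str]]:
    # Tag each observed call with the taxonomy stage it belongs to.
    report = []
    for call in trace:
        for stage, apis in STAGES:
            if call in apis:
                report.append((stage, call))
    return report

trace = ["NtLoadDriver", "DeviceIoControl", "ZwTerminateProcess"]
for stage, call in annotate(trace):
    print(f"{stage:18} {call}")
```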
-
Hanqing Zhao (Tsinghua University & QI-ANXIN Technology Research Institute), Yiming Zhang (Tsinghua University), Lingyun Ying (QI-ANXIN Technology Research Institute), Mingming Zhang (Zhongguancun Laboratory), Baojun Liu (Tsinghua University), Haixin Duan (Tsinghua University), Zi-Quan You (Tsinghua University), Shuhao Zhang (QI-ANXIN Technology Research Institute)
Using digital certificates to sign software is an important safeguard of its trustworthiness and integrity. However, attackers can abuse the mechanism to obtain signatures for malicious samples, aiding malware distribution. Despite existing work uncovering instances of code-signing abuse, the problem persists and continues to escalate. Understanding the evolution of the ecosystem and the strategies of abusers is vital to improving defense mechanisms.
In this work, we conducted a large-scale measurement of code-signing abuse using 3,216,113 signed malicious PE files collected from the wild.
Through fine-grained classification, we identified 43,286 abused certificates and categorized them into five abuse types, creating the largest labeled dataset to date. Our analysis revealed that abuse remains widespread, affecting certificates from 114 countries issued by 46 Certificate Authorities (CAs). We also observed the evolution of abuser techniques and identified current limitations in certificate revocation. Furthermore, we characterized abusers' behaviors and strategies, uncovering five tactics they use to evade detection, reduce costs, and amplify the impact of their abuse. Notably, we uncovered 3,484 polymorphic certificate clusters and, for the first time, documented real-world instances of malware leveraging polymorphism to evade revocation checks. Our findings expose critical flaws in current code-signing practices and are expected to raise community awareness of these abuse threats.
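Polymorphic clusters of the kind described above can be surfaced by grouping certificates that share identity-bearing fields while differing in serial number or thumbprint. The grouping sketch below uses fabricated records and field names for illustration:

```python
from collections import defaultdict

# Toy certificate records: polymorphic variants share subject and public
# key but present different serials/thumbprints to dodge revocation lists.
certs = [
    {"subject": "Acme Soft Ltd", "spki_sha256": "ab12", "serial": "01"},
    {"subject": "Acme Soft Ltd", "spki_sha256": "ab12", "serial": "02"},
    {"subject": "Beta Tools",    "spki_sha256": "cd34", "serial": "07"},
]

clusters = defaultdict(list)
for c in certs:
    clusters[(c["subject"], c["spki_sha256"])].append(c["serial"])

for key, serials in clusters.items():
    if len(serials) > 1:
        print(f"polymorphic cluster {key}: serials {serials}")
```
-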
Zezhong Ren (University of Chinese Academy of Sciences; EPFL), Han Zheng (EPFL), Zhiyao Feng (EPFL), Qinying Wang (EPFL), Marcel Busch (EPFL), Yuqing Zhang (University of Chinese Academy of Sciences), Chao Zhang (Tsinghua University), Mathias Payer (EPFL)
Kernel fuzzing effectively uncovers vulnerabilities. While existing kernel fuzzers primarily focus on maximizing code coverage, coverage alone does not guarantee thorough exploration, and coverage-maximizing fuzzers have plateaued. This pressing situation highlights the need for a new direction: code frequency-oriented kernel fuzzing. However, increasing the exploration of low-frequency kernel code faces two key challenges: (1) Resource constraints make it hard to schedule sufficient tasks for low-frequency regions without causing task explosion. (2) Random mutations often break the context dependencies of syscalls targeting low-frequency regions, reducing the effectiveness of fuzzing.
In our paper, we first perform a fine-grained study of imbalanced code coverage by evaluating Syzkaller on the Linux kernel and, in response, propose SYSYPHUZZ, a kernel fuzzer designed to boost exploration of under-tested code regions. SYSYPHUZZ introduces Selective Task Scheduling to dynamically prioritize and manage exploration tasks, avoiding task explosion. It also employs a Context-Preserving Mutation strategy to reduce the risk of disrupting important execution contexts. We evaluate SYSYPHUZZ against the state-of-the-art (SOTA) kernel fuzzers Syzkaller and SyzGPT. Our results show that SYSYPHUZZ significantly reduces the number of under-explored code regions and discovers 31 unique bugs missed by Syzkaller and 27 bugs missed by SyzGPT. Moreover, SYSYPHUZZ finds five bugs missed by Syzbot, which continuously runs on hundreds of virtual machines, demonstrating SYSYPHUZZ's effectiveness. To evaluate SYSYPHUZZ's ability to enhance SOTA fuzzers, we integrate it with SyzGPT, yielding SyzGPT-Sysy, which finds 33% more exclusive bugs, highlighting SYSYPHUZZ's potential. All discovered vulnerabilities have been responsibly disclosed to the Linux maintainers. We release the source code of SYSYPHUZZ at https://github.com/HexHive/Sysyphuzz and are working to upstream it to Syzkaller.
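The scheduling intuition generalizes to a simple frequency-weighted seed picker: seeds whose coverage touches rarely-hit edges receive proportionally more energy. The sketch below is a deliberately simplified stand-in, not SYSYPHUZZ's Selective Task Scheduling:

```python
import random
from collections import Counter

edge_hits = Counter()  # global hit counts per coverage edge

def record(seed_cov: set[int]) -> None:
    edge_hits.update(seed_cov)

def weight(seed_cov: set[int]) -> float:
    # A seed is valuable if it reaches edges the campaign rarely hits;
    # averaging rarity per edge keeps large seeds from dominating.
    return sum(1.0 / edge_hits[e] for e in seed_cov) / len(seed_cov)

seeds = {"s1": {1, 2, 3}, "s2": {3, 4}, "s3": {9}}  # edge IDs per seed
for cov in seeds.values():
    record(cov)

picks = random.choices(list(seeds),
                       weights=[weight(c) for c in seeds.values()], k=1000)
print(Counter(picks))  # s3, covering the rarest edge, is scheduled most
```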
-
Tillson Galloway (Georgia Institute of Technology), Omar Alrawi (Georgia Institute of Technology), Allen Chang (Georgia Institute of Technology), Athanasios Avgetidis (Georgia Institute of Technology), Manos Antonakakis (Georgia Institute of Technology), Fabian Monrose (Georgia Institute of Technology)
Despite the billions of dollars invested in the threat intelligence (TI) ecosystem (a globally distributed network of security vendors and altruists who drive critical cybersecurity operations), we lack an understanding of how it functions, including its dynamics and vulnerabilities. To fill that void, we propose a novel measurement framework that tracks binaries as they traverse the ecosystem by monitoring for watermarked network Indicators of Compromise (IoCs). By analyzing each stage of the propagation chain of submitted TI (submission, extraction, sharing, and disruption), we uncover an ecosystem where dissemination almost always leads to the disruption of threats, but vendors who selectively share the TI they extract limit the ecosystem's utility. Further, we find that attempts to curtail threats are often slowed by 'bottleneck' vendors delaying the sharing of TI by hours to days.
Critically, we identify several threats to the ecosystem's supply chain, some of which are presently exploited in the wild. Unnecessary active probing by vendors, shallow extraction of dropped files, and easy-to-predict sandbox environment fingerprints all threaten the health of the ecosystem. To address these issues, we provide actionable recommendations for vendors and practitioners to improve the safety of the TI supply chain, including detection signatures for known abuse patterns. We collaborated with vendors through a responsible disclosure process, gaining insight into the operational constraints underlying these weaknesses. Finally, we provide a set of ethical best practices for researchers actively measuring the threat intelligence ecosystem.
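The watermarking mechanism can be sketched as minting per-(vendor, sample) network indicators that are unique and attributable: any later DNS lookup of a minted name reveals which submission channel propagated it. The generator below is a toy; the key, domain, and naming scheme are placeholders:

```python
import hashlib, hmac

WATERMARK_KEY = b"replace-me"          # placeholder research key
BASE_DOMAIN = "telemetry.example.org"  # placeholder domain you control

def mint_ioc(vendor: str, sample_id: str) -> str:
    # Deterministic per-(vendor, sample) token: observing a DNS query for
    # this name later attributes propagation to the submission channel.
    tag = hmac.new(WATERMARK_KEY, f"{vendor}|{sample_id}".encode(),
                   hashlib.sha256).hexdigest()[:16]
    return f"{tag}.{BASE_DOMAIN}"

print(mint_ioc("vendorA", "sample-0001"))
print(mint_ioc("vendorB", "sample-0001"))  # distinct token per vendor
```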