Friday, 27 February

  • 08:45 - 09:00
    Welcome and Introductory Remarks
    Coast Ballroom
  • 09:00 - 10:00
    Keynote 1 by Perri Adams
    Coast Ballroom
    • Perri Adams, Dartmouth College ISTS Fellow & Johns Hopkins SAIS Adjunct Professor

      Speaker's Biography: Perri Adams is a fellow at Dartmouth’s Institute for Security Technology Studies and a former Special Assistant to the Director at DARPA, where she advised on next-generation AI and cybersecurity technologies across the U.S. government. She previously served as a DARPA Program Manager in the Information Innovation Office, where she created the AI Cyber Challenge. Ms. Adams is a frequent speaker and published author on cyber and technology policy, an adjunct professor at Johns Hopkins SAIS, and a former organizer of the DEFCON Capture the Flag competition.

  • 10:00 - 10:30
    Morning Break
    Pacific Ballroom D
  • 10:30 - 12:10
    Session 1: Advancing Binary Analysis
    Coast Ballroom
    • Junpeng Wan, Louis Zheng-Hua Tan, Dave (Jing) Tian (Purdue University)

      NVIDIA GPUs underpin the vast majority of modern AI workloads. These workloads are ultimately executed in the form of Streaming Assembly (SASS), the lowest-level assembly language for NVIDIA hardware. However, SASS remains largely undocumented, let alone well studied, posing a significant barrier to downstream security applications such as security auditing, vulnerability discovery, and binary hardening.

      In this paper, we address this challenge with NVLift, a systematic framework that lifts NVIDIA GPU SASS into LLVM IR to enable downstream GPU binary analysis. To lift SASS instructions, NVLift reconstructs instruction semantics by consolidating prior reverse-engineering efforts and validating execution behaviors at runtime using cuda-gdb. To verify the semantic correctness of the lifted IR, we design and implement a differential testing pipeline by compiling the lifted IR into SASS and comparing the GPU execution results against the SASS generated from the reference CUDA kernel compilation. In total, NVLift supports 47 commonly used SASS instructions on the Turing architecture (SM75), covering 88.39% of instruction occurrences in popular CUDA libraries. Using NVLift, we lifted 11 CUDA kernels, including representative DNN operators, and verified the semantic correctness of 5 kernels. We further provide a PoC implementation of GPU binary decompilation by translating the lifted LLVM IR into pseudo C code using RetDec. In sum, NVLift is a critical step towards enabling GPU binary analysis and downstream security applications.
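      The differential testing idea described above can be illustrated with a toy, CPU-only sketch (no CUDA involved): two pure Python functions stand in for the reference kernel and the kernel recompiled from lifted IR, and their outputs are compared elementwise on random inputs. The function names, trial counts, and tolerance are illustrative assumptions, not details from the paper.

      ```python
      import random

      def reference_kernel(xs, ys):
          # Stand-in for the SASS produced by the reference CUDA compilation:
          # an elementwise saxpy-style computation.
          return [2.0 * x + y for x, y in zip(xs, ys)]

      def lifted_kernel(xs, ys):
          # Stand-in for the kernel recompiled from the lifted LLVM IR.
          # x + x and 2.0 * x are bitwise-identical in IEEE-754 arithmetic.
          return [x + x + y for x, y in zip(xs, ys)]

      def differential_test(ref, lifted, trials=100, n=64, tol=1e-6):
          """Run both kernels on random inputs and compare outputs elementwise."""
          rng = random.Random(0)  # fixed seed for reproducibility
          for _ in range(trials):
              xs = [rng.uniform(-1e3, 1e3) for _ in range(n)]
              ys = [rng.uniform(-1e3, 1e3) for _ in range(n)]
              for a, b in zip(ref(xs, ys), lifted(xs, ys)):
                  if abs(a - b) > tol:
                      return False  # semantic divergence found
          return True  # outputs agreed on all trials

      print(differential_test(reference_kernel, lifted_kernel))  # → True
      ```

      The real pipeline compares GPU execution results of recompiled SASS, but the structure is the same: a divergence on any input falsifies the lifted semantics, while agreement over many random inputs builds confidence without proving equivalence.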

    • Charles Averill, Ilan Buzzetti (The University of Texas at Dallas), Alex Bellon (UC San Diego), Kevin Hamlen (The University of Texas at Dallas)

      LAPSE is a new framework for developing fault-tolerant correctness proofs for near-arbitrary native code. It lifts binary code into an intermediate representation (IR) whose operational semantics admit hardware faults. LAPSE implements a machine-verified symbolic execution engine for the resulting IR within the Rocq automated theorem proving framework, creating a proof environment in which the space of possible executions includes all potential fault possibilities. To cope with the increase in proof space, automation tools succinctly describe and reason about the desired fault model. An implementation for 32-bit RISC-V semantics and an evaluation on security-critical cryptographic subroutines from OpenSSL and BearSSL demonstrate that fault-aware proofs can be constructed from standard correctness proofs with little additional work, often requiring no novel proof techniques. The results show that developing fault-tolerant correctness proofs is not only feasible, but rote for certain kinds of fault-tolerant programs.

    • Henny Sipma, Ricardo Baratto, Ben Karel, Michael Gordon (Aarno Labs)

      When source code is unavailable, patching security vulnerabilities in binaries requires scarce reverse engineering expertise and specialized tooling. We present Dilipa, a binary micropatching system that enables users to specify patches as edits to lifted C code. Dilipa operates on an AST-based intermediate representation enriched with provenance metadata linking high-level constructs to underlying binary instructions, registers, and memory locations. A frontend compares the original and edited ASTs to extract minimal patch descriptions, and a backend applies them to the binary via direct instruction replacement or trampolines. By focusing on micropatches (small, localized modifications), our approach keeps binary changes minimal and enables post-patch validation through relational binary analysis, providing evidence that no unintended semantic changes have been introduced. We demonstrate Dilipa on three case studies involving real embedded systems, including input validation, buffer overflow, and race condition bugs.

    • Bokai Zhang, Monika Santra, Syed Rafiul Hussain, Gang Tan (Pennsylvania State University)

      Sound indirect-call resolution for stripped binaries is critical for security applications such as CFI enforcement, debloating, and large-scale vulnerability discovery, yet it remains challenging in the absence of symbol and type information. A recent work, Block-Based Points-to Analysis (BPA), addresses this problem with a scalable block memory model, but its implementation is tightly coupled to 32-bit x86 through an ISA-specific disassembly pipeline.

      To overcome this limitation, we present BPA-X, an architecture-agnostic block-based points-to analysis framework for stripped binaries across multiple ISAs. BPA-X preserves the core soundness assumptions of BPA’s block memory model while replacing x86-specific components with architecture-agnostic ones built on the VEX IR of the binary analysis platform angr. It generalizes local and global memory-block partitioning using VEX semantics instead of x86-specific patterns, lifts VEX IR into SSA form, and performs fixpoint computation for interprocedural value tracking and reachability analysis.

      Our evaluation on SPEC CPU 2006 and real-world server binaries shows that BPA-X improves memory-block partitioning, reduces the average number of indirect-call targets (AICT) on many x86 programs compared to BPA, and extends the analysis to x64 with little loss of precision. BPA-X also reduces memory consumption by 25% and improves runtime on large benchmarks.
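      The fixpoint computation over points-to sets mentioned above can be sketched on a toy, intraprocedural example. The constraint forms and block names below are invented for illustration and greatly simplify BPA's block memory model: constraints are propagated repeatedly until no points-to set changes.

      ```python
      # Toy constraints: ("addr", p, blk) means p = &blk; ("copy", q, p) means q = p.
      constraints = [
          ("addr", "p", "blk_a"),
          ("addr", "p", "blk_b"),
          ("copy", "q", "p"),
          ("copy", "r", "q"),
      ]

      def solve(constraints):
          """Iterate to a fixpoint: propagate points-to sets until nothing changes."""
          pts = {}
          changed = True
          while changed:
              changed = False
              for kind, dst, src in constraints:
                  new = {src} if kind == "addr" else pts.get(src, set())
                  before = pts.setdefault(dst, set())
                  if not new <= before:
                      before |= new  # grow dst's points-to set in place
                      changed = True
          return pts

      print(solve(constraints)["r"])  # r resolves to {'blk_a', 'blk_b'}
      ```

      An indirect call through `r` would then be conservatively resolved to every function block in `r`'s points-to set; the AICT metric in the evaluation measures the average size of such sets.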

    • Tomás Pelayo-Benedet (Universidad de Zaragoza), Kevin Borgolte (Ruhr University Bochum), Ricardo J. Rodríguez (Universidad de Zaragoza)

      Binary decompilation remains an open challenge in reverse engineering. While recent approaches have begun to leverage the capabilities of large language models (LLMs), most continue to focus exclusively on disassembly as input, ignoring the intermediate representations (IRs) employed by static binary analysis tools and traditional decompilers.

      In this paper, we present the first systematic evaluation of LLM-based decompilation using hierarchical IRs. In particular, we investigate how different levels of abstraction in IRs affect binary decompilation quality in five commercial LLMs. Our findings show that the choice of IR significantly influences performance: Smaller models benefit markedly from high-level structured IRs, while larger models show stable performance across IR levels. Our evaluation also reveals a significant trade-off between recompilation success and functional correctness. Code decompiled from disassembly tends to recompile more reliably, but it is less often functionally correct. In contrast, code decompiled from high-level IRs more often retains the original functionality, albeit with slightly lower recompilation success rates. Furthermore, we find that cognitive complexity metrics, such as Halstead measures, are strong predictors of decompilation difficulty, while traditional structural metrics, such as cyclomatic complexity, offer limited insight. We also highlight the main lines of research to improve binary decompilation by combining the advantages of static binary analysis techniques with the capabilities of modern LLMs.
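      As a rough illustration of the Halstead measures identified above as strong predictors of decompilation difficulty, the sketch below computes Halstead volume (V = N log2(n), where N is total token count and n is vocabulary size) from a naive token split. The operator/operand classification is simplistic and an assumption of this sketch, not the paper's methodology.

      ```python
      import math
      import re

      # Naive operator set; everything else counts as an operand.
      OPERATORS = {"+", "-", "*", "/", "=", "==", "(", ")", "{", "}", ";",
                   "<", ">", "return", "if", "while"}

      def halstead_volume(code):
          """Compute Halstead volume V = N * log2(n) from a naive token split."""
          tokens = re.findall(r"[A-Za-z_]\w*|==|[-+*/=(){};<>]|\d+", code)
          ops = [t for t in tokens if t in OPERATORS]
          operands = [t for t in tokens if t not in OPERATORS]
          n = len(set(ops)) + len(set(operands))  # vocabulary: n1 + n2
          N = len(ops) + len(operands)            # length: N1 + N2
          return N * math.log2(n)

      snippet = "x = a + b; if (x > 0) { return x * 2; }"
      print(halstead_volume(snippet))  # → 76.0 (N = 19 tokens, n = 16 distinct)
      ```

      Unlike cyclomatic complexity, which only counts branch points, Halstead measures grow with the density of distinct operators and operands, which is one plausible reason they track how hard a function is to decompile faithfully.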

  • 12:00 - 13:30
    Lunch
    Loma Vista Terrace and Harborside
  • 13:30 - 14:30
    Keynote 2 by Marion Marschalek
    Coast Ballroom
    • Marion Marschalek, Hack & Cheese Security Consulting

      Speaker's Biography: Marion Marschalek is an independent security consultant and trainer with her consulting company Hack & Cheese. Prior to that she held senior positions at AWS and Intel, and various roles in the threat detection industry as a malware reverse engineer and incident responder. Marschalek is a frequent speaker at major security conferences, including Black Hat, DEFCON, HITB, RSA, and SyScan, among others. She taught reverse engineering classes at the University of Applied Sciences St. Poelten, from which she graduated in 2011 with a Master's degree in Information Security.

      In 2015, she started a hacker bootcamp for women titled BlackHoodie, which established itself as a global initiative to attract more diverse talent to the security industry. In her spare time, she enjoys long-distance running.

  • 14:30 - 15:00
    Afternoon Break
    Pacific Ballroom D
  • 15:00 - 16:40
    Session 2: Applying Binary Analysis
    Coast Ballroom
    • Kevan Baker, Daniel R. Tauritz, Samuel Mulder (Auburn University)

      Binary analysis tools work better together. In the case of static analysis, symbolic execution tools are used to explore possible execution paths in a binary, and decompilers are used to view binary code. In this paper, we discuss bridging these two types of tools, using the state-of-the-art tools Binary Ninja and angr. We present a work-in-progress plugin for Binary Ninja named Bangr, which integrates features of angr. With our plugin, we demonstrate how coupling angr and Binary Ninja enables answering questions that Binary Ninja cannot answer on its own. We further demonstrate the utility of having a graphical interface for angr, and conclude with a discussion on the Bangr plugin.

    • Daniel Huici, Ricardo J. Rodríguez (University of Zaragoza), Andrei Costin (University of Jyvaskyla), Narges Yousefnezhad (Binare Oy)

      Tracking N-day vulnerabilities in fragmented firmware ecosystems is an open challenge, often hampered by the disconnect between abstract CVE descriptions and the binary code actually distributed in production and connected devices. In this paper, we present a generic CVE-based framework for correlating vulnerable files in heterogeneous firmware images using similarity digests. Our approach leverages APOTHEOSIS, an open-source approximate nearest neighbor search system, to scale similarity queries across massive collections of artifacts. To bridge the semantic gap between vulnerability reports and binary reality, we introduce an automated process that lifts confirmed vulnerable implementations to high-level intermediate representations and generates function-level search signatures. We demonstrate the effectiveness of this system as a rapid triage tool using the OPENWRT ecosystem as a case study. In the event of a new CVE disclosure, our approach allows analysts to consult the pre-created APOTHEOSIS index to immediately generate a prioritized list of affected firmware versions, significantly accelerating impact assessment without depending on reliable or accurate vendor/CVE metadata or source code.
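      As a loose illustration of triage via similarity digests, the toy sketch below replaces both the digest scheme and APOTHEOSIS's approximate nearest neighbor index with byte n-gram Jaccard matching over a plain dictionary. The CVE labels, byte strings, and threshold are all invented for the example.

      ```python
      def ngrams(data, n=4):
          """Byte n-gram set as a crude stand-in for a real similarity digest."""
          return {data[i:i + n] for i in range(len(data) - n + 1)}

      def similarity(a, b):
          """Jaccard similarity between two n-gram sets, in [0, 1]."""
          return len(a & b) / len(a | b) if a | b else 0.0

      # Hypothetical index: digests of known-vulnerable function bodies.
      index = {
          "CVE-XXXX-libfoo:parse_hdr": ngrams(
              b"mov r0, r1; cmp r0, 0x40; bgt overflow; memcpy dst src len"),
          "CVE-YYYY-libbar:do_auth": ngrams(
              b"strcmp user pass; beq ok; ret fail"),
      }

      def triage(candidate, threshold=0.4):
          """Return index entries whose digest is close to the candidate's."""
          digest = ngrams(candidate)
          return sorted(name for name, d in index.items()
                        if similarity(digest, d) >= threshold)

      # A slightly modified copy of the vulnerable function still matches.
      hits = triage(b"mov r0, r1; cmp r0, 0x40; ble ok; memcpy dst src len")
      print(hits)
      ```

      The real system indexes digests of lifted function representations and answers queries with approximate nearest neighbor search rather than a linear scan, but the triage workflow is the same: a fuzzy match against known-vulnerable signatures prioritizes which firmware images to inspect first.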

    • Abraham Clements, Abel Gomez Rivera (Sandia National Laboratories), Richard Jiayang Liu, Kirill Levchenko (University of Illinois Urbana-Champaign), Rick Kennell (Purdue University), Gabriela Ciocarlie (The Cybersecurity Manufacturing Innovation Institute and Stevens Institute of Technology) 

      Embedded systems are integral to modern society and are increasingly being attacked, necessitating improved techniques for identifying and mitigating vulnerabilities. Fuzzing has proven to be a useful technique for identifying vulnerabilities. Nevertheless, the complexity of embedded systems using real-time operating systems (RTOSes) has limited the ability to even observe their execution, much less effectively fuzz them. Rehosting these systems’ firmware in an emulator has emerged as a technique to solve challenges with inspectability and parallelizing fuzzing, but challenges remain for complex RTOS-based systems. We present RT-Fuzzer, a technique that leverages the modularization of RTOS-based embedded systems into tasks to simplify rehosting and enable effective feedback-directed fuzzing of complex embedded systems. RT-Fuzzer creates a custom initialization for the RTOS and core services in the emulator and then starts only the target task(s) for fuzzing. This simplifies rehosting and enables the fuzzing effort to be focused on a selected task. We illustrate this technique on an open-source RTOS and a commercial PLC, discovering and reporting vulnerabilities in both.

    • Ryutaro Nishizaka, Yudai Fujiwara, Takuya Shimizu, Kazushi Kato, Yuichi Sugiyama (Ricerca Security, Inc.)

      LLM agents that autonomously operate tools such as disassemblers and debuggers are increasingly used for reverse engineering. Designing LLM-resistant protections requires understanding their capability characteristics, yet prior work has not studied this systematically. We propose an analytical model linking a three-stage loop (Observe–Comprehend–Plan) to three categories of software protection (Concealment–Complication–Misdirection) and evaluate three LLM agents on 24 CTF reverse engineering tasks. By analyzing failure logs, we identify four weaknesses (Training bias, Over-trust in observations, Context limitation, Plan persistence) and show that different software protections disrupt different stages and expose different weaknesses. We also find that LLM agents often analyze assembly effectively without a decompiler, and that their strengths differ from human solvers depending on challenge characteristics.

    • Michael Kadoshnikov, Clemente Izurieta, Matthew Revelle (Montana State University)

      Program graphs have become essential for vulnerability detection on program binaries, particularly for approaches based on machine learning. However, many researchers focus on comparing the performance of their technique with others, often neglecting the rationale behind the chosen graph structure used in their approach. This paper explores the comparative performance of various program graphs, such as abstract syntax trees (ASTs), control flow graphs (CFGs), data dependence graphs (DDGs), and their combinations. Each graph variation is evaluated by measuring the classification performance of representation-specific graph neural networks in detecting vulnerabilities at the program level in compiled programs from the NIST SARD Juliet dataset. By evaluating each combination’s strengths and weaknesses, we identify the most effective graph structure for binary vulnerability detection. Performance is evaluated across all variations through a statistical analysis of the experimental results.

  • 16:40 - 17:00
    Closing Remarks
    Coast Ballroom