P2C: Understanding Output Data Files via On-the-Fly Transformation from Producer to Consumer Executions

Author(s): Yonghwi Kwon, Fei Peng, Dohyeong Kim, Kyungtae Kim, Xiangyu Zhang, Dongyan Xu, Vinod Yegneswaran, John Qian

Download: Paper (PDF)

Date: 7 Feb 2015

Document Type: Briefing Papers

Additional Documents: Slides

Associated Event: NDSS Symposium 2015

Abstract:

In cyber attack analysis, it is often highly desirable to understand the meaning of an unknown file or a network message in the absence of their consumer (i.e. the program that parses and understands the file/message). For example, a malware may stealthily collect information from a victim machine, store them as a file and later send it to a remote server. P2C is a novel technique that can parse and understand unknown files and network messages. Given a file/message that was generated in the past without the presence of any monitoring techniques, and a set of potential producers of the file/message, P2C systematically explores the execution paths in the producers without requiring any inputs. In the mean time, it tries to transform a producer execution to a consumer execution that closely resembles the ideal consumer execution that can parse the given unknown file/message. In particular, when a write operation is encountered in the original execution, P2C performs the opposite read operation on the unknown file/message and patches the original execution with the loaded value. In order to handle correlations between data fields in the file/message, P2C follows a trial-and-error approach to look for the correct transformation until the file/message can be parsed and the meaning of their fields can be disclosed. Our experiments on a set of real world applications demonstrate P2C is highly effective.