The Nuts and Bolts of Building FlowLens

Diogo Barradas (Instituto Superior Técnico, Universidade de Lisboa)

The advent of programmable switches has sparked a general interest in devising new security solutions for high-speed networks. Recently, we introduced FlowLens, a system that leverages programmable switches to efficiently support multi-purpose security network applications based on machine learning algorithms. With FlowLens, network operators are able to program their switches to automatically scan and classify flows with high accuracy for a wide range of scenarios, such as multimedia covert channel detection, website fingerprinting, or botnet traffic identification. To make this possible, FlowLens introduces a new system design that solves a fundamental tension between the need for comprehensive flow information required by machine learning algorithms and the scarcity of hardware resources available in modern programmable switches.

To tackle this tension, we faced several major challenges at the implementation and evaluation levels that have raised the bar in proving the feasibility and effectiveness of our design. First, we identified a substantial gap between the programming environment (based on the P4 programming language) targeting a software-emulated switch and a real-world proprietary switch (e.g., the Barefoot Tofino). This gap forced us to deeply restructure our code and revisit our assumptions underpinning our original flow compression technique. Second, we realized that different machine learning security tasks proposed in the literature had been fine-tuned for their specific application domains. This means that not only do they employ different classification algorithms but even the datasets used and the training processes are different from one another. As such, we had to adopt several strategies to repurpose the classification machinery of previously existing applications to ensure their compatibility with FlowLens. Lastly, the comparison between our compression technique and other related compression techniques was hampered by the lack of accessibility to the latter’s implementation. This forced us to re-implement several of such approaches and to resort to analytical comparisons of their compute, storage, and communication costs.

In this presentation, we discuss in detail how we addressed the above challenges and provide a set of guidelines that may prove useful for future practitioners in the realm of the intersection between network security and machine learning.

Speaker's biography

Diogo Barradas is a Ph.D. candidate in Information Systems and Computer Engineering at Instituto Superior Técnico, Universidade de Lisboa. He received his BSc. (2014) and MSc. (2016) from the same institution. His main research interests include network security and privacy, with particular emphasis on statistical traffic analysis and Internet censorship circumvention. He conducts his research at the Distributed Systems Group at INESC-ID Lisboa.