Tao Chen (City University of Hong Kong), Longfei Shangguan (Microsoft), Zhenjiang Li (City University of Hong Kong), Kyle Jamieson (Princeton University)
This paper presents Metamorph, a system that generates imperceptible audio that can survive over-the-air transmission to attack the neural network of a speech recognition system. The key challenge stems from how to ensure the added perturbation of the original audio in advance at the sender side is immune to unknown signal distortions during the transmission process. Our empirical study reveals that signal distortion is mainly due to device and channel frequency selectivity but with different characteristics. This brings a chance to capture and further pre-code this impact to generate adversarial examples that are robust to the over-the-air transmission. We leverage this opportunity in Metamorph and obtain an initial perturbation that captures the core distortion's impact from only a small set of prior measurements, and then take advantage of a domain adaptation algorithm to refine the perturbation to further improve the attack distance and reliability. Moreover, we consider also reducing human perceptibility of the added perturbation. Evaluation achieves a high attack success rate (95%) over the attack distance of up to 6 m. Within a moderate distance, e.g., 3 m, Metamorph maintains a high success rate (98%), yet can be further adapted to largely improve the audio quality, confirmed by a human perceptibility study.