Daiping Liu (Palo Alto Networks, Inc.), Danyu Sun (University of California, Irvine), Zhenhua Chen (Palo Alto Networks, Inc.), Shu Wang (Palo Alto Networks, Inc.), Zhou Li (University of California, Irvine)

Malicious domain detection serves as a critical technique to keep users safe against cyber attacks. Although these systems have demonstrated remarkable detection capabilities, the magnitude of their false positives (FPs) in the real world remains unknown and is often overlooked. To shed light on this essential aspect, we conduct the first measurement study using 6 years of FP reports collected from one of the largest global cybersecurity vendors. Our findings reveal that the popularity-based top domain lists that are commonly adopted by current detection systems are insufficient to avoid FPs. In fact, there is still a non-trivial number of FPs in production. We posit that one of the main reasons is that efforts in this area have predominantly focused on detecting malicious indicators, i.e., Indicators of Compromise (IOCs), and have made light of the benign ones, i.e., Indicators of Benignity (IOBs).

In this paper, we make the first effort focusing on IOB detection. Our work is built upon our key finding that for many FPs in production, their IOBs can be found on the Internet. However, due to the openness of the Internet and unstructured Web content, we face two main challenges in identifying these IOBs: understanding what an IOB is and assessing the trustworthiness of an IOB. To address these challenges, we propose a transitive trust model for IOBs and implement it in a system called IOBHunter. IOBHunter leverages LLMs and chain-of-thought (CoT), which have demonstrated promising capabilities to address several other security threats. Our evaluation using a dataset that contains verified FPs shows that IOBHunter can achieve 99.22% precision and 68.6% recall. IOBHunter is further evaluated in a two-month real-world deployment, in which IOBHunter has identified 4,338 confirmed FPs and 2,051 compromised domains.
