Zhen Li (Nankai University), Ding Wang (Nankai University)
As the number of password-protected accounts per user keeps increasing, users are more and more inclined to reuse passwords. Recently, considerable effort has gone into constructing targeted password guessing models that characterize users' password reuse behaviors. However, existing studies mainly focus on slight modifications, training only on similar password pairs (e.g., \texttt{Shark0301} → \texttt{shark03}). This leads to overfitting and causes existing models to overlook users' large modifications (e.g., \texttt{Shark0301} → \texttt{Bear03}). To fill this gap, this paper introduces a new non-parametric method named \emph{k}-nearest-neighbors targeted password guessing (KNN-TPG). KNN-TPG builds a datastore that retains the context vectors of all source passwords along with prefixes of the targeted passwords. When generating a new password, KNN-TPG retrieves the \emph{k} nearest neighbor vectors from the datastore so that the generated passwords align better with realistic password distributions. By combining KNN-TPG with our proposed Transformer-based password model, we obtain a new targeted password guessing model, namely KNNGuess. At each step of generating a new password, KNNGuess predicts and utilizes three distinct distributions, aiming to comprehensively model users' password reuse behaviors.
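The retrieval step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a kNN-LM-style design in which each datastore entry pairs a context vector with the next character that followed it, and it interpolates only two distributions with a hypothetical weight `lam`, whereas KNNGuess uses three distributions whose exact form is not given here. The function names, the toy vectors, and the `temperature` parameter are all illustrative assumptions.

```python
import math
from collections import defaultdict


def knn_next_char_distribution(query, datastore, k=2, temperature=1.0):
    """Build a next-character distribution from the k nearest stored contexts.

    `datastore` is a list of (context_vector, next_char) pairs (assumed layout).
    """
    # Squared Euclidean distance from the query to every stored key,
    # keeping only the k closest entries.
    scored = sorted(
        (sum((q - x) ** 2 for q, x in zip(query, key)), char)
        for key, char in datastore
    )[:k]
    # Softmax over negative distances: closer neighbours get more weight.
    weights = [math.exp(-dist / temperature) for dist, _ in scored]
    total = sum(weights)
    # Neighbours that share a next character pool their probability mass.
    probs = defaultdict(float)
    for weight, (_, char) in zip(weights, scored):
        probs[char] += weight / total
    return dict(probs)


def interpolate(p_model, p_knn, lam=0.5):
    """Mix the model's distribution with the retrieved one (lam is hypothetical)."""
    chars = set(p_model) | set(p_knn)
    return {
        c: lam * p_knn.get(c, 0.0) + (1 - lam) * p_model.get(c, 0.0)
        for c in chars
    }


# Toy datastore with 2-dimensional context vectors (illustrative values only).
datastore = [
    ((0.0, 0.0), "a"),
    ((0.1, 0.0), "a"),
    ((1.0, 1.0), "b"),
    ((5.0, 5.0), "c"),
]

p_knn = knn_next_char_distribution((0.0, 0.1), datastore, k=3)
p_model = {"a": 0.2, "b": 0.5, "c": 0.3}  # stand-in for a Transformer's output
p_final = interpolate(p_model, p_knn, lam=0.5)
```

In this toy run the two nearest neighbours both vote for `a`, so the retrieved distribution pulls the final prediction toward `a` even though the stand-in model preferred `b`. That is the intuition behind using retrieval to keep guesses close to observed password distributions.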
We demonstrate the effectiveness of our KNNGuess model and the KNN-TPG method through extensive experiments on 12 large-scale real-world password datasets containing 4.8 billion passwords. More specifically, when the victim's password at site A (namely $pw_A$) is compromised, the success rate of KNNGuess in guessing her password at site B (namely $pw_B$, with $pw_B \neq pw_A$) within 100 guesses is 25.40% (for common users) and 10.26% (for security-savvy users), which is 8.52%-119.0% (avg. 55.33%) higher than its foremost counterparts. Compared with state-of-the-art password models (i.e., Pass2Edit and PointerGuess), the improvement is 8.52%-27.66% (avg. 18.09%). Our results highlight that the threat of password tweaking attacks is higher than users expect.