Some insights about password shapes

Back in September 2010 at ESORICS, I presented Kamouflage, a new kind of password manager based on the following premise: an attacker cannot perform an offline attack on a password manager if he does not know how to test whether his decryption succeeded. In Kamouflage, the attacker cannot tell, because every decryption returns a set of passwords that looks plausible. Accordingly, the security of the entire scheme rests on our ability to create decoy password sets that look real. (If you would like the full description of how Kamouflage works, the paper and slides are available for download.)

To be compelling decoys, the Kamouflage passwords must mimic the passwords that human users create every day. To figure out how users construct their passwords, I extensively analyzed leaked password databases. Today, I would like to share some insights that I discovered about password “shapes.” More specifically, I will discuss some of the interesting metrics I computed from the RockYou database, which is, as far as I know, the largest password database ever leaked, with 32 million passwords!

To analyze the RockYou database, I will deviate slightly from the standard approach: instead of focusing on the most frequently used words and passwords, I will discuss how passwords are constructed, because Kamouflage must construct its own passwords. (If you would like the standard metrics, just ask.) The first graph below represents the RockYou passwords, classified by shape and brought to you by my password shape analyzer, which uses a bunch of regular expressions, big dictionaries and a bit of black magic.

Note that classifying passwords by shape is very efficient: fewer than 10 shapes cover more than 80% of the RockYou database. As expected, most of the RockYou passwords are at least partially made of dictionary words, as shown on the graph below. More than 20% of them are just a single word (category “strict”). What is more surprising is that the second most popular shape is two words concatenated together! (As far as I can tell, this combination has not been discussed before.) Many of these two-word passwords reflect possessive statements. For example, some of the most popular are “myangel” and “mybaby”. The third most commonly used shape is all digits (e.g., “123456”).
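To make the idea of a “shape” concrete, here is a minimal Python sketch of a shape classifier. It is not my actual analyzer: the tiny WORDS set, the shape labels and the function names are all made up for illustration, and a real version would need large dictionaries and many more rules.

```python
import re
from collections import Counter

# Hypothetical toy word list; a real analyzer would load large dictionaries.
WORDS = {"my", "angel", "baby", "password", "secret", "love"}


def is_two_words(s):
    """True if s is the concatenation of exactly two dictionary words."""
    return any(s[:i] in WORDS and s[i:] in WORDS for i in range(1, len(s)))


def is_wordlike(s):
    """True if s is one dictionary word or two concatenated ones."""
    return s in WORDS or is_two_words(s)


def shape(password):
    """Return an illustrative shape label for a password."""
    p = password.lower()
    if re.fullmatch(r"[0-9]+", p):
        return "digits"
    if p in WORDS:
        return "word"
    if is_two_words(p):
        return "two-words"
    m = re.fullmatch(r"([a-z]+)([0-9]+)", p)
    if m and is_wordlike(m.group(1)):
        return "word+digits"
    m = re.fullmatch(r"([0-9]+)([a-z]+)", p)
    if m and is_wordlike(m.group(2)):
        return "digits+word"
    return "other"


def shape_counts(passwords):
    """Count how many passwords fall into each shape."""
    return Counter(shape(p) for p in passwords)
```

Running shape_counts over a password list gives the per-shape totals that a chart like the one above is built from. Anything the toy rules do not recognize, such as leet spellings, lands in “other”.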

For all the other password shapes, people combine word(s) with digits, so I find it interesting to observe where people decide to put those digits. As visible in the diagram below, an overwhelming percentage of people (67%) choose to place them after the word(s) (“password1”). Putting the digits before the word(s) is also a common practice (27%). This leaves very few people who add digits in the middle of their passwords. When this happens, it is often because the digits sit between two words (“my1baby”) or because the password is written in leet speak (“s3cr3t”).
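The placement categories above can be sketched with a few regular expressions. This is only an illustration under my own assumptions: the function names and labels are hypothetical, and any non-digit character is treated as part of the “word” side.

```python
import re
from collections import Counter

DIGITS = "0123456789"


def digit_position(password):
    """Classify where digits sit in a password that mixes letters and digits.

    Returns "after", "before", "middle", "other", or None when the password
    does not contain both a letter and a digit.
    """
    if not re.search(r"[0-9]", password) or not re.search(r"[A-Za-z]", password):
        return None
    if re.fullmatch(r"[^0-9]+[0-9]+", password):
        return "after"   # e.g. password1
    if re.fullmatch(r"[0-9]+[^0-9]+", password):
        return "before"  # e.g. 1password
    if password[0] not in DIGITS and password[-1] not in DIGITS:
        return "middle"  # e.g. my1baby, s3cr3t
    return "other"       # digits at an end plus more elsewhere, e.g. 1abc2


def position_shares(passwords):
    """Fraction of letter/digit passwords that fall in each position."""
    counts = Counter(pos for pos in map(digit_position, passwords) if pos)
    total = sum(counts.values())
    return {pos: n / total for pos, n in counts.items()}
```

For example, position_shares(["password1", "love12", "1password", "my1baby"]) puts half of this toy sample in “after” and a quarter each in “before” and “middle”.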

In addition to the two-word shape, one of the most fascinating facts about password shape that I discovered during this study is that people prefer even numbers to odd ones. If you glance at the chart below, you will see that people prefer the number 2 over 1, 4 over 3, and so on. One explanation I have for this behavior is that many numbers used every day (for age, date, cell phones, etc.) are even. Nevertheless, I find it fascinating that people will consistently select an even number over an odd one, as in the case of 10 versus 9, even if it makes their password longer.
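If you want to run this even-versus-odd comparison on your own password list, a minimal sketch could look like the following. It assumes a “number” is a maximal run of digits read as an integer (so “007” counts as 7), which may differ from how the chart was computed, and the function names are hypothetical.

```python
import re
from collections import Counter


def number_counts(passwords):
    """Count every maximal run of digits, read as an integer."""
    counts = Counter()
    for p in passwords:
        for run in re.findall(r"[0-9]+", p):
            counts[int(run)] += 1
    return counts


def even_vs_odd_neighbor(counts, upto=10):
    """Pair the count of each even n (2..upto) with the count of the odd n - 1."""
    return {n: (counts[n], counts[n - 1]) for n in range(2, upto + 1, 2)}
```

Calling even_vs_odd_neighbor(number_counts(passwords)) returns, for each even number, a pair (uses of that number, uses of the odd number just below it), which is enough to check whether 2 beats 1, 4 beats 3, and 10 beats 9 in a given list.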

In case you were wondering what the most commonly used digits are, I made the following chart for you:

Unsurprisingly, 1 is by far the most frequently used digit. Intuitively, I would have expected 0 to be a little more popular, but nope: 0 is not a cool kid these days, and 9 is clearly the least loved digit. I hope this quick post has shed some light on what user passwords look like. As usual, if you have any questions or wish to share your insights, please comment.
