To anyone who understands information theory and security and is in an infuriating argument with someone who does not (possibly involving mixed case), I sincerely apologize.
This comic says that a password such as "Tr0ub4dor&3" is bad because it is easy for password-cracking software to guess and hard for humans to remember. That leads to insecure practices like writing the password on a Post-it note attached to the monitor. On the other hand, a password such as "correct horse battery staple" is hard for computers to guess because it has more entropy, yet it is quite easy for humans to remember.
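Entropy is a measure of "uncertainty" in an outcome. In this context, it can be thought of as a measure of how unpredictable a password is. Each bit of entropy doubles the number of guesses an attacker needs to try every possibility. For a password whose characters are chosen uniformly at random, entropy is calculated as log2(a^b) = b × log2(a), where a is the number of allowed symbols and b is the length of the password.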
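The formula can be checked with a few lines of Python. This is a minimal sketch. It assumes every character is drawn uniformly and independently, and `entropy_bits` is a hypothetical helper name used only for illustration.

```python
import math

def entropy_bits(alphabet_size: int, length: int) -> float:
    """Bits of entropy for `length` symbols, each drawn uniformly and
    independently from `alphabet_size` possibilities (hypothetical helper).

    log2(a**b) == b * log2(a); the second form avoids building a huge integer.
    """
    return length * math.log2(alphabet_size)

print(round(entropy_bits(94, 11), 1))  # 72.1 bits for 11 random printable characters
print(round(entropy_bits(26, 1), 1))   # 4.7 bits per random lowercase letter
```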
A truly random string of length 11 (not like "Tr0ub4dor&3", but more like "J4I/tyJ&Acy") has log2(94^11) ≈ 72.1 bits of entropy. Here 94 is the number of printable ASCII characters other than space (letters, digits, and symbols). However, the comic shows that "Tr0ub4dor&3" has only about 28 bits of entropy. This is because the password follows a simple pattern: a dictionary word plus a few extra digits or symbols. Its entropy is therefore better estimated as log2(65000 × 94 × 94) ≈ 29 bits, where 65000 is a rough estimate of the number of dictionary words people are likely to choose. (For related info, see https://what-if.xkcd.com/34/.)
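As a rough sketch, the two estimates above can be compared with the comic's own breakdown. The component values in `comic_breakdown` are my reading of the comic's panel (16 + 1 + 3 + 3 + 4 + 1 bits). The 65000-word figure is the same rough assumption used in the text.

```python
import math

# A truly random 11-character password: every position is one of 94 symbols.
random_bits = 11 * math.log2(94)

# A "Tr0ub4dor&3"-style password modelled as in the text: one of ~65,000
# dictionary words plus two characters from the 94 printable symbols.
pattern_bits = math.log2(65000 * 94 * 94)

# Per-component estimate as read from the comic's panel (rounded bits each).
comic_breakdown = {
    "uncommon base word": 16,
    "caps?": 1,
    "common substitutions": 3,
    "numeral": 3,
    "punctuation": 4,
    "order unknown": 1,
}

print(f"random:  {random_bits:.1f} bits")                # 72.1
print(f"pattern: {pattern_bits:.1f} bits")               # 29.1
print(f"comic:   {sum(comic_breakdown.values())} bits")  # 28
```

The simple "word + two symbols" model and the comic's itemized estimate land within about one bit of each other. Both are far below the 72 bits of a truly random string of the same length.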
Another way of selecting a password is to use 2048 "symbols" (common words) and select only 4 of them: log2(2048^4) = 44 bits, much better than 28. The idea of using such symbols was revisited in one of the tips in 1820: Security Advice.
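"Much better" can be made concrete. Each extra bit doubles the search space, so a 16-bit gap is a factor of 2^16. The sketch below takes the comic's 28-bit figure as given.

```python
import math

word_bits = math.log2(2048 ** 4)   # 4 words from a 2048-word list
troubador_bits = 28                # the comic's estimate for "Tr0ub4dor&3"

print(word_bits)                   # 44.0
# A gap of 16 bits means 2**16 = 65,536 times as many guesses
# for an exhaustive search.
print(2 ** (word_bits - troubador_bits))  # 65536.0
```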
It is absolutely true that people make passwords hard to remember because they think they are "safer". It is also true that length, all other things being equal, tends to make for very strong passwords, which can be confirmed with rumkin.com's password strength checker. Even if every character is limited to [a-z], the exponent implied in "we added another lowercase character, so multiply by 26 again" tends to dominate the results. That is before even drawing on the full range of ASCII, HTML and Unicode symbols.
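A quick comparison shows why length dominates. Entropy grows linearly with length but only logarithmically with alphabet size. This sketch assumes uniformly random characters, which real human-chosen passwords rarely are. With that assumption, the lines print roughly 52.4, 56.4 and 75.2 bits.

```python
import math

def entropy_bits(alphabet_size: int, length: int) -> float:
    # Uniformly random characters: length * log2(alphabet size).
    return length * math.log2(alphabet_size)

for label, alphabet, length in [
    ("8 chars, all 94 printable ASCII", 94, 8),
    ("12 chars, lowercase only", 26, 12),
    ("16 chars, lowercase only", 26, 16),
]:
    print(f"{label:34} {entropy_bits(alphabet, length):5.1f} bits")
```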
In addition to being easier to remember, long strings of lowercase characters are also easier to type on smartphones and soft keyboards.
xkcd's password generation scheme requires the user to have a list of 2048 common words (log2(2048) = 11). For any attack we must assume that the attacker knows our password generation algorithm, but not the exact password. In this case the attacker knows the 2048 words and knows that we selected 4 of them, but not which ones. The number of combinations of 4 words from this list is (2^11)^4 = 2^44, i.e. 44 bits. For comparison, Diceware's 7776-word list offers about 12.9 bits per word (log2(7776) ≈ 12.9). Suppose the attacker doesn't know the algorithm used and only knows that lowercase letters were selected. Then the "common words" password would take even longer to crack than depicted. Twenty-five random lowercase characters (the length of "correcthorsebatterystaple") would have about 117 bits of entropy, versus 44 bits for the common-words scheme.
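The scheme itself is short enough to sketch. This version uses Python's `secrets` module, a cryptographically secure source suited to this purpose, unlike `random`. The names `xkcd_password` and `scheme_bits` are hypothetical helpers. `demo_words` is a placeholder, not a real word list, and you would substitute your own list of 2048 common words. Words are drawn with replacement, which matches the 2048^4 count above.

```python
import math
import secrets

def xkcd_password(words: list[str], n_words: int = 4, sep: str = " ") -> str:
    """Pick n_words uniformly at random (with replacement) from `words`."""
    return sep.join(secrets.choice(words) for _ in range(n_words))

def scheme_bits(list_size: int, n_words: int) -> float:
    # Attacker knows the list and the word count: list_size ** n_words guesses.
    return n_words * math.log2(list_size)

# Hypothetical placeholder standing in for a real list of 2048 common words.
demo_words = [f"word{i}" for i in range(2048)]

print(xkcd_password(demo_words))
print(scheme_bits(2048, 4))            # 44.0
print(round(scheme_bits(7776, 1), 1))  # 12.9 bits per Diceware word
print(round(25 * math.log2(26), 1))    # 117.5 bits for 25 random lowercase letters
```

The security comes entirely from the random selection step. Picking the words yourself, or using a famous example such as "correct horse battery staple", defeats the calculation.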
- Example
Below is a detailed example showing how different complexity rules would be used to generate a password with a supposed 44 bits of entropy. The examples of expected passwords were generated with random.org.(*)
If n is the number of symbols and L is the minimum length of the password, then L = 44 / log2(n).
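The "Minimum length" column can be reproduced from L = 44 / log2(n). This is a small sketch. The symbol-set sizes come from the legend after the table (a = 26, A = 26, 9 = 10, & = 32). The table truncates L to one decimal place, and since a password cannot contain a fractional symbol, the sketch also rounds up.

```python
import math

TARGET_BITS = 44

# Symbol-set sizes built from the legend: a=26, A=26, 9=10, &=32.
symbol_sets = {
    "a": 26, "a 9": 36, "a A": 52, "a &": 58,
    "a A 9": 62, "a 9 &": 68, "a A 9 &": 94, "common words": 2048,
}

for name, n in symbol_sets.items():
    exact = TARGET_BITS / math.log2(n)  # L = 44 / log2(n)
    # Round up: a password needs a whole number of symbols.
    print(f"{name:13} n={n:5}  L={exact:.2f}  -> {math.ceil(exact)} symbols")
```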
| Symbols | Number of symbols | Minimum length | Examples of expected passwords | Example of an actual password | Actual bits of entropy | Comment |
|---|---|---|---|---|---|---|
| a | 26 | 9.3 | `mdniclapwz`, `jxtvesveiv` | `troubadorx` | 16 + 4.7 = 20.7 | Extra letter to meet length requirement; log2(26) = 4.7 |
| a 9 | 36 | 8.5 | `qih7cbrmd`, `ewpltiayq` | `tr0ub4d0r` | 16 + 3 = 19 | 3 = common substitutions in the comic |
| | | | | `troubador1` | 16 + 3.3 = 19.3 | log2(10) = 3.3 |
| a A | 52 | 7.7 | `jAwwBYne`, `NeTvgcrq` | `Troubador` | 16 + 1 = 17 | 1 = caps? in the comic |
| a & | 58 | 7.5 | `j.h?nv),`, `c/~/fg\:` | `troubador&` | 16 + 4 = 20 | 4 = punctuation in the comic |
| a A 9 | 62 | 7.3 | `cDe8CgAf`, `RONygLMi` | `Tr0ub4d0r` | 16 + 1 + 3 = 20 | 1 = caps?; 3 = common substitutions |
| a 9 & | 68 | 7.2 | `_@~"#^.2`, `un$l\|!f]` | `tr0ub4d0r&` | 16 + 3 + 4 = 23 | 3 = common substitutions; 4 = punctuation |
| a A 9 & | 94 | 6.7 | `Re-:aRo`, `^$rV{3?` | `Tr0ub4d0r&` | 16 + 1 + 3 + 4 = 24 | 1 = caps?; 3 = common substitutions; 4 = punctuation |
| common words | 2048 | 4 | `reasonableretailsometimespossibly`, `constantyieldspecifypriority` | `reasonableretailsometimespossibly` | 11 × 4 = 44 | Go to random.org and select 4 random integers between 1 and 2048; then go to your list of common words |
| | | | | `correcthorsebatterystaple` | 0 | Thanks to this comic, this is now one of the first passwords a hacker will try. |
- a = lowercase letters
- A = uppercase letters
- 9 = digits
- & = the 32 special characters on an American keyboard; Randall assumes only the 16 most common characters are used in practice (4 bits)
- (*) The use of random.org explains why `jAwwBYne` has two consecutive w's, why `Re-:aRo` has two R's, why `_@~"#^.2` has no letters, why `ewpltiayq` has no numbers, why "constant yield" is part of a password, etc. A human trying to make passwords that looked random would probably have avoided such patterns.
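Repeats like these are expected in genuinely random output; this is the birthday problem. The sketch below estimates how often a random string contains at least one repeated character. `p_repeat` is a hypothetical helper, and `math.perm` requires Python 3.8 or later. For 8 characters drawn from the 52 letters of the "a A" row, the chance of some repeat is a little over 40%.

```python
import math

def p_repeat(alphabet_size: int, length: int) -> float:
    """Probability that a uniformly random string of `length` symbols
    contains at least one repeated symbol (birthday problem)."""
    # Ordered selections with all symbols distinct, over all possible strings.
    p_all_distinct = math.perm(alphabet_size, length) / alphabet_size ** length
    return 1 - p_all_distinct

# 8 characters from a-z and A-Z, as in the "a A" row of the table:
print(f"{p_repeat(52, 8):.2f}")
```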