Aléas numériques

Linux, infosec and whatever crosses my mind.


» Dice-generated passphrases

Hi there! Following up on my previous post about Generating secure passwords, I’ll talk here about another way to generate passwords that are:

  • complex (high entropy);
  • easy[1] to remember.

Those passwords are called dice-generated passphrases. Two questions arise: why dice-generated, and why passphrases?

First, they are dice-generated because each word can be picked by throwing five dice once (or one die five times). The five values you get, read in order, form an index that maps to a unique word in a pre-built dictionary, so each combination of rolls has exactly one word associated with it.
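To make the mapping concrete, here is a minimal sketch of the lookup step. It assumes the "index, tab, word" layout of the EFF list used later in this post; the three entries are only a tiny illustrative excerpt, and `lookup_word` is a hypothetical helper name, not part of any tool.

```bash
#!/bin/bash
# Tiny excerpt in the same "index<TAB>word" layout as the wordlist (illustrative only).
sample_list=$'11111\tabacus\n11112\tabdomen\n11113\tabdominal'

# lookup_word ROLL: print the word whose index column equals ROLL.
# (Hypothetical helper, written for this example.)
lookup_word() {
    awk -F'\t' -v roll="$1" '$1 == roll { print $2 }' <<< "$sample_list"
}

# Five dice showing 1, 1, 1, 1, 2 give the index 11112.
lookup_word 11112
```

Comparing against the index column only keeps the lookup unambiguous whatever the words themselves contain.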

They are called “passphrases” and not “passwords” because they are composed of multiple words, instead of a single one like most passwords.

Bash implementation

#!/bin/bash
# Bash script implementing Dice-generated passphrases
# License: MIT License (c) 2023 Hugo Blanc

set -eu
IFS="$(printf '\n\t')"

# https://www.eff.org/files/2016/07/18/eff_large_wordlist.txt
dict_path="/usr/share/dict/eff_large_wordlist.txt"

# Use a brace group rather than a subshell so that 'exit' ends the script itself.
[[ -f "$dict_path" ]] || { echo "error: wordlist '$dict_path' not found" >&2; exit 1; }

password=""
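for i in {1..6}; do
   # Roll five dice: keep only the bytes '1'..'6' read from /dev/urandom.
   # The count is fixed at 5 because the wordlist index is five digits long.
   sequence="$(LC_ALL=C tr -dc 1-6 < /dev/urandom | head -c 5)"
   # Match the roll against the index column only, then append the word.
   password="${password}-$(awk -v roll="$sequence" '$1 == roll {print $2}' "$dict_path")"
done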

echo "${password:1}"

The script first checks that eff_large_wordlist.txt is present in /usr/share/dict, alongside the system's other dictionaries, which seemed like a good place to store it.

The randomness comes from /dev/urandom, which is backed by the kernel's cryptographically secure pseudorandom number generator and is well suited to this job.
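One way to convince yourself that filtering /dev/urandom with tr gives fair dice is to roll many times and count the faces. The following is a rough sketch, not a statistical test: `face_counts` is a hypothetical helper name and 6000 is an arbitrary sample size.

```bash
#!/bin/bash
# face_counts N: roll N dice by keeping only the bytes '1'..'6' from
# /dev/urandom, then count how often each face shows up.
# (Hypothetical helper, written for this example.)
face_counts() {
    LC_ALL=C tr -dc '1-6' < /dev/urandom | head -c "$1" | fold -w 1 | sort | uniq -c
}

# With 6000 rolls, each face should appear roughly 1000 times.
face_counts 6000
```

Because tr simply throws away every byte that is not one of the six digits, the six faces stay equally likely and no modulo bias is introduced.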

Let’s see the script in action:

$ ./dicepass.sh
peso-county-plant-fester-pastime-outmatch

$ ./dicepass.sh
hybrid-unmanaged-boxer-dullness-thirsting-dusk

$ ./dicepass.sh
spellbind-dreamt-embolism-cranberry-eggbeater-thespian

Security considerations

As the charset used for this kind of passphrase is quite small ([a-z] plus -), passphrases need to be long, hence the use of multiple words. As stated in my blog post about Generating secure passwords, the entropy of a password is defined using the formula:

$$ H = \log_2 N^{L} = L\log_2N = L \times \dfrac{\log N}{\log 2} $$

where $N$ is the size of the charset and $L$ the length of the pass{word,phrase}.
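As a quick sanity check, the formula can be evaluated from the shell. The sketch below is only an illustration: `entropy_bits` is a hypothetical helper name, and it leans on awk for the logarithm since bash has no floating-point arithmetic.

```bash
#!/bin/bash
# entropy_bits N L: print L * log2(N), rounded to two decimals.
# (Hypothetical helper, not part of the dicepass script.)
entropy_bits() {
    awk -v n="$1" -v l="$2" 'BEGIN { printf "%.2f\n", l * log(n) / log(2) }'
}

# 12 random characters drawn from lowercase letters plus '-' (N = 27).
entropy_bits 27 12
# The same length drawn from the 95 printable ASCII characters.
entropy_bits 95 12
```

For a fixed length, a smaller charset gives fewer bits, which is why a passphrase has to make up for it with extra characters.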

Now you may think: “yeah, the entropy is quite high, but we know the list used to create the passwords, so it limits the guesses an attacker has to make!”. To show that this is not a problem, let’s calculate the number of combinations for a passphrase composed of 6 words based on the EFF list. The list contains 7776 entries:

$ curl https://www.eff.org/files/2016/07/18/eff_large_wordlist.txt 2>/dev/null| wc -l 
7776

which makes sense: we throw five dice, each of them shows a value between 1 and 6, and $6^5 = 7776$. We roll the five dice six times to build our passphrase, and each word has a probability of $1 \div 7776$ of being picked, so there are $7776^6 = 221073919720733357899776$ possible passphrases. I’m personally okay with this number :)
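Plugging words instead of characters into the entropy formula expresses the same count in bits. This is a worked sketch that treats each word as one symbol, so $N = 7776$ and $L = 6$:

$$ H = L\log_2 N = 6 \times \log_2 7776 = 30 \times \log_2 6 \approx 77.55 \text{ bits} $$

This is the figure that matters against an attacker who knows both the list and the number of words.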

Let’s calculate the entropy using pass-checker, a small utility I wrote a few years ago:

$ for i in {1..10}; do dicepass | pass-checker | grep Entropy; done
Entropy: 175.739 bits
Entropy: 169.881 bits
Entropy: 158.165 bits
Entropy: 181.597 bits
Entropy: 187.455 bits
Entropy: 187.455 bits
Entropy: 205.029 bits
Entropy: 169.881 bits
Entropy: 158.165 bits
Entropy: 169.881 bits

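Nice! An entropy higher than 150 bits is pretty good: brute-forcing it character by character would take more than a thousand trillion years. Keep in mind that these figures are character-based; an attacker who knows the wordlist only has to search the $7776^6$ combinations computed above, a much smaller but still very large space. We can also see how the entropy varies, depending on the length of the words we randomly picked.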
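To see how length alone drives these character-based figures, here is a small sketch that applies the formula to whole passphrases. `char_entropy` is a hypothetical helper, and the default charset size of 27 ([a-z] plus -) is an assumption: pass-checker may count the charset differently, so the absolute values need not match its output.

```bash
#!/bin/bash
# char_entropy PASSPHRASE [N]: character-based entropy, length * log2(N).
# N defaults to 27 ([a-z] plus '-'); this is an assumption, pass-checker
# may use another charset size.
char_entropy() {
    local passphrase="$1" charset_size="${2:-27}"
    awk -v len="${#passphrase}" -v n="$charset_size" \
        'BEGIN { printf "%.2f\n", len * log(n) / log(2) }'
}

# The three passphrases generated earlier in this post.
for p in peso-county-plant-fester-pastime-outmatch \
         hybrid-unmanaged-boxer-dullness-thirsting-dusk \
         spellbind-dreamt-embolism-cranberry-eggbeater-thespian; do
    printf '%2d chars  %s bits\n' "${#p}" "$(char_entropy "$p")"
done
```

Every extra character adds the same number of bits, so passphrases built from longer words score higher even though each word was picked from the same 7776 entries.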

The longest words in the list are 9 characters long:

$ awk '{print $2}' eff_large_wordlist.txt | awk '{ print length, $0 }' | sort -n | tail
9 worrisome
9 worsening
9 worshiper
9 wrongdoer
9 wrongness
9 yesterday
9 zestfully
9 zigzagged
9 zookeeper
9 zoologist

so the maximum theoretical entropy, for six 9-character words joined by five hyphens (59 characters), is $59 \times \dfrac{\log 27}{\log 2} \approx 280.54$ bits.

The crucial part of this is to use a cryptographically secure pseudorandom number generator to ensure the randomness of the output. This may in fact be the hardest part.


  1. Well, easier than a string of purely random characters.