Password Strength Meter in R

May 21, 2020 12 min read

Introduction

Passwords. Keys to our riches, or our personal info. They would be the equivalent of keys to a safe deposit box if this were 1967. But it’s 2020, so you’ve put most of your stuff online. But how secure are your passwords? Let me guess: it’s your phone number! Or a combo of your first name and your year of birth! Probably not. You have probably invested time in coming up with a strong password. But how strong is it anyway? Have you tried running it through a password meter? I don’t know, but I hope you got it right. Anyway, I needed a password meter. I wanted to know how strong or weak my password was. That’s how I came across http://www.passwordmeter.com/. The site shows how its calculations are made, which I found interesting. I thought about doing it in R, because it is all about regex, and I enjoy dealing with strings. At that moment, I couldn’t do much, but finally, here we are!

There is really no fixed set of rules or requirements for a password. They typically vary from website to website. However, we can say with a fair level of confidence that the rules below are the most common. An acceptable password:

Must have a minimum of 8 characters
Contains at least one of each of the following:
- Upper case letter
- Lower case letter
- Number
- Symbol

For the password meter above and this post, the rules that will be used are

A minimum of 8 characters. This is mandatory. Even if all other requirements are met, and this is not, there will be no score for requirements (we’ll see that later)
Contains at least 3 of the following 4:
- Upper case letter
- Lower case letter
- Number
- Symbol

To determine the password strength, some patterns in the password will lead to the addition of points, while others will lead to deductions. When we tackle each pattern, we’ll explore how it contributes to the password meter.

These are the patterns that result in additions.

Requirements stated above
Number of Characters
Uppercase Letters
Lowercase Letters
Numbers
Symbols
Middle Numbers or Symbols

These are the patterns that lead to deductions.

Letters only
Numbers only
Repeat Characters (Case Insensitive)
Consecutive Uppercase Letters
Consecutive Lowercase Letters
Consecutive Numbers
Sequential Letters (3+)
Sequential Numbers (3+)
Sequential Symbols (3+)

For each pattern (addition or deduction), we will get the count and, in the end, find the total. We will then create a function that takes in the password and returns the score of every pattern. For each pattern, we will use relevant examples to see what the output looks like.

Loading the required libraries:

library(stringi)
library(stringr)

Addition

Number of Characters

password = "PasswordTest34#"
num_chars = nchar(password)
num_chars

## [1] 15

Uppercase Letters

upper_case = stringi::stri_count(password, regex  = "[A-Z]")
upper_case

## [1] 2

Lowercase Letters

lower_case = stringi::stri_count(password, regex  = "[a-z]")
lower_case

## [1] 10

Numbers

nums = stringi::stri_count(password, regex  = "[0-9]")
nums

## [1] 2

Symbols

Here, we create a vector that contains all the possible symbols (or at least all the symbols I could think of). Characters that have a special meaning in a regular expression are escaped with a double backslash.

symbols = c("~", "!", "@", "#", "\\$", "%", "\\^", "&", "\\*", "\\(", "\\)", "-", "\\+", "\\_", "=", "`",
            "\\{", "\\}", "\\[", "\\]", ":", ";", "<", ">", "\\?", ",", "\\.", "\\'", "\"")

We then look for the count of these symbols in the password

num_symbols = stringr::str_count(password, paste(symbols, collapse = "|"))
num_symbols

## [1] 1
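As a cross-check on the hand-built list, a symbol can also be defined by what it is not. The sketch below is an alternative and not what the rest of this post uses: it assumes any character outside A-Z, a-z and 0-9 counts as a symbol, which also picks up spaces and non-ASCII characters. The name num_symbols_alt is made up for this illustration.

```r
library(stringi)

password = "PasswordTest34#"

# Count every character that is not an ASCII letter or digit.
# Assumption: everything outside [A-Za-z0-9] is treated as a symbol.
num_symbols_alt = stringi::stri_count_regex(password, "[^A-Za-z0-9]")
num_symbols_alt
# 1
```

This avoids having to escape each symbol by hand, at the cost of being less strict about what a symbol is.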

Mid numbers and symbols

To get the number of middle symbols and numbers, we first eliminate the first and last characters, then count the number of numbers and symbols

mid_chars = gsub('^.|.$', '', password)
mid_chars

## [1] "asswordTest34"

midnums = stringi::stri_count(mid_chars, regex  = "[0-9]")
midnums

## [1] 2

mid_symbols = stringr::str_count(mid_chars, paste(symbols, collapse = "|"))
mid_symbols

## [1] 0

Requirements

All good! Now it’s time to create the actual scores from these counts. Some conditions will have to be met for some of these to apply, while others will not require any condition.

The requirements score is a tricky one. For the requirements score to exist, the mandatory requirement of 8 characters must be met, along with at least 3 of the 4 other requirements. If either condition fails, the score is zero. So we will:

Create a vector that contains the four requirements (requirements) from which at least three should be met.
Create a requirements score (requirements_score) vector and set it at zero.
Loop over the vector (requirements); when a value is greater than 0 (meaning the requirement has been met), we increase the requirements score by 1.
Check whether the password has at least eight characters. If not, we set the requirements score to 0. If it does, we check whether the requirements score coming from the loop above is greater than or equal to three. If so, we add 1 to the requirements score (for meeting the length requirement), then multiply by 2. Otherwise, the score is 0.

requirements = c(upper_case, lower_case, nums, num_symbols)
requirements

## [1]  2 10  2  1

requirements_score = 0
for (i in requirements) {
  if(i > 0) requirements_score = requirements_score + 1 
}
requirements_score

## [1] 4

requirements_score = ifelse(num_chars < 8, 0, 
                            ifelse((requirements_score) >= 3, (requirements_score + 1) * 2, 0))
requirements_score

## [1] 10
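The loop and the nested ifelse above can also be folded into one small function. This is only a compact sketch of the same rule; the function name requirements_score_fn and its arguments are made up for this illustration.

```r
# num_chars: length of the password
# counts: counts of uppercase letters, lowercase letters, numbers and symbols
requirements_score_fn = function(num_chars, counts) {
  met = sum(counts > 0)  # how many of the four character classes appear
  # Length is mandatory, and at least 3 of the 4 classes must be present
  if (num_chars >= 8 && met >= 3) (met + 1) * 2 else 0
}

requirements_score_fn(15, c(2, 10, 2, 1))  # 10, same as above
requirements_score_fn(7, c(2, 3, 1, 1))    # 0, too short
requirements_score_fn(10, c(0, 10, 0, 0))  # 0, only one class present
```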

The character count score is 4 times the number of characters

character_count_score = (num_chars * 4)
character_count_score

## [1] 60

The upper case score will only apply if the password contains upper case letters. If there are no upper case letters, the score will be zero. Otherwise, the score will be the difference of the number of characters and the number of uppercase letters multiplied by two.

upper_case_score = ifelse(upper_case == 0, 0, ((num_chars - upper_case)*2))
upper_case_score

## [1] 26

The lower case score will be similar to the upper case score

lower_case_score = ifelse(lower_case == 0, 0, ((num_chars - lower_case)*2))
lower_case_score

## [1] 10

The numbers score is 4 times the count of numbers in the password. However, it only applies if the password has other characters apart from numbers.

numbers_score = ifelse(upper_case > 0 | lower_case > 0 | num_symbols > 0, (nums * 4), 0)
numbers_score

## [1] 8

The symbols score will be six times the count of symbols

symbols_score = num_symbols * 6
symbols_score

## [1] 6

The mid number and symbol score will be twice the count of numbers and symbols in the middle

mid_nums_symbol_score = ((midnums + mid_symbols) * 2)
mid_nums_symbol_score

## [1] 4

We now add together all the addition scores and store them

total_additions = 0 + requirements_score + character_count_score + upper_case_score + lower_case_score + numbers_score + symbols_score + mid_nums_symbol_score
total_additions

## [1] 124

Subtraction

Now, let’s explore the patterns that lead to deductions. Some are straightforward, but most of them will require us to do something extra! But we are up to the task!

Letters only

There will be a deduction if the password has letters only. To check this, we compare the number of characters with the combined number of uppercase and lowercase letters. If they are equal, the deduction is the number of characters.

letters_only = ifelse(num_chars == (upper_case + lower_case), num_chars, 0)
letters_only

## [1] 0

Numbers only

numbers_only = ifelse(num_chars == (nums), num_chars, 0)
numbers_only

## [1] 0
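Both checks can also be written as anchored regular expressions that test the whole password at once. This is a sketch of an alternative, not the method used in the rest of the post; only_score is a made-up helper name.

```r
# Returns the password length if every character belongs to `class`
# (a regex character class such as "[A-Za-z]"), otherwise 0.
only_score = function(password, class) {
  pattern = paste0("^", class, "+$")  # the whole string must match the class
  if (grepl(pattern, password)) nchar(password) else 0
}

only_score("password", "[A-Za-z]")         # 8
only_score("12345", "[0-9]")               # 5
only_score("PasswordTest34#", "[A-Za-z]")  # 0
```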

Consecutive uppercase letters

Here, we will use a password that has consecutive uppercase letters, i.e. PASSwordTEst34#. We will:

First create a function that takes in our password and returns a vector with each character as an element of the vector.

password = "PASSwordTEst34#"
split_function = function(password){
    password = str_extract_all(password, paste(c("[a-z]", "[A-Z]", "[0-9]", symbols), collapse = "|"))
    password = password[[1]]
    return(password)
}
split_pass = split_function(password)
split_pass

##  [1] "P" "A" "S" "S" "w" "o" "r" "d" "T" "E" "s" "t" "3" "4" "#"

We check for uppercase letters. We want to return 1 if it is an uppercase letter and 0 if not

consecutive_upper = ifelse(split_pass %in% LETTERS, 1, 0)
consecutive_upper

##  [1] 1 1 1 1 0 0 0 0 1 1 0 0 0 0 0

Now, we want to know how many consecutive zeros and ones there are in our output. The rle function below returns the values and the lengths of their runs. For instance, we see that 1 is repeated four times, then 0 four times, 1 two times, and 0 five times.

r = rle(consecutive_upper)
r

## Run Length Encoding
##   lengths: int [1:4] 4 4 2 5
##   values : num [1:4] 1 0 1 0

We are interested in the 1s, which translate to uppercase letters as we saw above. We will therefore extract the lengths where the value is 1. Further, for letters to be considered consecutive, a run must have a length of 2 or more. If three letters are consecutive, the length above will be 3, but that is really two cases of consecutive letters (the first with the second, then the second with the third). Therefore, while extracting, we will subtract one.

consecutive_upper = (r$lengths[r$values == 1]) - 1
consecutive_upper

## [1] 3 1

The output above is the number of times there are consecutive uppercase letters in different instances. So the sum will give us the total instances of consecutive uppercase letters.

consecutive_upper = sum(consecutive_upper)
consecutive_upper

## [1] 4

Consecutive lowercase letters

The above procedure is repeated to get the total number of instances of consecutive lowercase letters.

split_pass = split_function(password)
consecutive_lower = ifelse(split_pass %in% letters, 1, 0)
r = rle(consecutive_lower)
consecutive_lower = (r$lengths[r$values == 1]) - 1
consecutive_lower = sum(consecutive_lower)
consecutive_lower

## [1] 4

Consecutive numbers

split_pass = split_function(password)
consecutive_numbers = ifelse(split_pass %in% (0:9), 1, 0)
r = rle(consecutive_numbers)
consecutive_numbers = (r$lengths[r$values == 1]) - 1
consecutive_numbers = sum(consecutive_numbers)
consecutive_numbers

## [1] 1
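Since the three consecutive checks differ only in the set of characters they look for, they can share one helper. The sketch below uses a made-up name, count_consecutive, and splits the password with strsplit instead of split_function, so it keeps every character, including ones that are not in the symbols vector.

```r
# Count adjacent pairs of characters that both belong to `set`.
count_consecutive = function(password, set) {
  chars = strsplit(password, "")[[1]]  # one element per character
  r = rle(chars %in% set)              # runs of members / non-members
  sum(r$lengths[r$values] - 1)         # a run of length k holds k - 1 pairs
}

password = "PASSwordTEst34#"
count_consecutive(password, LETTERS)             # 4
count_consecutive(password, letters)             # 4
count_consecutive(password, as.character(0:9))   # 1
```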

Letters, numbers, and symbols sequence

Here, we want to check for any sequential letters, with respect to the alphabet and regardless of case. For it to count as a sequence, there must be at least 3 sequential letters. For instance, if the password is “abcdef”, then the instances of sequential letters are abc, bcd, cde, and def, i.e. four instances. The sequence can also be in reverse order. For instance, “mlkji” is still a sequence! The same applies to numbers (0-9) and symbols (~ to +). To achieve this, we will do the following.

For letters, we will first get the position in the alphabet irrespective of the case. If the character is not in the alphabet, we will get a zero.

password = "uVwxYz123$EDcba!@#"
split_pass = split_function(password)
split_pass

##  [1] "u" "V" "w" "x" "Y" "z" "1" "2" "3" "$" "E" "D" "c" "b" "a" "!" "@" "#"

sequence_check = ifelse(tolower(split_pass) %in% letters, match(tolower(split_pass), letters), 0)
sequence_check

##  [1] 21 22 23 24 25 26  0  0  0  0  5  4  3  2  1  0  0  0

We then take the absolute difference between neighbouring values in the output above. Taking the absolute value takes care of reverse sequences.

sequence_check = abs(diff(sequence_check))
sequence_check

##  [1]  1  1  1  1  1 26  0  0  0  5  1  1  1  1  1  0  0

Now, if there was a sequence, the difference above will be 1. We now change everything else that is not a 1 to a zero.

sequence_check = ifelse(sequence_check == 1, 1, 0)
sequence_check

##  [1] 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0

Cool! Now we can do as we did before: see how many times each value is repeated, with the output looking like a summary.

r = rle(sequence_check)
r

## Run Length Encoding
##   lengths: int [1:4] 5 5 5 2
##   values : num [1:4] 1 0 1 0

Again, we are interested in ones only, so we extract from the output above cases where the value is one, and we deduct 1

sequence_check = (r$lengths[r$values == 1]) - 1
sequence_check

## [1] 4 4

We do a sum of the output to get the total number of instances a sequence occurs

sequence_check = sum(sequence_check)
sequence_check

## [1] 8

Finally, for efficiency, we put all these steps into a function. The function takes in the split password and the variable type. The variable type will be either letters, numbers or symbols. The function returns the total number of instances in which a sequence occurs.

sequence_checker = function(split_pass, var_type) {
    
  if(var_type == "numbers"){
    
    sequence_check = as.integer(ifelse(split_pass %in% (0:9), split_pass, 0))
    
  } else if (var_type == "symbols"){
    
    sequence_check = ifelse(split_pass %in% symbols[1:13], match(split_pass, symbols[1:13]), 0)
    
  } else {
    
    sequence_check = ifelse(tolower(split_pass) %in% letters, match(tolower(split_pass), letters), 0)
    
  }
  sequence_check = abs(diff(sequence_check))
  sequence_check = ifelse(sequence_check == 1, 1, 0)
  r = rle(sequence_check)
  sequence_check = (r$lengths[r$values == 1]) - 1
  sequence_check = sum(sequence_check)
  return(sequence_check)
}
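One thing to be aware of with this approach: characters outside the alphabet are mapped to 0, so a character at position 1 (such as "a") or the digit 1 next to a non-matching character also produces a difference of 1. Taking absolute differences also means an up-then-down pattern like "aba" counts. The sketch below is a stricter variant under my own assumptions, not the function used in this post: it marks non-members as NA and requires two equal steps in a row. The name count_sequences is made up, and the symbol alphabet is written with literal, unescaped characters.

```r
# Count runs of three characters that step through `alphabet`
# in the same direction (all +1 or all -1).
count_sequences = function(chars, alphabet) {
  pos = match(chars, alphabet)  # NA for characters outside the alphabet
  d = diff(pos)                 # signed step between neighbours, NA next to a non-member
  # Three in a row means two consecutive steps that are equal and of size 1
  sum(head(d, -1) == tail(d, -1) & abs(tail(d, -1)) == 1, na.rm = TRUE)
}

chars = strsplit("uVwxYz123$EDcba!@#", "")[[1]]
seq_symbols = c("~", "!", "@", "#", "$", "%", "^", "&", "*", "(", ")", "-", "+")

count_sequences(tolower(chars), letters)   # 7: uvw, vwx, wxy, xyz, edc, dcb, cba
count_sequences(chars, as.character(0:9))  # 1: 123
count_sequences(chars, seq_symbols)        # 1: !@#
```

For this password, sequence_checker gives 8 and 2 for letters and numbers, because the "a" followed by "!" and the "z" followed by "1" each add one extra step.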

Sequential letters

split_pass

##  [1] "u" "V" "w" "x" "Y" "z" "1" "2" "3" "$" "E" "D" "c" "b" "a" "!" "@" "#"

sequence_letters = sequence_checker(split_pass, "letters")
sequence_letters

## [1] 8

Sequential numbers

sequence_num = sequence_checker(split_pass, "numbers")
sequence_num

## [1] 2

Sequential symbols

sequence_symbols = sequence_checker(split_pass, "symbols")
sequence_symbols

## [1] 1

Just like we saw in additions, deduction scores will also be calculated using the counts we have generated above. The chunk below shows how each score is calculated. They are then added together, and the resulting score is stored and used to calculate the total score.

letters_only_score = letters_only
numbers_only_score = numbers_only
consecutive_upper_score = consecutive_upper * 2
consecutive_lower_score = consecutive_lower * 2
consecutive_numbers_score = consecutive_numbers * 2
sequence_letters_score = sequence_letters * 3
sequence_num_score = sequence_num * 3
sequence_symbols_score = sequence_symbols * 3

total_deductions = letters_only_score +
  numbers_only_score +
  consecutive_upper_score + 
  consecutive_lower_score + 
  consecutive_numbers_score +
  sequence_letters_score + 
  sequence_num_score +
  sequence_symbols_score

total_deductions

## [1] 51

Bringing all together

We now have the total additions and total deductions. Now, the password strength is supposed to range between 0 and 100. From our calculations, it is possible to get a negative score, as well as a score above 100. Therefore, should a score be less than 0, we want it to be 0, and if it is greater than 100, we want it to be 100. Otherwise, the total score is given by total additions - total deductions.

total_score = total_additions - total_deductions
total_score = ifelse(total_score < 0, 0, ifelse(total_score > 100, 100, total_score))
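The same clamping to the 0 to 100 range can be written with min and max. A small sketch, with clamp_score as a made-up name and the totals passed in as plain numbers:

```r
# Keep additions minus deductions inside the 0 to 100 range.
clamp_score = function(total_additions, total_deductions) {
  max(0, min(100, total_additions - total_deductions))
}

clamp_score(124, 51)  # 73
clamp_score(150, 20)  # 100, capped at the top
clamp_score(10, 40)   # 0, capped at the bottom
```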

We can now put all these together in a function and try a few passwords and see the output

Try a password
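Below is one way the steps in this post could be wrapped into a single function. It is a sketch under a few assumptions of mine: the name password_meter is made up, the password is split with strsplit, any character that is not A-Z, a-z or 0-9 is treated as a symbol, and the symbol sequence uses literal (unescaped) characters. The weights and the sequence logic follow the steps above.

```r
password_meter = function(password) {
  chars = strsplit(password, "")[[1]]
  n = length(chars)
  digits = as.character(0:9)
  seq_symbols = c("~", "!", "@", "#", "$", "%", "^", "&", "*", "(", ")", "-", "+")

  # Counts (assumption: whatever is not a letter or a digit is a symbol)
  upper = sum(chars %in% LETTERS)
  lower = sum(chars %in% letters)
  nums = sum(chars %in% digits)
  syms = n - upper - lower - nums
  mid = if (n > 2) chars[2:(n - 1)] else character(0)
  mid_count = sum(!(mid %in% c(LETTERS, letters)))  # middle numbers and symbols

  met = sum(c(upper, lower, nums, syms) > 0)
  additions = sum(
    if (n >= 8 && met >= 3) (met + 1) * 2 else 0,  # requirements
    n * 4,                                         # number of characters
    if (upper > 0) (n - upper) * 2 else 0,         # uppercase letters
    if (lower > 0) (n - lower) * 2 else 0,         # lowercase letters
    if (nums < n) nums * 4 else 0,                 # numbers, unless numbers only
    syms * 6,                                      # symbols
    mid_count * 2                                  # middle numbers or symbols
  )

  # Number of adjacent pairs inside runs of TRUE
  pairs_in_runs = function(flag) {
    r = rle(flag)
    sum(r$lengths[r$values] - 1)
  }
  # Sequence count as in the post: positions, absolute differences, runs of 1
  seq_count = function(pos) pairs_in_runs(abs(diff(pos)) == 1)

  deductions = sum(
    if (n == upper + lower) n else 0,                            # letters only
    if (n == nums) n else 0,                                     # numbers only
    pairs_in_runs(chars %in% LETTERS) * 2,                       # consecutive uppercase
    pairs_in_runs(chars %in% letters) * 2,                       # consecutive lowercase
    pairs_in_runs(chars %in% digits) * 2,                        # consecutive numbers
    seq_count(match(tolower(chars), letters, nomatch = 0)) * 3,  # sequential letters
    seq_count(match(chars, digits, nomatch = 1) - 1) * 3,        # digit value, 0 for non-digits
    seq_count(match(chars, seq_symbols, nomatch = 0)) * 3        # sequential symbols
  )

  list(additions = additions, deductions = deductions,
       score = max(0, min(100, additions - deductions)))
}

password_meter("PasswordTest34#")$score  # 100
password_meter("password")$score         # 10
```

For "PasswordTest34#" the additions come to 124, as computed earlier, and the deductions to 18 (consecutive lowercase letters and consecutive numbers), so the raw score of 106 is capped at 100.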

Conclusion

We were able to build our own password meter in R! However, there is something small missing, which I am still trying to figure out. On the website, there is a deduction called “Repeat Characters (Case Insensitive)”. I still haven’t figured out how it is arrived at. If you can, please try it out. You can inspect the page and check the code.

Having done that, I am really pleased with myself, and if you have followed this post up to this point, I am grateful that you took the time. In the process, I learnt a lot of cool stuff to do with strings, and I hope you have too. In case of suggestions or questions, feel free to hit me up, and until next time, bye for now! And Kinaro was here when I did this, but he was sleeping!