US Secure Hash Algorithm 1 in PurePerl

Writing MySQL Protocol Packets in PurePerl – Part 1

Writing a Perl Module

Foreword: In this part of the series, I explain the USA Secure Hash Algorithm 1 and I implement it with Perl only.

By: Chrysanthus Date Published: 27 Jan 2015

Introduction

This is part 1 of my series, Writing MySQL Protocol Packets in PurePerl. PurePerl stands for Pure Perl, which means Perl software without any C software underneath. In this part of the series, I explain the USA Secure Hash Algorithm 1 and I implement it with Perl only. Secure Hash Algorithm 1 is abbreviated as SHA1 or SHA_1 or SHA-1. It is a popular encryption scheme for a binary message. A binary message is a string consisting of 1’s and 0’s. This encryption is used in the authentication process to access the MySQL server databases.

Pre-Knowledge
This series is part of my volume, Writing a Perl Module. The first series in this volume is Internet Sockets and Perl. The second is, Perl pack and unpack Functions. This is the third. At the bottom of this page, you have links to the different series in the volume. You should be reading the series in the order given.

In each series you should read the parts in the order given. For each part, the links and order are in a menu on the (top) left of the page.

Sections of this Article
This tutorial is divided into two sections. The first section is the explanation of SHA1 independent of any computer language. The second section is the Perl implementation. The link to download the complete Perl file in zip form is given at the bottom of the tutorial. In the first section a lot of the text is copied from the RFC 3174 memo.

SHA without Computer Language

SHA1 Overview
When a message of any length < 2^64 bits is input, the SHA-1 produces a 160-bit output called a Message Digest. The SHA-1 is called secure because it is computationally infeasible to find a message which corresponds to a given message digest, or to find two different messages which produce the same message digest.

Note: < means less-than and <= means less-than-or-equal-to; while > means greater-than and >= means greater-than-or-equal-to. 2^64 means 2 raised to the power, 64.

Before explaining the SHA1 process there are a good number of things you have to know or recall first:

You also need to know Boolean Algebra in order to understand this tutorial. If you do not know Boolean algebra, then you will have to accept the functions in the tutorial and code.

Definitions of Bit Strings and Integers
The following terminology related to bit strings and integers will be used:

A hex (hexadecimal) digit is an element of the set {0, 1, ... , 9, A, ... , F}. A hex digit is the representation of a 4-bit string.  Examples:  7 = 0111, A = 1010.

A word equals a 32-bit string, which may be represented as a sequence of 8 hex digits (32 / 4 = 8). To convert a word to 8 hex digits, each 4-bit string is converted to its hex equivalent as described above. Example:

      1010 0001 0000 0011 1111 1110 0010 0011 = A103FE23.

There are four bytes here, which are equivalent to 8 half-bytes of 4 digits each. The 8 half bytes are equivalent to eight hex digits.

A decimal integer between 0 and 2^32 - 1 inclusive may be represented as a word.  2^32 is 4,294,967,296 and 2^32 – 1 is 4,294,967,295. The least significant four bits of any integer are represented by the right-most hex digit of the word representation. Example: the decimal integer, 291 is 00000000 00000000 00000001 00100011 in bytes, which is the word, 00000123 in hexadecimal.

If z is an integer, 0 <= z < 2^64, then z = (2^32)x + y where 0 <= x < 2^32 and 0 <= y < 2^32.  Since x and y can be represented as words X and Y, respectively, z can be represented as the pair of words (X,Y). For example, if x is the decimal, 100 and y is the decimal, 200, then z would be, (2^32)*100 + 200,. Which is 4,294,967,296*100 + 200, which is, 429,496,729,800, which is in the range, 0 <= z < 2^64. Note, 2^64 is 18,446,744,073,709,551,616. The decimal number, 100 is the word 00000000 00000000 00000000 01100100. The decimal number, 200 is the word 00000000 00000000 00000000 11001000. There is a function that can combine these two words to give the decimal, 429,496,729,800.

A block is defined as a 512-bit string, which is made up of 16 words. That is, if you place 32 bits next to one another, 16 times, you will have 512 bits (32 * 16 = 512).

Operations on Words
The following logical operators will be applied to words:

a) Bitwise logical word operations:

      X AND Y  =  bitwise logical "and" of  X and Y.

      X OR Y   =  bitwise logical "inclusive-or" of X and Y.

      X XOR Y  =  bitwise logical "exclusive-or" of X and Y.

      NOT X    =  bitwise logical "complement" of X.

Example:

               01101100101110011101001001111011
    XOR   01100101110000010110100110110111
               --------------------------------
          =   00001001011110001011101111001100

b) The operation X + Y is defined as follows:  words X and Y represent integers x and y, where 0 <= x < 2^32 and 0 <= y < 2^32. For positive integers n and m, let n mod m be the remainder upon dividing n by m.  Compute

         z  =  (x + y) mod 2^32.

Then 0 <= z < 2^32.  Convert z to a word, Z and define Z = X + Y.

c) The circular left shift operation S^n(X), where X is a word and n is an integer with 0 <= n < 32, is defined by

         S^n(X)  =  (X << n) OR (X >> 32-n).

In this logical function, X << n is obtained as follows: discard the left-most n bits of X and then pad the result with n zeroes on the right (the result will still be 32 bits).  X >> n is obtained by discarding the right-most n bits of X and then padding the result with n zeroes on the left.  Thus S^n(X) is equivalent to a circular shift of X by n positions to the left.

Message Padding
SHA-1 is used to compute a message digest for a message or data file that is provided as input.  The message or data file should be considered to be a bit string.  The length of the message is the number of bits in the message (the empty message has length, 0).  If the number of bits in a message is a multiple of 8, for compactness we can represent the message in hex.  The purpose of message padding is to make the total length of a padded message a multiple of 512 (block). SHA-1 sequentially processes blocks of 512 bits when computing the message digest.  The following specifies how this padding shall be performed.  As a summary, a "1" followed by m "0"s followed by a 64-bit integer are appended to the end of the message to produce a padded message of length 512 * n.  The 64-bit integer is the length of the original message.  The padded message is then processed by the SHA-1 as n 512-bit blocks.

Suppose a message has length l < 2^64.  Before it is input to the SHA-1, the message is padded on the right as follows:

a) "1" is appended.  Example: if the original message is "01010000",
      this is padded to "010100001".

b) "0"s are appended.  The number of "0"s will depend on the original length of the message.  The last 64 bits of the last 512-bit block are reserved for the length l of the original message.

As another example:  suppose the original message is the bit string

         01100001 01100010 01100011 01100100 01100101.

After step (a) this gives

         01100001 01100010 01100011 01100100 01100101 1.

Since l = 40, the number of bits in the above is 41 and 407 "0"s are appended, making the total now 448.  This gives (in hex)

         61626364 65800000 00000000 00000000
         00000000 00000000 00000000 00000000
         00000000 00000000 00000000 00000000
         00000000 00000000.

   c) Obtain the 2-word representation of l, the number of bits in the original message.  If l < 2^32 then the first word is all zeroes. “2-word” is for the 64 bits of the original length. Append these two words to the padded message.

Example: Suppose the original message is as in (b).  Then l = 40 (note that l is computed before any padding). The two-word representation of 40 is hex 00000000 00000028.  Hence the final padded message is hex,

         61626364 65800000 00000000 00000000
         00000000 00000000 00000000 00000000
         00000000 00000000 00000000 00000000
         00000000 00000000 00000000 00000028.

This is a 512 bit-block. As you can see from this structure, a word in hexadecimal is 8 digits. There are 16 words in any block. In practice, the padded message will contain 16 * n words for some n > 0, where n is the number of blocks. The padded message is regarded as a sequence of n blocks M(1) , M(2), M(2), etc.

Functions and Constants Used
A sequence of logical functions f(0), f(1),..., f(79) is used in SHA-1.  Each f(t), 0 <= t <= 79, operates on three 32-bit words B, C, D (see below) and produces a 32-bit word as output.  f(t;B,C,D) is defined as follows: for words B, C, D,

      f(t;B,C,D) = (B AND C) OR ((NOT B) AND D)                           ( 0 <= t <= 19)

      f(t;B,C,D) = B XOR C XOR D                                                      (20 <= t <= 39)

      f(t;B,C,D) = (B AND C) OR (B AND D) OR (C AND D)                  (40 <= t <= 59)

      f(t;B,C,D) = B XOR C XOR D                                                            (60 <= t <= 79).

   A sequence of constant words K(0), K(1), ... , K(79) is used in the
   SHA-1.  In hex these are given by

      K(t) = 5A827999         ( 0 <= t <= 19)

      K(t) = 6ED9EBA1         (20 <= t <= 39)

      K(t) = 8F1BBCDC         (40 <= t <= 59)

      K(t) = CA62C1D6         (60 <= t <= 79).

Computing the Message Digest
The message digest is computed beginning with the padded message.  The computation is performed using two buffers, each consisting of five 32-bit words, and a sequence of eighty 32-bit words. The words of the first 5-word buffer are labeled A,B,C,D,E. The words of the second 5-word buffer are labeled H0, H1, H2, H3, H4. The words of the 80-word sequence are labeled W(0), W(1),..., W(79). A single word buffer TEMP is also employed.

To generate the message digest, the 16-word blocks M(1), M(2),..., M(n) are processed in order. The processing of each M(i) involves 80 steps.

Before processing any blocks, the H's are initialized as follows: in hex,

      H0 = 67452301

      H1 = EFCDAB89

      H2 = 98BADCFE

      H3 = 10325476

      H4 = C3D2E1F0.

Now M(1), M(2), ... , M(n) are processed.  To process M(i), we proceed as follows:

a) Divide M(i) into 16 words W(0), W(1), ... , W(15), where W(0) is the left-most word. These are the first 16 words in the 80-word sequence.

b) For t = 16 to 79 let

         W(t) = S^1(W(t-3) XOR W(t-8) XOR W(t-14) XOR W(t-16)).

c) Let A = H0, B = H1, C = H2, D = H3, E = H4.

d) For t = 0 to 79 do

         TEMP = S^5(A) + f(t;B,C,D) + E + W(t) + K(t);

         E = D;  D = C;  C = S^30(B);  B = A; A = TEMP;

      e. Let H0 = H0 + A, H1 = H1 + B, H2 = H2 + C, H3 = H3 + D, H4 = H4
         + E.

After processing M(n), the message digest is the 160-bit string represented by the 5 words:

         H0 H1 H2 H3 H4

in that order.

The Perl Code Explained
Before we continue, know that: a character (e.g. ‘r’) is represented already, internally as a byte (8 bits); two hexadecimal characters make 1 byte.

Do not confuse between a decimal digit and a string character. A string character is made up of 8 bits forming a byte. A string character can be represented by two hexadecimal digits or characters, because two hexadecimal digits form a byte. The relationship between a decimal number and a byte is not really a one-to-one relationship.

The complete Perl code is saved in a file called, Sha1.pm. The file is a Perl package (module). The package has one function that takes a normal alphanumeric string as input and produces a hexadecimal SHA1 (encrypted) string. The function processes all the above algorithm.

The encoding system used for the character set is ASCII or ASCII compatible.

Padding the Message
The first thing the Perl code does is to check if the number of bits in the message is less than 2^64 and then pads the message. In practice, the message is in bytes. So the code checks if the number of bytes is less than 2305843009213693952 (18,446,744,073,709,551,616 / 8) before doing the padding. The input message is typically a string of characters, and each character is 1 byte. When a decimal digit is typed into a string, it is a character and not a number.

The function works in bytes, so the 1 that is to be added, is added as a byte (0x80).

In the code there are places where bytes are converted to decimal integers and vice-versa. The Perl pack() and unpack() functions are used for these.

The Buffers and Other Words
The two 5-word buffers are Perl hashes. The words of the 80-word sequence are in an array. The TEMP buffer is a variable. The 16-word blocks, M(1), M(2),..., M(n) are in an array, where M(1) is $M[1], M(2) is $M[2] and so on.

If you have read the previous series titled, “Perl pack and unpack Functions” and all the pre-knowledge series, then the Perl code should be self-explanatory.

Link to Free Download
After downloading, if you are not using AcivePerl, then you should precede the file content with something like, #!/usr/bin/perl . The name of the file is Sha1.pm in a zipped folder. The free download link to the US Secure Hash Algorithm 1 in PurePerl is (read the agreement):

PurePerl US Secure Hash Algorithm 1

That is it for this part of the series. We stop here and continue in the next part.

Chrys

Broad Network

Related Articles

US Secure Hash Algorithm 1 in PurePerl

Writing MySQL Protocol Packets in PurePerl – Part 1

Writing a Perl Module

Introduction

SHA without Computer Language

Related Links

Comments