Delving Into Bitcoin Hashing
SHA-256, and cryptographic hashes in general, take an input (string, transaction, or data) and create a fixed size output of something seemingly random. Easily described, hashing does this by reducing the inputs into binary form, adding some 0s and 1s, and then mixing and mashing until the algorithm is happy with the result. This process is one-way; it is just about impossible to crack a SHA-256 hash .
Given how vital SHA-256 is to Bitcoin, just saying that SHA-256 works by ‘computer magic’ seems like an explanation left wanting. So here is a little bit more of an in-depth explanation (if you don’t have a background in computers, then we highly recommend the Craft-Crypto Crash Course in Digital Logic for SHA-256).
SHA-256 Process Overview
SHA-256 has two main steps: preparing and incorporating. The data is first padded and split into subgroups. Then each subgroup is combined with other subgroups and prime numbers. In the end, the subgroups are recombined to form the final hash.
The input data is prepared by first reducing it to binary form. It is then padded (0s are added) to make its length a multiple of 512. The last 64 bits of the padding are not just 0s but the length of the input in binary.
After the data is padded, it is split into 512 bit blocks, and a root hash is created. The root hash is made from the square roots of the 1st 8 primes.
The following steps are then executed per 512 bit block:
- The block is split into 16 ‘words’ of 32 bits each.
- 48 new words are generated by mixing the previous blocks. This creates a total of 64 words.
- A group of variables is set equal to the root hash
- A single word and prime are incorporated into the group of variables. This is repeated for each word (64 times)
- The groups of variables are mixed with the root hash to create a new root hash
The last root hash then becomes the final hash and is the output of SHA-256.
For this article, we are going to use 4 bit words instead of 32 (see here for full scale instructions). This makes the examples a lot easier to understand while keeping the core mechanics of SHA-256 essentially the same. The only major difference is in the shifting functions. Full scale SHA-256 shifts bits by up to 25 spots, which does not make sense for a 4 bit word. As such, the shifting functions were scaled down appropriately.
Now, Hodl on! We’re about to mash some bits! =)
Time for Some Hashing!
Let’s hash the letter μ!
This is it in binary form:
The first step is padding the data. Instead of making blocks of 512 like the full-scale version, we’ll be using blocks of 16 bits. The first step of padding is to add a 1 to the end of the block (easy enough).
Then we follow it up with three 0s. Three are added instead of 7 as the last 4 bits are reserved for the length of the input (we need 4 bits to represent a length of 8).
μ’s length of 8 is ‘1000’ in binary, but we are going to use ‘0100’ instead. This is solely for cosmetic reasons. Adding a ‘1000’ and then another ‘1000’ makes for a dull example. Adding ‘0100’ gives more flavor. =)
Now that we have our data padded, we make our root hash!
Building Blocks of SHA-256
SHA-256 is built from 8 subgroups, which are initially set to the fractional part of the square root of the 1st 8 primes. This is called the root hash.
The fractional part of a prime is just the numbers after the decimal.
Full-scale SHA-256 would use 32 bits of the fractional part. We are just going to use the first 4 to match our word size:
Using just 4 bits causes some of the fractional parts to match. This is not the case when using 32 bits.
Hashing uses a lot of prime numbers to kill any patterns. Some steps in hashing can create patterns, which the primes prevent. If a hash function creates a pattern, then it can be cracked (which is bad).
Now that we have our root hash and padded data, we can start splitting the data.
Splitting and Creating New Words
Our padded data is split into 4 (since we are using 4 bit blocks). These 4 are then used to generate an additional 4 words.
Generating new words uses the ‘C_Sigma’ function. This is a combination of circular shifts and XOR gates (if you need a refresher on those check here).
New words are the addition of the 4 previous words with 2 of the words transformed by C_Sigma. The first new word uses the original 4, the 2nd new word uses the 2nd, 3rd, 4th, and 5th word, and so on and so forth. This is done until you have 8 words.
(SHA-256 uses modular addition. So any ‘+’ denotes the addition of the bits, but you only keep the last 4 bits.)
Here are our 8 words (for complete transparency, I made up the last 2… but they look pretty!).
So now that we have all the words, we can create the variable group and start incorporating the words into the root hash.
Words and Hash Mash
Our variable group, [a b c d e f g h], is set equal to the root hash. These are the variables that will change when we combine them with our data. At the end, we’ll add [a b c d e f g h] back to the root hash to create a new hash.
This incorporation uses 3 additional functions: ‘Combine’, ‘S_Combine’, and ‘Sigma’.
The words are also combined with more primes to make sure all patterns are thoroughly killed. The primes used in this step are the fractional parts of the cubic roots of the first 8 primes:
Our words and cubic primes are incorporated into [a b c d e f g h] by first calculating two new variables: T1 & T2. T1 & T2 combine and shift the variable group with the data and primes. After, T1 & T2 are distributed into [a b c d e f g h].
Only one word and prime are introduced at a time. The first cycle uses the 1st word and the 1st root, the second cycle uses the 2nd word and the 2nd root, and so on. At the end, each element in the variable group will have been incorporated at least 4 times with a word and a prime. In the full scale version, each variable would have been incorporated with our data and a prime at least 60 times.
We’ll stop after one iteration and create the final hash.
The final hash is created from adding the root hash to [a b c d e f g h]. If we had more than one block of 16 bits, then we would still add the root hash to [a b c d e f g h], but then do another round of calculations on [a b c d e f g h].
Blam! You have a final hash (Ta Da!)
No single step in the process is difficult, but there are a lot of steps to take. Calculating a hash by hand can be done, however, it is an extremely long and iterative process. The maximum input to SHA-256 is something on the order of 18 giga gigabytes. That’s a lot of 512 bit blocks!
But computers can do this pretty quickly. Doing small simple tasks many times is a computer’s job. When you hear something like a hash rate of 1TH/s (Terahash per second) then that’s a trillion of these full-scale calculations per second (wow!).
Now that you know how to calculate hashes, if you want to find out how they are used in cryptocurrency, check out here (future article).
Also, check out this cool hash generator. You can really see how random the outputs can get!