The hashing function
A hashing function takes an input of any length and turns it into a fixed-length output. The following code example makes this clearer:
>>> import hashlib
>>> hashlib.sha256(b"hello").hexdigest()
'2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824'
>>> hashlib.sha256(b"a").hexdigest()
'ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb'
>>> hashlib.sha256(b"hellohellohellohello").hexdigest()
'25b0b104a66b6a2ad14f899d190b043e45442d29a3c4ce71da2547e37adc68a9'
As you can see, the input can be 1, 5, or even 20 characters long, but the output is always 64 hexadecimal characters (256 bits). The output looks scrambled, with no apparent link between the input and the output. However, the same input gives the same output every time:
>>> hashlib.sha256(b"a").hexdigest()
'ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb'
>>> hashlib.sha256(b"a").hexdigest()
'ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb'
If you change the input by even a single character, the output is totally different:
>>> hashlib.sha256(b"hello1").hexdigest()
'91e9240f415223982edc345532630710e94a7f52cd5f48f5ee1afc555078f0ab'
>>> hashlib.sha256(b"hello2").hexdigest()
'87298cc2f31fba73181ea2a9e6ef10dce21ed95e98bdac9c4e1504ea16f486e4'
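To put a number on "totally different", the following sketch counts how many of the 256 output bits differ between two digests. The helper name `differing_bits` is made up for this illustration. For a well-behaved hashing function, roughly half of the bits are expected to differ.

```python
import hashlib


def differing_bits(data1: bytes, data2: bytes) -> int:
    """Count the bit positions where the two SHA-256 digests differ."""
    digest1 = int.from_bytes(hashlib.sha256(data1).digest(), "big")
    digest2 = int.from_bytes(hashlib.sha256(data2).digest(), "big")
    # XOR leaves a 1 in every position where the two digests disagree.
    return bin(digest1 ^ digest2).count("1")


print(differing_bits(b"hello1", b"hello2"), "of 256 bits differ")
```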
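Because the output has a fixed length (64 hexadecimal characters in this case) while the input can be of any length, there must be different inputs that produce the same output. Such a pair is called a collision. With a secure hashing function, collisions still exist, but finding one is computationally infeasible.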
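The reason two inputs must share an output is a counting argument, and it is easier to see when the output is made artificially short. The sketch below keeps only the first 4 hexadecimal characters (16 bits) of the SHA-256 output, so there are just 65,536 possible outputs and a brute-force search finds a clash quickly. The truncation and the helper names `short_hash` and `find_collision` are assumptions made purely for illustration. With the full 64 characters, the same search is not feasible.

```python
import hashlib
from itertools import count


def short_hash(data: bytes, hex_chars: int = 4) -> str:
    """Return only the first `hex_chars` characters of the SHA-256 hex digest."""
    return hashlib.sha256(data).hexdigest()[:hex_chars]


def find_collision(hex_chars: int = 4) -> tuple:
    """Hash b"0", b"1", b"2", ... until two inputs share the same short hash."""
    seen = {}  # maps each short hash to the first input that produced it
    for i in count():
        data = str(i).encode()
        digest = short_hash(data, hex_chars)
        if digest in seen:
            return seen[digest], data
        seen[digest] = data


first, second = find_collision()
print(first, second, short_hash(first), short_hash(second))
```

The two inputs printed are different, yet their truncated outputs match. Their full 64-character outputs still differ.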
Not all hashing functions are safe, though. SHA-1 was broken in practice in 2017, when researchers published two different files that have the same SHA-1 output. In this example, we will use SHA-256.
The output of the hashing function can be used as a compact fingerprint of the data, which we will loosely call a digital signature (strictly speaking, a digital signature also involves a private key). Imagine you have a string with a length of 10 million characters (say you are writing a novel). To make sure this novel is not tampered with, you could tell all your potential readers to compare all 10 million characters against the original in order to ensure that the novel hasn't been corrupted. Nobody would do that. But with hashing, you can publish the 64-character output through a channel your readers trust (Twitter, for example), and your potential readers can hash the novel that they buy or download and compare the two outputs to make sure that their copy is legitimate.
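The novel scenario can be sketched in a few lines. The text of the novel, the one-word alteration, and the helper names `fingerprint` and `is_untampered` are all placeholders invented for this illustration.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return the 64-character SHA-256 hex digest of the content."""
    return hashlib.sha256(content).hexdigest()


def is_untampered(content: bytes, published_digest: str) -> bool:
    """Check a downloaded copy against the digest the author published."""
    return fingerprint(content) == published_digest


# Placeholder standing in for the author's multi-million-character novel.
novel = b"It was a dark and stormy night. " * 1000
published = fingerprint(novel)  # the author publishes only these 64 characters

# A copy in which a single word has been altered.
tampered = novel.replace(b"dark", b"calm", 1)

print(is_untampered(novel, published))     # True
print(is_untampered(tampered, published))  # False
```

The reader never needs the original text, only the short published output.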
So, we add the parent's hash to the block class. This way, each block keeps the digital signature of its parent block. This means that if you are ever naughty and change the content of a block, the parent's hash stored in its child block will no longer match, and you will get caught red-handed.
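As a minimal sketch of this idea, the `Block` class below is a hypothetical stand-in, not necessarily the block class used elsewhere in this text. The field names `history` and `parent_hash`, the method `compute_hash`, and the JSON serialization are assumptions chosen for illustration.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    history: str                       # the content of the block
    parent_hash: Optional[str] = None  # hash of the parent block (None for the first block)

    def compute_hash(self) -> str:
        # The stored parent hash is hashed together with the content,
        # so it counts as part of the block's content.
        payload = json.dumps(
            {"history": self.history, "parent_hash": self.parent_hash},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode()).hexdigest()


parent = Block("first entry")
child = Block("second entry", parent_hash=parent.compute_hash())

print(child.parent_hash == parent.compute_hash())  # True

# Altering the parent afterwards no longer matches what the child recorded.
parent.history = "first entry, altered"
print(child.parent_hash == parent.compute_hash())  # False
```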
But can't you also change the parent's hash in the child block if you want to alter the content of a block? You can, obviously. However, altering the content becomes more difficult, because it now takes at least two steps: changing the content, and then changing the parent's hash in the child block. Now, imagine you have 10 blocks and you want to change the content in the first block:
- In this case, you have to change the parent's hash in its immediate child's block. But, alas, this has unseen ramifications. Technically speaking, the parent's hash stored in the immediate child is part of that block's content, so changing it changes the child's own hash. That means that the parent's hash stored in the child's child (the grandchild of the first block) becomes invalid.
- Now, you have to change the parent's hash in that grandchild, but this affects the subsequent block, and so on. In the end, you have to change the parent's hash in every later block. Counting the original edit, ten steps need to be taken. Using a parent's hash makes tampering much more difficult.
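The cascade described in the list above can be simulated with a chain of 10 blocks. In this sketch, each block is a plain dictionary, and the helper names `block_hash`, `build_chain`, and `first_broken_link` are invented for the illustration. A block's hash is assumed to cover both its content and its stored parent hash.

```python
import hashlib


def block_hash(content: str, parent_hash: str) -> str:
    """Hash a block's content together with its stored parent hash."""
    return hashlib.sha256((parent_hash + content).encode()).hexdigest()


def build_chain(contents):
    """Link the blocks so that each one stores the hash of its parent."""
    chain = []
    parent_hash = ""  # the first block has no parent
    for content in contents:
        chain.append({"content": content, "parent_hash": parent_hash})
        parent_hash = block_hash(content, parent_hash)
    return chain


def first_broken_link(chain):
    """Return the index of the first block whose parent hash is stale, or None."""
    for i in range(1, len(chain)):
        parent = chain[i - 1]
        expected = block_hash(parent["content"], parent["parent_hash"])
        if chain[i]["parent_hash"] != expected:
            return i
    return None


chain = build_chain([f"block {n}" for n in range(1, 11)])

chain[0]["content"] = "tampered"  # step 1: alter the first block
steps = 1

# Each repair changes that block's own hash, which breaks the next link.
while (i := first_broken_link(chain)) is not None:
    parent = chain[i - 1]
    chain[i]["parent_hash"] = block_hash(parent["content"], parent["parent_hash"])
    steps += 1

print(steps)  # 10
```

One edit plus nine repaired parent hashes gives the ten steps mentioned above.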