To understand blockchain more deeply, let’s first talk about the concept of a cryptographic hash.
A cryptographic hash function takes a string as input and returns a string of fixed size, typically written in hexadecimal. The output string is known as the hash (or digest) of the input message. Importantly, the function is one-way: it is easy to compute the hash of a given input string, but given only a hash, it is practically infeasible to recover the input. In addition, it is practically infeasible to find two different inputs that have the same hash.
In other words, if hash1 = Hash(input1) and hash2 = Hash(input2):
It is easy to compute hash1 from input1 and hash2 from input2.
It is virtually impossible to compute input1 given only hash1 (and likewise input2 given only hash2).
It is almost impossible to find two different inputs input1 and input2 such that hash1 = hash2.
Such hash functions are carefully designed by cryptographers after years of research. Many programming languages ship library functions that compute a standard hash, such as SHA-256, of an input string.
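To make these properties concrete, here is a minimal Python sketch. It uses SHA-256 from Python's standard hashlib module; the choice of SHA-256 and the helper name sha256_hex are assumptions made for illustration, since the text above does not name a specific hash function.

```python
import hashlib


def sha256_hex(message: str) -> str:
    """Return the SHA-256 hash of a UTF-8 string as 64 hexadecimal characters."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


hash1 = sha256_hex("X paid Y $100")
hash2 = sha256_hex("X paid Y $101")  # input differs by a single character

# The output size is fixed no matter how long the input is.
print(len(hash1), len(sha256_hex("a" * 10_000)))  # 64 64
# Hashing the same input again gives the same result (it is deterministic).
print(hash1 == sha256_hex("X paid Y $100"))  # True
# A tiny change to the input gives a completely different hash.
print(hash1 == hash2)  # False
```

The sketch only demonstrates determinism, fixed output size, and sensitivity to small changes. The one-way and collision-resistance properties cannot be shown by running a few lines of code; they rest on the cryptographic design of SHA-256.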
Why are we talking about a hash function?
Well, blockchain as a concept relies heavily on hashing. The idea is that in a blockchain, we have an ordered chain of blocks such that each block contains the following information:
The hash of the previous block.
The list of operations (transactions).
Its own hash (hash_itself).
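Let’s take an example. Consider the following simple block: [0, “X paid Y $100”, 91b452]. (Here 91b452 is a shortened, made-up value standing in for a real hash; actual hashes such as SHA-256 outputs are much longer.)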
Since this is the first block of the blockchain, there is no previous block, so the previous-block hash is set to 0. The transaction list contains only one transaction: X paid Y $100. The block’s own hash is calculated as follows:
hash_itself = Hash(Transaction list, Previous block hash)
Essentially, we combine the transaction list and the hash of the previous block as a single input string and pass it to the hashing function to get the value of hash_itself.
A block like this one, whose previous-block hash is 0, is called a genesis block: it is the very first block in the blockchain.
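Here is a minimal Python sketch of computing hash_itself for the genesis block. It assumes SHA-256 as the hash function and, hypothetically, that the transaction list and the previous hash are combined into one input string with json.dumps; the text does not specify either choice, so the result will not be the illustrative value 91b452. The helper name block_hash is likewise invented for this example.

```python
import hashlib
import json


def block_hash(transactions: list[str], prev_hash: str) -> str:
    """hash_itself = Hash(transaction list, previous block hash).

    Assumption: both parts are combined into a single input string with
    json.dumps, which gives an unambiguous, repeatable encoding.
    """
    payload = json.dumps([transactions, prev_hash])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Genesis block: there is no previous block, so its previous hash is "0"
# (stored as a string here so every hash field has the same type).
genesis_txs = ["X paid Y $100"]
genesis = ["0", genesis_txs, block_hash(genesis_txs, "0")]
print(genesis[2])  # a 64-character hex string
```

Representing a block as a three-element list mirrors the [previous hash, transactions, hash_itself] notation used in the text.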
Now suppose we want to add another block to this blockchain. Let block1 = [91b452, “Y paid $20 to Z, X paid $10 to P”, 8ab32k].
Here, 91b452 is simply the hash of the previous block (the genesis block). There are two transactions:
Y paid $20 to Z
X paid $10 to P
Finally, hash_itself is Hash(“Y paid $20 to Z, X paid $10 to P”, 91b452), which in this example comes out to 8ab32k.
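Continuing the same hypothetical sketch (SHA-256 over a JSON encoding of the transaction list and previous hash, with the helper names block_hash and make_block invented for illustration), adding block1 just means storing the genesis block’s hash as block1’s first field:

```python
import hashlib
import json


def block_hash(transactions: list[str], prev_hash: str) -> str:
    # Assumed encoding: JSON of [transactions, prev_hash], then SHA-256.
    payload = json.dumps([transactions, prev_hash])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_block(prev_hash: str, transactions: list[str]) -> list:
    """Return a block as [previous hash, transaction list, hash_itself]."""
    return [prev_hash, transactions, block_hash(transactions, prev_hash)]


chain = [make_block("0", ["X paid Y $100"])]  # genesis block
# block1's first field is the genesis block's hash_itself.
block1 = make_block(chain[-1][2], ["Y paid $20 to Z", "X paid $10 to P"])
chain.append(block1)

# Print each block with its hashes shortened to 8 characters.
for prev, txs, h in chain:
    print(prev[:8], txs, h[:8])
```

The real hashes printed here will differ from the shortened values 91b452 and 8ab32k used in the text.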
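Pictorially, our blockchain looks like this, with block1 pointing back to the genesis block through the stored hash 91b452:
Genesis block [0, “X paid Y $100”, 91b452] ← block1 [91b452, “Y paid $20 to Z, X paid $10 to P”, 8ab32k]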
What’s so special about this “data structure”?
Well, the idea is that if someone corrupts the blockchain, say by changing the transaction in the genesis block from “X paid Y $100” to “Y paid X $100”, the genesis block’s hash changes and is no longer 91b452. Block1, however, still stores 91b452 as its previous-block hash (remember that the first value of each block is the hash of its parent block). The two values no longer match, so the chain becomes invalid.
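This tamper check can be sketched in Python under the same assumptions as before (SHA-256 over a JSON encoding of the transaction list and previous hash; block_hash, make_block and is_valid are hypothetical helper names). is_valid recomputes each block’s hash and checks that each block’s stored previous hash matches its parent’s hash:

```python
import hashlib
import json


def block_hash(transactions: list[str], prev_hash: str) -> str:
    # Assumed encoding: JSON of [transactions, prev_hash], then SHA-256.
    payload = json.dumps([transactions, prev_hash])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_block(prev_hash: str, transactions: list[str]) -> list:
    return [prev_hash, transactions, block_hash(transactions, prev_hash)]


def is_valid(chain: list) -> bool:
    """Check each block's stored hash and its link to the parent block."""
    for i, (prev_hash, txs, stored_hash) in enumerate(chain):
        # The stored hash must equal a fresh hash of the block's contents.
        if block_hash(txs, prev_hash) != stored_hash:
            return False
        # The genesis block points at "0"; every other block at its parent.
        expected_prev = "0" if i == 0 else chain[i - 1][2]
        if prev_hash != expected_prev:
            return False
    return True


genesis = make_block("0", ["X paid Y $100"])
block1 = make_block(genesis[2], ["Y paid $20 to Z", "X paid $10 to P"])
chain = [genesis, block1]
print(is_valid(chain))  # True

# Tamper with the genesis transaction and recompute the genesis hash.
chain[0] = make_block("0", ["Y paid X $100"])
print(is_valid(chain))  # False: block1 still holds the old genesis hash
```

In the tampered chain, the genesis block is internally consistent because its hash was recomputed, but block1 still holds the old genesis hash, so the link check fails. Had the attacker left the old hash in place, the first check would fail instead.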