Detailed explanation of Tendermint consensus algorithm
BFTF技术社区联盟
2018-10-18 07:36
本文约5584字,阅读全文需要约22分钟
What is the core process of the Tendermint algorithm? What locking mechanism is implied by Tendermint?

Original Author: Soul Machine

Original Author: Soul Machine

Tendermint is an open source project of Tendermint, which is a variant of the pBFT algorithm. The relationship between Tendermint and pBFT is similar to the relationship between Raft and Paxos. Tendermint is a simplified version of pBFT.

Algorithm core process

The Tendermint core algorithm flow is shown in the figure below:

The Tendermint algorithm first randomly selects some nodes as Validators (how to select a verification node? See "How to select a verification node" below), and then selects one of the validators as a proposer node. The proposer node starts to monitor and collect all transactions in the whole network. After a few minutes, it assembles a new block and broadcasts it to the whole network. This is the proposal block. After receiving this proposal block, all validator nodes in the whole network start to read all the transactions in this block and verify them one by one. If there is no problem, they will send a pre-vote voting message, expressing their agreement to this block, and cast an affirmative vote. If it is found that there is an illegal transaction in the block, it will cast a negative vote, and these voting messages will be broadcast to all validator nodes, so each validator node will not only send out a voting message, but also collect other people’s voting messages. When the number of agreed votes exceeds 2/3, a pre-commit voting message is sent, which is the second stage of voting, and each node is also monitoring and collecting pre-commit voting messages. When the number of pre-commit votes collected by a validator node exceeds 2/3, it means that the block has been approved by the majority of people, and you can safely write this block into the local blockchain and append it to the end to complete commit. At the same time, the block height increases by 1, and the proposer node index also increases by 1, entering the next round (round), and starting to propose new blocks.

The above description is the normal process 90% of the time, that is, the big cycle indicated by the blue arrow. Next, let’s talk about the small loop indicated by the red arrow in the inner circle. In most cases, a block only needs one round to complete the confirmation and reach a consensus. However, sometimes the network status is not good and timeouts often occur. At this time, it will enter the red process in the inner circle. Since only one proposer has the right to produce blocks in each round, what should we do if the proposer hangs up or the network is not good and timeouts? It is impossible for everyone to wait for this proposer forever. The algorithm is obviously not designed so stupidly, and there is an obvious single point of failure. In the propose phase of the proposer node, all validator nodes will start a timer and set the timeout period to T (the value of T is dynamically calculated according to the network conditions). If no new message from the proposer node is received within this time Block, it is considered that the proposer node is hung up, all validator nodes will not continue to wait, and will immediately generate an empty block with a special structure on the local machine, pretending that the empty block is received from the proposer node, so, no matter what, Within time T, a proposal block is received, either a normal block or an empty block. Then proceed to pre-vote voting and pre-commit voting on this block. If the proposer hangs up, most of the validators will see an empty block, so the empty block will get a majority of votes and enter the commit stage. When committing an empty block, it will not actually write an empty block to the blockchain, but will not write anything, and the block height will not increase automatically, and will remain unchanged. This is equivalent to doing nothing. This round ( round) is idling. This round is over, and when the next round starts, a validator will be replaced as a proposer, so that the currently suspended proposer will not be stuck in the entire network.

Network Communication Complexity

Tendermint's network communication complexity is O(n2), because in the two voting stages, each validator node needs to collect more than 2/3 voting messages, and the number of data packets sent by the network is n2, so the network concentric complexity is O (n2). In the Proposal phase, the new block only needs to be received by all validators, and the network communication complexity is O(n). Overall, it's O(n2). The complexity of network communication is O(n2), and it is destined that the number of nodes in Tendermint will not be too many. If there are too many, network communication will become a bottleneck. Therefore, Tendermint is suitable for private and alliance chains. This is also specified on page 17 of the paper One point: Tendermint is that solution, optimized for consortia, or inter-organizational logic However, if it cooperates with a good sharding mechanism, it can still be used for public chains. That's another story.

network hypothesis

Tendermint's requirement for the network is that the network needs to be semi-synchronous. Tendermint has a timeout mechanism in the Propose phase, but this timeout period is not a constant, it changes dynamically, so at this stage, the network is required to be semi-synchronous. In the pre-vote and pre-commit voting stages, there are no requirements for the network, that is, the network is asynchronous. Therefore, Tendermint's network requirements are semi-synchronous.

Since the network is asynchronous during the pre-vote and pre-commit voting stages, if more than 2/3 of the votes are not collected, all validator nodes will wait indefinitely, so the entire system will be stuck. Tendermint has compromised Liveness in exchange for stronger Finality.

  • For example, if the proposer node broadcasts a new block blockX in a certain round, and a validator A node does not receive the new block on time, then the A will construct an empty block on the local machine, as if it received from the proposer Arrived, send a pre-vote nil voting message and then enter the pre-vote loop, and start a timeout timer, then enter the red inner circle loop, A starts to monitor the network and collect voting information,

  • If the number of votes collected, whether for empty blocks or blockX, does not exceed 2/3 within the specified time, wait indefinitely until the total number of votes exceeds 2/3

After collecting more than 2/3 of the total number of votes, if the number of votes for the empty block exceeds 2/3, then issue a pre-vote nil to vote for the empty block, and still stay in the red inner circle; if the number of votes for blockX exceeds 2/3 3. Send a pre-vote to blockX and switch to the blue outer circle; if the number of votes for the empty block and blockX does not exceed 2/3, then send a pre-vote nil message to vote for the empty block and enter the pre-commit stage, still in the red inner circle.


Once A sends a pre-commit nil voting message, A still stays in the red inner circle, and the pre-commit process is similar to the above. All in all, the process in the red inner circle needs to assume that the network is semi-synchronous.

Finality and Liveness

Tendermint's Finality is Deterministic, while Bitcoin's Finality is Probabilistic. Tendermint has stronger Finality than Bitcoin.

But in terms of Liveness, for example, when the network splits, Tendermint may theoretically be stuck, and no new transactions can be written, while Bitcoin can be split into two forks, each working independently, and new transactions can continue .

lock mechanism

  1. In the above figure, there is actually a lock mechanism implicitly, which is not shown in the figure. For example, there are four validator nodes, A, B, C, D, in a certain R round,

  2. In the propose phase, the proposer node broadcasts a new block blockX

  3. A did not receive the new block within the timeout period, broadcast pre-vote nil to the outside, B, C, and D all received it, and broadcast the pre-vote to blockX

  4. Now four nodes have entered the pre-commit stage, A is in the red inner circle, B, C, D are in the blue outer circle

  5. Assuming that A has a poor network and has not received more than 2/3 votes for blockX within the specified time, it can only send a pre-commit nil vote message to vote for an empty block

  6. D received the pre-vote messages of B and C, plus his own, it exceeded 2/3, so D committed blockX in the local blockchain

  7. There is a problem with the network of B and C, and D is unable to receive the pre-commit message. This is because B and C can only see 2 votes for blockX, and 1 vote for an empty block, which is less than 2/3, so B and C C can only commit empty blocks, the height remains unchanged, enters the R+1 round, A can only see 2 votes for blockX, and one vote for the empty block, and can only commit empty blocks, the height remains the same , into the R+1 round


In the R+1 round, because a new love proposer proposed a new block blockY, A, B, and C may reach a consensus and submit blockY, so at the same height, there are two blockX and blockY block, resulting in a fork.

Tendermint has added a lock mechanism, specifically, in step 7, even if the proposer produces a new block blockY, A, B, and C can only be locked on their pre-commit blocks in step 6, that is, A is in step 6 If you vote for an empty block in step 1, you can only continue to vote for an empty block in round R+1, and B votes for blockX in step 6, then in the new round, you can only vote for blockX forever, and C is similar. In this way, in the R+1 round, there will be 1 vote for the empty block and 2 votes for blockX, and finally reach a consensus on blockX, A, B, and C will all commit blockX, which is consistent with D, and there is no conflict.

Tendermint vs. pBFT

  • Tendermint and pBFT look very similar, for example:

  • All belong to the BFT type algorithm, which tolerates no more than 1/3 of malicious nodes at most

  • They are all three-stage submissions. Tendermint’s propose->pre-vote->pre-commit three stages correspond to the three stages of pBFT, pre-prepare, prepare, and commit.

When all are timed out, replace the proposer/primary

Not enough Tendermint has two simplifications compared to pBFT.

Tendermint does not have the View Change stage of pBFT. Tendermint cleverly integrates the timeout situation with the normal situation into a unified form. They are all in the three stages of propose->pre-vote->pre-commit, and only new blocks are created when timeout occurs. is a special empty block. Switching the proposer is triggered by submitting an empty commit block, while pBFT has a separate view change process to trigger primary rotation.

The state of each replica includes the state of the service, a message log containing messages the replica has accepted, and an integer denoting the replica’s current view. 

In addition to eliminating View Change, Tendermint is also simplified in another place. All Tendermint information is stored in the blockchain. Because pBFT was proposed in 1999, there was no such thing as blockchain at that time (blockchain was only available after the emergence of Bitcoin in 2009), so although all nodes of pBFT have consistent data, the data is stored in a decentralized manner. The data of each node of pBFT includes:

Blockchain is a distributed database. For example, before DBMS databases such as MySQL appeared, people wrote data into files and stored them on hard disks, inventing various strange file formats and organization methods. With MySQL, managing data is much more convenient. Similarly, Tendermint stores all data in the blockchain, pBFT does not have a distributed database such as blockchain, and all full nodes need to manage data on the hard disk by themselves. For example, in order to compress message logs, discard old messages, and save hard disk space, checkpoint is introduced. concept, there are a lot of cumbersome steps in data management alone.

The relationship between Tendermint and pBFT is similar to the relationship between Raft and Paxos. Tendermint is a simplified version of pBFT, which is a simplified version of pBFT for the blockchain scenario.

How to choose a validator
First, in the genesis block, a set of validator nodes can be statically set. Second, when the chain starts, a ValidatorUpdate message can be sent to update the validator list. See Validator Updates

There is not much talk about this place in the document, and it seems to be relatively primitive. There is no random sampling algorithm similar to the cryptographic lottery of Algorand. But this place should be able to expand at any time.

How to change Proposer in turn
After a group of validator nodes is selected, all validator nodes in the entire network will save a copy, for example, in a circular array.

Generally, a block can be generated in only one round in most cases. When the network is not good, it may take multiple rounds to generate a block. In any case, each round will have a new validator as the proposer, and the rotation rule is simply to increase in sequence. In the first round, the 0th validator in the array will be selected as the proposer, and in the second round, the 1st validator will be selected, and so on. , after reaching the last one, it is reset to 0, which loops infinitely.
Then start the consensus algorithm. In the 0th round, the validator whose position is 0 is selected as the proposer node. In the 1st round, the validator whose position is 1 is selected. In the second round, the validator whose position is 2 is selected as the proposer. After reaching the last one, it is reset to 0. Infinite loop.

This round-robin strategy can effectively skip the timed-out proposer node. If a proposer node hangs up or the network is poor, most nodes cannot receive a new block on time, so after the timeout, each verification node will construct an empty block locally and broadcast the voting message out, after pre- After two rounds of voting, vote and pre-commit, finally commit an empty block, which is equivalent to doing nothing, with the same height, and starting a new round. Since each new round starts, it will be replaced by the next proposer in order, so the proposed proposer node that hangs up will be skipped naturally, and the algorithm can continue automatically.


The round-robin strategy is too simple, and it is easy for the bad guys to predict who the next validator will be, so they can plan in advance and launch DDoS attacks or other attacks on the validator. What should we do? Tendermint's solution is to place all the validator nodes behind the Sentry Node, without exposing the IP address to the outside world.

To be continued TODO


References

  1. Tendermint: Byzantine Fault Tolerance in the Age of Blockchains

  2. Tendermint: Consensus without mining

BFTF技术社区联盟
作者文库