PBFT Byzantine agreement security analysis: not suitable for alliance chain or public chain

BTC0.0₂₀

ETH0.0₂₀

HTX0.0₂₀

SOL0.0₂₀

BNB0.0₂₀

BTC0.0₂₀

ETH0.0₂₀

HTX0.0₂₀

SOL0.0₂₀

BNB0.0₂₀

PBFT Byzantine agreement security analysis: not suitable for alliance chain or public chain

Sperax

2019-12-02 10:34

本文约4670字，阅读全文需要约19分钟

The PBFT consensus protocol cannot guarantee the safety and liveness requirements of the blockchain system

Author: Professor Wang Yongge, a famous Chinese cryptographer, a tenured professor of computer science at the University of North Carolina at Charlotte (UNC, Charlotte), a doctorate from Heidelberg University, and chief scientist of Sperax.

text

Lamport, Leslie. "The part-time parliament." ACM Transactions on Computer Systems (TOCS) 16.2 (1998): 133-169.

The design of consensus protocols has always been a very challenging subject. Turing Award winner Lamport used a group of amateur legislators on the ancient Greek island of Paxos to make laws in 1989 to describe the Paxos consensus protocol he designed for distributed computing. Lamport submitted his article to ACM TOCS. Perhaps the editor of this journal did not appreciate the importance of the article, so he never agreed to publish it. It wasn't until the Paxos consensus protocol was widely discussed in academia and widely used in industry that the journal published the article in 1998:

Lamport joked to himself that this was the second-longest wait for publication of any of his articles. So far, the Paxos consensus protocol has been used in almost all distributed systems. For example, Google's Bigtable uses the Chubby lock service system to ensure the consistency of data at each node. The Chubby lock service is based on the Paxos protocol. In addition, Microsoft, IBM, and Amazon's cloud service systems all use the Paxos protocol to provide system consistency. Roughly speaking, the Paxos protocol consists of a series of ROUNDs. ROUND starts from 0 until consensus is reached. Each ROUND consists of the following four steps:

1. The master node generates a serial number and broadcasts it to all nodes. I hope everyone will participate in the activity of this serial number

2. Each node sends the following information to the master node: the serial number of the votes he has participated in and the votes he has voted

3. After receiving most of the replies in the second step, the master node chooses a value v that will not violate SAFETY. broadcast this value v to all nodes

4. After each node receives the value v from the master node in the third step, it votes for v and broadcasts its vote to all nodes

Because the Paxos protocol is difficult to implement, Stanford researchers proposed a modular and easy-to-implement Paxos protocol in 2014, and named it the Raft protocol. The Paxos/Raft protocol works in a milder threat model. In other words, the protocol is only robust to non-Byzantine errors in asynchronous networks. In the non-Byzantine threat model, faulty nodes can only make passive mistakes (such as stopping work) and cannot launch active offensive attacks. The maximum number of non-Byzantine faulting nodes that a system with n nodes can tolerate is (n-1)/2. The Paxos/Raft protocol achieves this maximum number of fault-tolerant nodes.

Because the Paxos/Raft protocol is not robust to Byzantine errors, they cannot be used in an open network system (such as a blockchain system). Byzantine errors are actively aggressive errors, such as lying, forging messages, colluding attacks, or launching selective DoS attacks. We have mentioned in previous articles that the decentralized blockchain system is based on an open network system, so we must use the Byzantine threat model.

The most widely used Byzantine agreement in the blockchain on the market is the practical Byzantine fault-tolerant system PBFT (practical BFT) designed by Turing Award winner Barbara Liskov and her student Castro. PBFT is widely used in alliance chains (such as Hyperledger sawtooth and Hyperchain of Qulian Technology) and many public chains. PBFT can be seen as a Byzantine version of the Paxos protocol. The main difference is that PBFT adds a verification step to the Paxos protocol to prevent Byzantine errors. Before analyzing its security, we first give a formal description of its protocol.

In the PBFT protocol, we assume that there are n=3t+1 nodes P1, ..., Pn. Among them, at most t nodes are controlled by the attacker. PBFT requires all nodes to jointly maintain a state and take consistent actions. The PBFT protocol is carried out through a series of views. In each view, there is a node called the main node (leader). A PBFT system first starts with a view (v=0), and then goes to views v=1, v=2, … and so on through the view replacement protocol. Only when the system believes that the master node cannot work normally, will it start the view replacement protocol and enter the next view. We assume that all nodes know who the master node for each view is. Whenever a client submits a task to the master node of the current view, the PBFT protocol will carry out three phases of communication: sequence number assignment (pre-prepare), mutual interaction (prepare), and sequence number confirmation (commit). The serial number allocation (pre-prepare) phase assigns a serial number to each task, and the mutual interaction (prepare) and serial number confirmation (commit) phases provide a global ordering for all tasks. Suppose we are now in view v, and the primary node is Pi. Then the process of the whole agreement is as follows:

1. The client sends a task request m to activate the service operation of the master node.

2. After the master node Pi receives the task request m, it starts the three-phase protocol:

m,，SIGNATURE

a. Sequence number allocation (pre-prepare) phase: the master node selects a unique sequence number seq for task request m. The master node then broadcasts the following message to all nodes

where H is a hash function. A node Pj accepts the above message if the following conditions are met

i. The digital signature SIGNATURE is valid

ii. Pj has not yet accepted another task request with the same v, seq

iii. The serial number seq is within a reasonable range

，SIGNATURE

b. Mutual interaction (prepare) phase: If node Pj accepts the broadcast message from the master node, then Pj enters the mutual interaction (prepare) phase and broadcasts the following message to all nodesc. Serial number confirmation (commit) stage: for node Pj, an arrayis ready if and only if Pj has received at least 2t valid messages. when the array

，SIGNATURE

After it is ready for Pj, Pj broadcasts the following confirmation message to all nodes:

When a node receives 2t+1 confirmation messages, the node will execute the task contained in the task request m and send the result directly to the client.

3. The client waits for replies from different nodes. If t+1 replies are the same, the reply is the result of the operation.

Yongge Wang. Byzantine Fault Tolerance in Partially Connected Asynchronous Networks

We recently analyzed the security of PBFT in the following articles:

The analysis of this article concludes that the PBFT consensus protocol is not safe in an asynchronous network. In this article, we briefly introduce the attack method we designed on the PBFT protocol in an asynchronous network. To simplify our description, we assume that the system has n=3+1=4 nodes P1, P2, P3, P4. Among them, node P1 is controlled by the attacker. Also we assume that the primary node of view v is P1. Our attack unfolds in view v:1. In the pre-prepare stage of view v sequence number assignment, the master node P1 broadcasts the message "m,

, SIGNATURE" sent to P1, P2, P3. But not to P4.2. In the mutual interaction (prepare) phase, P1 sends the broadcast message ", SIGNATURE" sent to P1, P2, P3. But not to P4. In the mutual interaction (prepare) phase, nodes P2 and P3 will broadcast the message ", SIGNATURE and, SIGNATURE" is sent to all nodes. Of course, if possible, the attacker may launch a DoS attack to prevent node P4 from receiving the broadcast messages from nodes P2 and P3 (this DoS attack is not very important to our attack). At this point, the array

It is ready for nodes P1,P2,P3. Because P4 has received at most two interactive messages, and we need at least 2+1=3 messages to prepare an array, so for P4, the array is not ready.3. In the serial number confirmation (commit) phase, P1 broadcasts the message ", SIGNATURE" is sent to P1, P2. But not to P4. In the serial number confirmation (commit) stage, nodes P2 and P3 will broadcast the message ", SIGNATURE and

, SIGNATURE" is sent to all nodes. Of course, if possible, the attacker may launch a DoS attack to prevent node P4 from receiving the broadcast messages from nodes P2 and P3 (this DoS attack is not very important to our attack). At this point, nodes P1 and P2 have received 3 confirmation messages for task m. Nodes P3 and P4 receive at most 2 confirmation messages for task m. So node P2 will execute the task contained in the task request m and send the result directly to the client. But P1,P3, P4 will not perform this task. So the client doesn't receive enough replies.

After carrying out the above attack, node P1 will no longer reply any message of any view v. So the system will start the view replacement protocol to enter the next view v+1. After entering view v +1, the internal data states of honest nodes P2, P3, P4 are different. So the system goes into an uncoordinated state.

，SIGNATURE

In the PBFT protocol, in order to solve the problem that some nodes may not receive certain messages (such as our above attack scenario), the PBFT protocol designed the CHECKPOINT status update process. In particular, after every 100 tasks are executed, each node Pj will broadcast its current status message to all nodes:

If a node Pi receives 2t+1 state update messages as above, and its state is state, then node Pi will replace its current state with the state state in the above message. In our above attack, if the dishonest node P1 does not publish the state update message, then the state update message issued by P2 will be different from the state update messages issued by P3 and P4. Because we need at least 2+1=3 identical status update messages to update the status of a node, the status of P2 cannot be updated to the status of P3 and P4. So the system will always be in an uncoordinated state.

In the later view, the dishonest node P1 can cooperate with the honest nodes P3 and P4 to jointly execute another task request from the client. Therefore, the state of each node will enter an unrecoverable uncoordinated state.

In our previous article, we mentioned that in Internet-based blockchain technology, DoS attacks are easy to launch. Since the Internet is an asynchronous network, we use the following model to describe its network communication:

There is a Global Stabilization Time (GST), before which any message may be lost (such as a DoS attack), or be reordered. After GST, the network becomes a synchronous network. But when GST will start, no one knows.