Whale Research Institute丨Hashgraph Technology Analysis
鲸准研究院
2018-07-08 07:36
本文约7301字,阅读全文需要约29分钟
Hashgraph is a brand-new distributed ledger consensus mechanism technology and data structure, more like an underlying block layer.


author:




————————————————————

author:

Node Capital Research Center Lin Jieyin Lang Hanwei

Whale Research Institute Tan Ying Wang Fan Chen Hongyi

first level title

.01.Overview

1.1 The concept of blockchain

The blockchain in the traditional sense is a chain that links a series of blocks in a linear manner, and these blocks record transactions that occurred within a period of time. Miners compete for the accounting rights of transactions within this time period through various mechanisms. For a transaction, it needs to be lucky enough or pay enough handling fees to be selected by the miners. For example, in the Bitcoin network, only one block is produced every 10 minutes, with an average of 7 transactions per second. Although Ethereum has greatly accelerated the block production time, it still takes more than ten seconds to confirm a block. Since the confirmation of a transaction needs to wait more than ten seconds or even ten minutes, the efficiency is very low, so blockchain 1.0 and 2.0 projects still have a long way to go for large-scale commercial use.

1.2 The concept of DAG

If a directed graph starts from any vertex and cannot return to the point through several edges, then the graph is a directed acyclic graph (DAG for short). Directed acyclic graph breaks the concept of "block", each transaction in it skips the step of waiting to be packaged into a block, and is directly included in the chain in units of a single transaction.

However, the DAG network faces the problem of controlling the width of the network. If every new transaction is linked to an older transaction node in the network, the DAG network will suddenly become wider from this old node, and the efficiency will become lower. . Therefore, the ideal state is that each new transaction is evenly connected to the new and old nodes of the chain, and the network is controlled within a certain width. DAG networks including IOTA solve the width problem in this way.

Problems with existing DAG projects:

IOTA:

1. The MIT report pointed out that IOTA uses the hash algorithm curl developed by itself, but the hash value of the curl algorithm is very prone to collisions, so digital signatures can be forged.

2. Because the consensus is determined by the transactions of the entire network, in theory, if someone can generate 1/3 of the transaction volume, he can turn invalid transactions into valid transactions. On the other hand, since IOTA has no transaction fees, there is no incentive for miners. IOTA is facing the possibility of denial of service attacks and spam attacks. Just like a community that does not charge property fees, it is difficult to eliminate criminals by self-government.

3. IOTA introduces the closed-source centralized component Coordinator to check transactions on the entire network (such as double spending). How to effectively remove the Coordinator and establish a decentralized "Coordinator group" with a benign incentive mechanism has not yet been given by IOTA. solution.

Byteball:

Due to the relationship between the main chain algorithm and the release frequency of witnesses, the time of transaction confirmation is uncertain; because Byteball stores data based on a relational database, the SQL language is too tightly coupled to the algorithm logic, which limits Byteball’s current expansion capabilities and speed.

NANO:

Without adequate testing and lack of peer review, consensus algorithms risk being seriously flawed. For example, what happens if there is not enough quorum voting to resolve network conflicts? If parts of the NANO network are separated for an extended period of time, what happens when the separated network rejoins? Will rejoining networks be paralyzed during the inevitable voting process? These issues are yet to be tested and verified.

1.3 The concept of Hashgraph

Hashgraph is a data structure and consensus algorithm based on the DAG network, but Hashgraph has its own way of controlling the width, and each point can have two parent nodes. In the Hashgraph network, only the nodes that have been admitted have the right to initiate an event (Event). The event is the container of the transaction data. All the work of initiating new events must be completed by these nodes. Through the non-chain structure, there is no need for competition. Blocks can be generated synchronously to achieve large-scale and low-cost consensus, greatly improving work efficiency, and achieving true "Blockless" while controlling bandwidth. It is said to be able to achieve more than 250,000 TPS. It is a low-transaction fee, decentralized, and non-mining Internet underlying trust network.
The data structure diagram of Hashgraph is as follows. Alice, Bob, Carol, Dave, and Ed are five nodes with the power to initiate events respectively. Each circle is an event, which is created by the node when receiving gossip. The closer to the bottom, the more Events initiated earlier, the closer to the top are newer events.

Hashgraph pioneered the asynchronous BFT consensus in the public chain environment. A major problem with traditional BFT is that the message complexity is too high, which consumes a lot of network bandwidth of the system and cannot cope well with dynamic networks. Here Hashgraph introduces the traditional Gossip Protocol, and adds unique innovations, plus a virtual voting mechanism, so that it will not cause a sudden large-scale messaging storm when consensus is required.

Comparison of Blockchain and Hashgraph:

first level title



. 02 .Hashgraph technology

The consensus mechanism of Hashgrapgh includes two parts, Gossip about Gossip and Virtual Voting. The following will explain the workflow of the two and the workflow of the overall consensus mechanism, and explain the advantages and problems of Hashgraph.

2.1 Gossip about Gossip

The core communication protocol of Hashgrapgh is "Gossip about Gossip", which is inspired by office gossip. The gossip here refers to a piece of information that I know but the other person does not know. Within a certain period of time, everyone will know the gossip information.

In Hashgraph, each node is propagating signed new transactions and transaction information received from neighboring nodes. When a node receives data containing new transaction information, it will combine and possibly add the transactions it knows to become a new event (Event, similar to the concept of a block, is a data containing two hash pointers structure, and can include 0 or several transaction information), and this event contains two hashes, one pointing to the last latest event of the node, and the other pointing to the latest event received by the node from another node, The entire event is then timestamped and signed, and the cycle continues until all nodes have the same information.

As shown in the figure below, when Bob randomly finds Alice's gossip, he will tell Alice everything he currently knows. Then Alice will create a new event here (red dot). In addition to adding a new transaction, this new event will also add two hash values ​​pointing to the parent event, one pointing to her latest event (dark blue), and one Point to the most recent event (sky blue) when Bob chatted with himself.

In essence, the gossip algorithm is a fault-tolerant algorithm with redundancy. Furthermore, the gossip algorithm is an eventual consistency algorithm or a means of providing a consensus algorithm. Although it cannot be guaranteed that the state of all nodes is consistent at a certain moment, it can be guaranteed that at a certain moment in the end, all nodes agree on all the history before a certain point in time.

The content of gossip between nodes in Hashgraph also includes the historical records of gossip between nodes, so each node can maintain a hash graph through gossip, so that when the node calculates the consensus vote, it can initiate a virtual vote, that is, calculate How other nodes will vote in a given hash graph, so that there is no need to do a lot of two-way synchronous communication on the network to perform real voting. In the words of the inventor of Hashgraph: "Hashgraph has all the advantages of the voting algorithm, but avoids its biggest flaw."

Due to the rapid convergence property of the Gossip protocol, each new piece of information can reach each node in a faster manner, so that each node maintains the communication history of all nodes with other nodes.

2.2 Virtual Voting

Above we saw how Hashgraph communicates between nodes. After executing the gossip algorithm, all nodes are full nodes and store a complete network history. Large-scale message communication is not required when a consensus on a certain proposal is required. , each node can independently execute the virtual voting mechanism (Virtual Voting), and all nodes will definitely come to the same consensus result. Below we can see the detailed term definition of the virtual voting mechanism and the demonstration example of the virtual voting process, quoted from the article "Hashgraph - perhaps the best consensus protocol at present".

Definition of terms:

event

In the above gossip algorithm, we have come into contact with this concept, similar to the block in Bitcoin, the event is a data structure containing two hash pointers, and can include 0 or several transaction information, the node creates the event At the same time, it will be time-stamped and the entire event will be digitally signed.

absolute majority

The number of more than 2/3 nodes also has this concept in many DPoS algorithms.

visible

When event B can find event A along the hash pointer, then event B can see event A.

Strongly visible (strongly seeing)

When event B can find all the paths of event A across the absolute majority of nodes, then event B is strongly visible to event A. It is mentioned in the white paper that mathematical proof can guarantee that two strongly visible nodes can obtain consistent results in virtual voting.

witness

The first event created by each node in each round is the witness event, which is the ancestor event of the round, and the node may not have a witness event in a certain round.

famous witness

If the witness of round R can be seen by an absolute majority of witnesses of round R+1, it is a well-known witness. The specific calculation method is detailed in the following text.

round created

The creation round of an event is R or R+1, where R is the maximum round of the parent node of the event. If and only if the event can be strongly seen by the absolute majority of R round witnesses, then the creation round of the event is R+1.

round received

If a common event is visible to all well-known witnesses in the R round (creation round), the acceptance round of the event is the R round. If a common event is not visible to all the well-known witnesses in the R round, its acceptance round The time must be later than the R round.

An example of a virtual voting process:

The figure below has divided the creation rounds. The DAG graph in the figure grows from bottom to top. How to divide the creation rounds will be discussed in detail later. After each node is synchronized to the new event, it can immediately start calculating the creation round Second-rate.

Mark each round of witness events according to the definition of witnesses, as follows:

For each witness, we need to determine whether it is a well-known witness. Let’s take the example of determining whether B2 event is a well-known witness. According to the definition of a well-known witness, we need to determine whether A3, B3, C3 and D3 events can see B2. In fact, it is an election process, and each witness will vote on B2 to determine whether B2 is famous.

A3 event is visible to B2, and the visible path is as follows yellow line, we can say that B2 is the ancestor event of A3, A3 is the son event or derived event of B2, A3 is visible to B2, so A3 votes YES.

Similarly for the other 3 witnesses, all witnesses voted YES after voting, so we predict that the B2 event will be a well-known witness, but it should be noted that the election process is not over yet, and there is still a stage of counting votes. The votes must be completed by the next round of witnesses, so B4 and D4 will be counted. Although there are no A4 and C4 in this picture, they will definitely appear and will also participate in the counting over time.

In the vote counting stage, the R+2 round witnesses will collect the voting results from their own strongly visible R+1 witnesses. Once the number of votes counted by a certain voting result exceeds the absolute majority, the result is considered valid, that is, a consensus is reached. According to mathematical theory, if any R+2 witnesses make a decision on the voting result, then this result is the conclusion of the entire network. If this round of witnesses cannot make a decision, the next round of witnesses will count the votes decision until a definite conclusion is reached. Let’s look at an example specifically. There are three visible paths from B4 to A3 and they span three nodes. Therefore, B4 is strongly visible to the A3 event, that is, the voting result collected by B4 from A3 is YES.

In the same way, B4 can strongly see B3, C3 and D3 events.

By summing up, the B4 incident collected 4 YES votes, obviously we can draw a conclusion: B2 is a well-known witness! We will mark these well-known witnesses in green in the figure. Then we continue to judge the popularity of the C2 event. Since the next round of witness voting results of C2 is 1YES, 3NO, B4 will obviously judge that it is not a well-known witness after counting the votes. We will mark C2 as blue, and the white paper has Mathematical verification guarantees that all other witnesses have made the same decision.

If a decision cannot be made in the next round (such as a 2:2 voting result), it will continue to the next round. According to the mathematical theorem, as long as we add a random round (coin round) every ten rounds, the election process will eventually be certain. It will end (converge with probability 1, in layman's terms it is almost certain to converge, which is a concept in probability theory). In the random round, the witnesses who have collected the absolute majority of results only vote without making a decision, while other witnesses vote randomly based on the median of the digital signature. We continue with the election of prominent witnesses, with the following results:

Once a round has identified all known witnesses, acceptance rounds and consensus timestamps can be determined for other common events in the round. We can see that the black event can be seen by all known witnesses in the second round, so its acceptance round is 2.

Now we start to determine the consensus timestamp of the black event for subsequent determination of the consensus order, looking for the earliest event X of node A, which is both the ancestor of A2 and the son of the black event, and similarly looking for Y of node B and Z of node D. Then sort the timestamps of the XYZ events in order and take the median as the consensus timestamp of the black nodes. We then proceed to determine the acceptance rounds for other nodes.

Now that we have identified 10 events whose acceptance round is 2, we will sort them in the order recognized by the entire network, that is, the consensus order, and sort them according to the following priorities:

accept round

consensus timestamp

Sort by the XOR result of the event signature and a random number, which is obtained through the XOR operation of the digital signatures of all well-known witnesses in this round

2.3 Summary of Consensus Mechanism

Hashgraph is a consensus mechanism consisting of a gossip protocol and a virtual voting mechanism. Generally speaking, it can be summarized as the following steps:

1. Each node is trying to randomly find other nodes, and pass the information it knows to the other party through the gossip protocol;

2. Each node is also receiving information from other nodes through the gossip protocol. When receiving information, the node needs to perform a series of calculations, including:

a. Accept and process received gossip information

b. Create a new event and point to your own last event and the last event of the gossip source node at the same time

c. Calculate its creation round for all known events and determine whether the event is a witness event in this round

d. Vote for all known witness events to calculate whether they are well-known witnesses

e. Determining acceptance rounds for all events through well-known witnesses

f. Through the acceptance rounds and consensus timestamps of events, conduct virtual voting to determine the consensus order

For the entire consensus algorithm, a single node needs to save the entire network data.

2.4 Advantages of Hashgraph

Fairness: maintaining the actual order of transactions

With consistent timestamps, every event and every transaction within an event has an order.

There is no such role as a miner.

Safety: Asynchronous Byzantine Fault Tolerance

fast

fast

According to the test data on the official website, it can reach an astonishing 250,000 TPS.

2.5 Hashgraph current problems

It is currently a private chain, and the reference value of throughput is doubtful

At present, Hashgraph is a private chain, and its "fast running speed" can only be compared with other private chains, such as Hyperledger (700 transactions/second) and Red Belly (400,000 transactions/second). It is very unfair to compare public chains such as Ethereum and Ethereum, because the current Hashgraph does not need to set up a mechanism to prevent malicious node attacks. In addition, whether the gossip algorithm is applicable to a large-scale public chain environment is still worth exploring.

Can it withstand malicious attacks

Sybil attack, that is, attackers destroy the reputation system of peer-to-peer networks by creating a large number of fake identities, and use them to gain disproportionately large influence. At present, Hashgraph is a private chain, and the identities of all nodes are known. This kind of access control makes Hashgraph at this stage not need to consider the danger of Sybil attacks. But if Hashgraph intends to develop towards the public chain in the future, whether it can resist Sybil attacks will be a question that Hashgraph must think about.

Vote verification may take a long time

Although Hashgraph's algorithm is easy to create events, the voting verification process after each Round may be very long. If an absolute majority of more than 2/3 has not been achieved, there may be many rounds of voting to determine who has recorded transactions valid.

Fairness issues when external conditions are different: how to determine the order of transactions?

Fairness is explained in the Hashgraph white paper as follows:

Assume that there are two nodes, A and B, and A sends a transaction request before B. If the time stamp of A’s transaction is earlier than B’s transaction under the judgment of the consensus mechanism, we say that the system is fair. If A and B have transactions at the same time, and the two transactions are uploaded to the network and propagated almost at the same time, a fork may occur at this time, but we also say that the system is fair. Most consensus mechanisms are able to achieve fairness in both cases.

But this explanation is based on the assumption that nodes A and B face the same external network situation. But let's consider a situation like this:

If the bandwidth of A is 5M/s, and the bandwidth of B is 10M/s, A does upload its own transaction information in the network earlier than B, but due to the bandwidth limitation, the propagation speed of A's message will be slower than that of B, In this way, it is possible that most people will receive B's message first when they finally vote. It's like in school, B has more friends and has more influence than A, so when discussing gossip, B can tell more people the gossip information he wants to spread faster. Even though A may be the one who spread the gossip first, most people will hear the version from B's mouth first due to the limitation of influence.

When the external conditions of the nodes are different, whether the voting can also reflect the real transaction order is not clearly stated at present, so there are still doubts about fairness.

Code is not open source

first level title

.03.Summary

first level title

.04. References and Citations

  • Hedera Hashgraph white paper;

  • 20180326 Consensus sorting and Hashgraph brief comment Author Xie Junyi code farmer learning blockchain;

  • 20180403 Can the god-level project Hashgraph really become the terminator of the blockchain? Author Casey Maoyan Financial Focus;

  • 20180413 Hashgraph —— Perhaps the most outstanding consensus protocol at present Author Eric Sun BlockGeeks;

  • 20180417 Hashgraph —— An Excellent Consensus Protocol That May Surpass Blockchain Author XC takes the lead in Bijie;

  • 20180424 An article to understand the status quo and trends of DAG technology|chain catcher author Li Qiang chain catcher;

  • 20180507 How to understand Hashgraph in ten minutes Author InterValue InterValue.

  • For more discussions on this article, please leave a message in the background

    【Reprint Notice】

    【Reprint Notice】

    1. This report is an original work of Jingzhun (ID: rong36kr), a professional data research and analysis organization [Jingzhun Research Institute], which is protected by the "Copyright Law" and enjoys the right of compilation and annotation according to law;

    3. Commercial reprinting and secondary editing and reprinting are prohibited.

    3. Commercial reprinting and secondary editing and reprinting are prohibited.



鲸准研究院
作者文库