Fragmentation technology - a sharp edge to improve blockchain TPS
哈希未来
2018-08-06 07:48
本文约3702字,阅读全文需要约15分钟
It needs to be tested and verified in a large-scale network for a long time, combined with a rigorous theoretical solution proof, to be convincing.

Summary of the report:

Summary of the report:

Fragmentation is an expansion technology derived from the database, which can be used to improve the scalability of the blockchain system. The basic idea is to divide the nodes in the network into different fragments, and each fragment can process different transactions in parallel, so that they can process each other in parallel. Transactions that have not established a connection between them to increase network concurrency. It is characterized by an increase in network throughput as the number of nodes increases. The core difficulty lies in the determination of the key characteristic values ​​of data shards and the inconsistency caused by the delay of metadata communication between shards. Frequent cross-shard communication will greatly reduce the performance of the blockchain network. Since the data in each shard is updated separately, it is necessary to ensure the successful update of information when designing the application logic, and at the same time, it is necessary to reserve a certain amount of robustness to deal with possible inconsistencies in the process of achieving final consistency.

The sharding in the blockchain is divided into transaction sharding, network sharding and state sharding according to the object. It is worth noting that the network sharding technology is used in the blockchain, that is, the miners are divided into several sub-networks. Responsible for verifying the transactions on the shard, it is necessary to ensure that the number of malicious nodes is small enough, and therefore, care must be taken to ensure randomness in the rules for assigning miners. In the application of sharding technology in the blockchain, the problem that needs to be considered is the defense against various attacks such as sybil attacks, DDOS attacks, and double-spend attacks. It is necessary to ensure the total number of nodes in each shard while weighing efficiency There are enough and honest nodes account for the majority, and sharding technology has extremely high security requirements. At the same time, the number of nodes in the blockchain system is more than that in traditional databases, and it faces bandwidth limitations. It needs to be fully considered Inconsistencies caused by delays lead to performance and security issues, so there are few related projects that have landed. It needs to be tested and verified in a large-scale network for a long time, combined with a rigorous theoretical solution proof, to be convincing.

Fragmentation technology in the traditional concept is to divide the database into multiple fragments and place them on different servers. In modern cloud services, data is often hosted at different sites and partitioned. Reasons for this approach include load balancing across multiple computers to improve scalability, storage of data across multiple sites to improve availability, and more. The blockchain sharding technology is an expansion technology based on the concept of database sharding.

No matter in the field of blockchain or database, the first step in sharding is to extract the key characteristic values ​​of the data, and divide the key characteristic values ​​into different fragments for processing according to certain rules. The selection of key feature values ​​is very important, it is related to the guarantee of uniqueness of data representation and the effect of fragmentation. Regarding the selection method of eigenvalues, there is a concise standard: based on what you think the primary access pattern will be (based on what you think the basic data pattern will be). Therefore, we can often see in blockchain projects that sharding is based on the user's private key/account address, etc., because these values ​​are unique and do not change over time, and the logic of sharding is relatively clear.

In traditional database technology, there are three main ways to shard data:

1. Hash mode, direct modulo:For example, if there are 3 shards, the data will be hashed and modulo 3, and assigned to specific shards according to the result. Irregularity breaks the situation because some key feature values ​​are related to the amount of load, so the data is more likely to be evenly distributed among the various shards. A counter-example is that if the key characteristic value of the data is the order of registration time, the newly registered data is more active, and it is possible to assign them to a certain shard. But the disadvantage of this method is that if new shards are added, it is difficult to rebalance the shards; the advantage is that no additional state information needs to be maintained.

2. Consistent hashing:The consistent hash method without virtual nodes means that the data is mapped to the end-to-end hash ring according to the characteristic value, and the nodes are also mapped according to certain rules. The first node found by the data clockwise is the node where it is stored . Consistent hashing with virtual nodes is similar, except that the virtual nodes are mapped to the hash ring, so an actual physical node can occupy multiple ranges on the hash ring. This method needs to maintain state information, that is, which node the data is allocated to, but the advantage is that it is easier to rebalance the shards if the number of shards needs to increase. However, the maintenance of shard state information needs to consider consistency issues, which is more complicated.

3. Range based:It is divided into different intervals according to the key feature values, and each node corresponds to one or more intervals. Similar to the way of consistent hashing, state information also needs to be maintained.

In the blockchain system, there needs to be a mechanism to know which node implements which shard. In traditional database systems, shard information (that is, metadata, which data is divided into which shard) generally requires dedicated server storage. Sometimes In order to reduce the pressure on the metadata server, in a distributed system, metadata will be cached on other nodes.The idea in the blockchain is also roughly the same. It is necessary to ensure the consistency of metadata cached between nodes, or to introduce a similar master server to ensure performance, but both bring consistency challenges.

Consistency and availability of multiple replicas are the scope of CAP theory discussion, and there are mainly two available solutions.

The first is master-slave synchronization. First, the master server is selected, and only the master server provides external services. The master server saves the update information of the metadata to a shared storage space in the form of a log, and then the slave server reads from the shared storage space. Take the log and apply it to achieve a state consistent with the master server. If the master server is detected to be faulty (such as through heartbeat), a new master server will be re-elected. In the case of network segmentation, it is possible that everyone thinks that the original master server has been down, and a new master server is elected, but in real time, the original master server continues to provide services, which leads to the emergence of "dual master server". " phenomenon, in order to solve this problem, it is necessary to find a way to isolate the old master server so that it cannot provide external services normally. In order to ensure the strong consistency of metadata, when preparing to switch, the new primary server must confirm that the metadata is completely synchronized before continuing to provide external services. In order to achieve this goal, one way is to notify all cache servers immediately when the metadata changes and lock the data. For example, if the task to be completed by the system requires multiple fragments to update the status at the same time, then before the update is completed, access will be rejected. Another way to maintain the high degree of consistency between replicated data that is often achieved in highly scalable NoSQL databases is to use read-write quorum and versioning. This approach avoids locking data at the cost of additional complexity in reading and writing data.

The second way is to achieve the consistency of multiple replicas through a distributed consensus protocol, such as the Paxos and Raft protocols. The protocol can realize that all backups can provide external services and ensure strong consistency.

The state sharding of the blockchain means that each node only stores a part of the blockchain state information, and a similar mechanism is needed to maintain the state information to know which slice stores the required state. The consistency problem that needs to be solved is similar to the above, and the implementation of transaction fragmentation is simpler. In an account-based blockchain system, each transaction will have a sender's address, and then the system can allocate a shard based on the sender's address. This ensures that two double-spend transactions will be verified in the same shard, so the system can easily detect double-spend transactions without any cross-shard communication. If the nodes are deterministic, then there is almost no problem caused by the update of metadata discussed above. However, if the transaction verification involves communication between shards, the overhead cost is usually high, which will affect the throughput and economic benefits of the network.

The network sharding of the blockchain refers to dividing miners into several groups, verifying transactions at the same time, improving the system's ability to process transactions in parallel, and thus improving TPS. Usually, it is possible to decide to select a consensus node by regularly generating random numbers. As long as it is mapped to the already numbered shards, the problem becomes much easier to deal with. However, if a node goes down and the node is reassigned, it is necessary to form a consensus among the fragments. It is worth noting that the network sharding technology is used in the blockchain, that is, the miners are divided into several sub-networks responsible for verifying the transactions on the shards. It is necessary to ensure that the number of malicious nodes is small enough, so the rules for assigning miners Care needs to be taken to ensure randomness.

references:

    

references:

1. "Learning Data Fragmentation of Distributed Systems with Questions",https://www.cnblogs.com/xybaby/p/7076731.html

2. "Sharding Technology (Sharding) - A Good Solution to the Problem of Blockchain Expansion",http://www.8btc.com/sharding-blockchain-scalability

3. 《sharding》,https://docs.mongodb.com/manual/sharding/

4. 《Sharding pattern》,https://docs.microsoft.com/en-us/azure/architecture/patterns/sharding

5. 《Database sharding explained in plain English》,https://www.citusdata.com/blog/2018/01/10/sharding-in-plain-english/

6. Lu Xiaoming, "How far is the sharding technology regarded as the future of the public chain from us?" ",https://www.odaily.com/post/5132394

7. Coin Academy, "Sharding Overview, Zilliqa and QuarkChain",http://8btc.com/article-4660-1.html


哈希未来
作者文库