Vitalik responded to Musk: Improving blockchain scalability is not easy
2021-05-24 10:53
It turns out that, whether or not you use sharding, there are important and rather subtle technical factors that limit the scalability of blockchains. In many cases these limitations have solutions, but even the solutions have their own limits.

This article comes from Chain Catcher. Author: Vitalik Buterin; translator: Alyson. Reposted with permission.

How far can you push blockchain scalability? Can you really, as Musk hopes, "reduce block confirmation time by 10 times, increase block size by 10 times and reduce transaction fees by 100 times" without causing extreme centralization and compromising the fundamental properties of a blockchain? If not, how far can you go? What if you change the consensus algorithm? More importantly, what happens if you introduce features like ZK-SNARKs or sharding?


01 Nodes need to be sufficiently decentralized

At 2:35 in the morning, you get an urgent call from a partner on the other side of the world who helps you manage your mining pool (or possibly your staking pool). Starting about 14 minutes ago, your partner tells you, your pool and a few others split off from the chain that still carries 79% of the network. According to your node, the blocks of the majority chain are invalid. And there is a balance error: the key block appears to have mistakenly allocated 4.5 million extra coins to an unknown address.

An hour later, you are in a Telegram chat with two other small mining pools, and you finally see someone paste a link to a tweet. The tweet begins with "Announcement of a new on-chain sustainable protocol development fund."

By morning, the argument is all over Twitter and community forums. But by then a significant portion of those 4.5 million tokens has been converted on-chain into other assets, and billions of dollars of DeFi transactions have taken place. 79% of consensus nodes, as well as all major blockchain explorers and light-wallet endpoints, follow this new chain. Maybe the new developer fund will fund some development, or maybe it will all be swallowed up by the leading pools and exchanges. But regardless of the outcome, the fund is, for all intents and purposes, a fait accompli, and ordinary users are powerless to fight back.

Can this happen on your blockchain? The elite of your blockchain community, including mining pools, block explorers, and custodial nodes, are probably well coordinated; most likely they are all in the same Telegram channel and WeChat group. If they really wanted to make a sudden change to the protocol rules to further their own interests, they probably could. The only reliable way to neutralize this kind of coordinated social attack is passive defense from the one group that is genuinely decentralized: the users.

Imagine how the story plays out if users are running nodes that validate the blockchain and automatically reject blocks that break the protocol rules, even if more than 90% of miners or stakers support them. If every user runs a validating node, the attack fails quickly: a few mining pools and exchanges fork off and look rather foolish in the process.

[Image: a tweet from Paradigm research partner Hasu]


02 Where are the limitations of node work?

To maximize the number of users who can run a node, we are focusing on regular consumer hardware. There are three key limitations to a full node's ability to process a large number of transactions:

  • Computing power: what percentage of CPU time can we reasonably ask a node to spend?

  • Bandwidth: Given the realities of current internet connections, how many bytes can a block contain?

  • Storage: How many GB of disk can we ask the user to store? Also, how fast does it need to be readable? (i.e. can an HDD be used, or do we need an SSD?)


1) Computing power

Wrong answer: 100% of CPU power can be spent on block validation.

Correct answer: about 5-10% of CPU power is available for block validation.

There are four main reasons why the limit is so low:

  • We need a safety margin to cover the possibility of a DoS attack (transactions crafted by attackers to exploit weaknesses in the code take longer to process than regular transactions);

  • Nodes need to be able to sync the blockchain after they go offline. If I disconnect from the network for a minute, I should be able to catch up in seconds;

  • Running a node shouldn't drain the battery quickly or slow down all other applications;

  • Nodes also need to perform other non-block producing tasks, mainly around validating and responding to incoming transactions and requests on the p2p network.

Note that, until recently, most explanations of "why only 5-10%?" focused on a different problem: because PoW blocks arrive at random times, longer block validation times increase the risk that multiple blocks get created at the same time.

There are many fixes for this problem (e.g. Bitcoin NG, or simply using proof of stake). But those fixes don't address the other four issues, so they don't deliver the huge scalability gains many initially expected.
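To make the 5-10% figure concrete, here is a minimal back-of-the-envelope sketch in Python. The 12-second slot time and the 10x adversarial slowdown factor are illustrative assumptions chosen for this sketch, not numbers taken from the article.

```python
# Hypothetical back-of-the-envelope sketch (numbers are illustrative assumptions,
# not protocol constants): how much CPU time per block does a 5-10% budget allow?

SLOT_TIME_SECONDS = 12               # assumed average time between blocks
CPU_BUDGET_FRACTIONS = (0.05, 0.10)  # the 5-10% share of CPU argued for above

for budget in CPU_BUDGET_FRACTIONS:
    per_block_cpu = SLOT_TIME_SECONDS * budget
    print(f"With a {budget:.0%} CPU budget, a node can spend about "
          f"{per_block_cpu:.1f} s of CPU time validating each {SLOT_TIME_SECONDS} s block.")

# A worst-case (DoS-crafted) block that takes, say, 10x longer than average to verify
# must still fit inside this budget, which is why the average block has to be cheap.
ASSUMED_DOS_MULTIPLIER = 10
worst_case_allowed = SLOT_TIME_SECONDS * CPU_BUDGET_FRACTIONS[1] / ASSUMED_DOS_MULTIPLIER
print(f"If adversarial blocks can be ~{ASSUMED_DOS_MULTIPLIER}x slower to verify, "
      f"average blocks should verify in roughly {worst_case_allowed:.2f} s or less.")
```

Under these assumptions, a node has barely a second of CPU time per block, and far less once the DoS safety margin is taken into account.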


2) Bandwidth

Wrong answer: if we have 10 MB blocks every 2-3 seconds, and most users have a >10 MB/sec network connection, then of course they can handle it.

Correct answer: maybe we can handle 1-5 MB blocks every 12 seconds, though even that is hard.

We often hear advertised statistics about how much bandwidth an internet connection can provide: figures of 100 Mbps or even 1 Gbps are common. However, there is a large gap between advertised bandwidth and actual usable bandwidth, for several reasons:

  • "Mbps" means "millions of bits per second", a bit is 1/8 of a byte, so you need to divide the number of advertised bits by 8 to get the number of advertised bytes;

  • Like all companies, internet providers often lie;

  • There are always multiple applications using the same internet connection, so nodes cannot hog the entire bandwidth;

  • A p2p network inevitably brings its own overhead: nodes will often download and reupload the same block multiple times (not to mention transactions broadcast through the mempool before being included in the block).

When Starkware ran an experiment in 2019, publishing 500 kB blocks after a reduction in transaction data gas costs made this possible for the first time, several nodes were actually unable to process blocks of that size.
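As a rough illustration of how advertised bandwidth melts away, here is a small Python sketch. The 50% usable fraction and the 4x p2p re-transfer factor are assumptions chosen for illustration, not measurements from the article.

```python
# Illustrative sketch (assumed overhead factors, not measured values): converting an
# advertised connection speed into a realistic per-slot block size budget.

ADVERTISED_MBPS = 100            # "100 Mbps" as advertised by the provider
SLOT_TIME_SECONDS = 12           # assumed time between blocks

raw_mb_per_second = ADVERTISED_MBPS / 8   # bits -> bytes: divide by 8
usable_fraction = 0.5                     # assume the link is overstated and shared with other apps
p2p_overhead_factor = 4                   # assume each block is downloaded/re-uploaded several times

effective_mb_per_second = raw_mb_per_second * usable_fraction / p2p_overhead_factor
block_budget_mb = effective_mb_per_second * SLOT_TIME_SECONDS

print(f"Advertised: {ADVERTISED_MBPS} Mbps  ->  raw {raw_mb_per_second:.1f} MB/s")
print(f"After assumed sharing and p2p overhead: {effective_mb_per_second:.2f} MB/s")
print(f"Per-{SLOT_TIME_SECONDS}s-block budget: ~{block_budget_mb:.0f} MB")
# Even a nominal 100 Mbps line leaves only ~19 MB per 12-second block under these
# assumptions; weaker connections, mempool traffic and extra safety margins push the
# safe figure down toward the 1-5 MB range quoted above.
```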


3) Storage

Wrong answer: 10 TB.

Correct answer: 512 GB.

As you can probably guess, the main argument here is the same as elsewhere: the difference between theory and practice. In theory, you can buy an 8 TB SSD on Amazon. In practice, the laptop used to write this blog post has 512 GB, and if you make people buy their own hardware, many of them will get lazy (or cannot afford $800 for an 8 TB SSD) and use a centralized provider instead.

Also, even if you can get a blockchain node up and running on some storage disk, a high level of activity can easily burn through the disk quickly, forcing you to keep buying new ones.


03 Risks of sharded blockchains

Today, running a node on the Ethereum blockchain is already a challenge for many users, so we have hit a bottleneck. The biggest concern of core developers is storage size. Thus, for now, efforts to address the computational and data bottlenecks, or even changes to the consensus algorithm, are unlikely to allow a significant gas limit increase. Even solving Ethereum's biggest outstanding DoS vulnerability would only raise the gas limit by 20%.

The only solutions to the storage size problem are statelessness and state expiry. Statelessness allows a class of nodes to validate the blockchain without maintaining permanent storage. State expiry pushes out state that has not been accessed recently, forcing users to manually provide proofs to renew it.

Both paths have been worked on for a long time, and proof-of-concept implementations of statelessness have already begun. Combined, these two improvements can greatly alleviate these concerns and open up room for a substantial increase in the gas limit. However, even after implementing statelessness and state expiry, the gas limit may only be safely increased by a factor of about three before other limits start to dominate.

Sharding fundamentally bypasses the above limitations, because it decouples the data contained on the blockchain from what an individual node needs to process and store. Instead of nodes verifying blocks by personally downloading and executing them, they use advanced mathematical and cryptographic techniques to verify blocks indirectly.

Therefore, sharded blockchains can safely achieve a level of transaction throughput that non-sharded blockchains cannot. It does take a lot of cryptographic ingenuity to create efficient, simple-to-verify full validation that successfully rejects invalid blocks, but it can be done: the theory is well established, and proofs of concept based on draft specifications are already underway.

Ethereum is planning quadratic sharding, in which total scalability is limited by the fact that a node must be able to process both a single shard and the beacon chain (which must perform a fixed amount of management work for every shard). If shards are too big, nodes can no longer process individual shards; if there are too many shards, nodes can no longer process the beacon chain. The product of these two constraints forms the upper bound.
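A toy model can make the "quadratic" intuition concrete. The sketch below is only illustrative: the capacity and overhead numbers are arbitrary assumptions, and real sharding designs involve many more constraints.

```python
# Minimal sketch of the quadratic-sharding intuition (all numbers are assumptions for
# illustration): a node's capacity is split between beacon-chain overhead (which grows
# with the number of shards) and fully processing one shard.

NODE_CAPACITY = 1000             # abstract "work units" per slot a consumer node can handle
BEACON_OVERHEAD_PER_SHARD = 1    # assumed fixed management work the beacon chain adds per shard

def total_throughput(num_shards: int) -> float:
    """Work units the whole system handles per slot, if a node can still keep up."""
    shard_capacity = NODE_CAPACITY - num_shards * BEACON_OVERHEAD_PER_SHARD
    if shard_capacity <= 0:
        return 0.0  # too many shards: the beacon chain alone overwhelms the node
    return num_shards * shard_capacity

best = max(range(1, NODE_CAPACITY), key=total_throughput)
print(f"Throughput with 10 shards: {total_throughput(10):.0f}")
print(f"Throughput with {best} shards (optimum): {total_throughput(best):.0f}")
print(f"Node capacity squared / (4 * overhead): {NODE_CAPACITY**2 / (4 * BEACON_OVERHEAD_PER_SHARD):.0f}")
# The optimum sits near half the node's capacity spent on each side, giving throughput
# proportional to the *square* of what one node can handle -- hence "quadratic sharding".
```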

It is conceivable to go further with cubic sharding or even exponential sharding. In such a design, data availability sampling would certainly become much more complicated, but it can be done. However, Ethereum will not go beyond quadratic sharding, because the extra scalability gains cannot actually be realized unless the other risks become unacceptably high.

So what are these risks?

1) Minimum number of users

It is conceivable for a non-sharded blockchain to keep running as long as there is even a single user willing to participate. This is not the case with sharded blockchains: no single node can process the entire chain alone, so enough nodes are needed to process it together. If each node can handle 50 TPS, and the chain needs to handle 10,000 TPS, then the chain needs at least 200 nodes to survive.
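Written out, the arithmetic from the paragraph above is just a ceiling division; the 50 TPS per node and the 10,000 TPS total come from the text, everything else is a sketch.

```python
# Minimum node count for a sharded chain: the chain's total throughput divided by
# what a single node can process, rounded up.
import math

def minimum_nodes(chain_tps: float, per_node_tps: float) -> int:
    """Smallest number of nodes needed so the shards can be processed collectively."""
    return math.ceil(chain_tps / per_node_tps)

print(minimum_nodes(10_000, 50))   # -> 200, as in the example above
print(minimum_nodes(100_000, 50))  # scale capacity 10x and the floor rises 10x, to 2000
```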

If the blockchain has fewer than 200 nodes at any time, then either the nodes cannot keep up with the chain, or the nodes become unable to detect invalid blocks, or any number of other bad things can happen, depending on how the node software is set up.

If the capacity of a sharded blockchain is increased 10 times, the minimum node count also increases 10 times. So, you might ask: why not start with a little capacity, increase it when we see a large influx of users, and reduce it if the number of users falls, so that we only have as much capacity as we actually need?

Here are some questions:

  • The blockchain itself cannot reliably detect how many unique nodes there are, so this would require some kind of governance to detect and set the number of shards. Governance over capacity limits can easily become a source of division and conflict.

  • What if many users suddenly and unexpectedly drop out at the same time?

  • Increasing the minimum number of nodes required for a fork to launch would make it harder to defend against hostile takeovers.

The minimum number of nodes is almost certainly no more than 1000. Therefore, it seems difficult to justify a blockchain with more than a few hundred shards.

2) Historical Retrievability

An important property of blockchains that users really value is permanence. A digital asset stored on a company's server will stop existing within 10 years once the company goes bankrupt or loses interest in maintaining the ecosystem. An NFT on Ethereum, on the other hand, is permanent.

Yes, people will still be downloading and retrieving your CryptoKitties in 2371.


But once blockchain capacity gets too high, it becomes harder to store all of this data, and at some point there is a great risk that some part of history ends up being stored by nobody.

Quantifying this risk is easy. Take the blockchain's data capacity in MB/sec and multiply by about 30 to get the amount of data stored per year in TB. The current sharding plan has a data capacity of about 1.3 MB/sec, so about 40 TB/year. If that is increased 10x, it becomes 400 TB/year.

If we want the data to be not only accessible but conveniently accessible, we also need metadata (e.g. unpacking rollup data), making it 4 PB per year, or 40 PB after 10 years. This is a reasonable upper bound on how much data a sharded blockchain can safely accumulate.
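The storage arithmetic above can be written out explicitly. In the sketch below, the 1.3 MB/sec figure and the 10x capacity increase come from the text; the roughly 10x metadata allowance is inferred from the article's step from 400 TB to 4 PB per year.

```python
# Worked version of the storage arithmetic above.

SECONDS_PER_YEAR = 365 * 24 * 3600   # ~31.5 million seconds, hence the "multiply by ~30" rule

def tb_per_year(mb_per_second: float) -> float:
    """Convert a sustained data rate in MB/sec into terabytes stored per year."""
    return mb_per_second * SECONDS_PER_YEAR / 1_000_000

current = tb_per_year(1.3)             # current sharding spec: ~40 TB/year
scaled = tb_per_year(13)               # 10x the capacity: ~400 TB/year
with_metadata_pb = scaled * 10 / 1000  # assumed ~10x metadata allowance, converted TB -> PB

print(f"1.3 MB/s -> {current:.0f} TB per year")
print(f"13 MB/s  -> {scaled:.0f} TB per year")
print(f"With metadata: ~{with_metadata_pb:.0f} PB per year, ~{with_metadata_pb * 10:.0f} PB per decade")
```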

04 Summary

There are two approaches to scaling blockchains: fundamental technical improvements, and simply raising parameters. At first, raising the parameters sounds appealing: if you do the math on a napkin, it is easy to convince yourself that a home laptop can handle thousands of transactions per second without ZK-SNARKs, rollups, or sharding. Unfortunately, this approach is fundamentally flawed, for many subtle reasons.

Computers running blockchain nodes cannot spend 100% of their CPU power validating the chain; they need a large safety margin against unexpected DoS attacks, and they need spare capacity for tasks such as processing transactions in the mempool. And users don't want running a node to make their computer unusable for any other application at the same time.

Bandwidth also has overhead: a 10 MB/s connection does not mean you can have 10 MB blocks every second; at most it means 1-5 MB blocks every 12 seconds. The same goes for storage. Raising the hardware requirements for running a node and restricting node operation to a few specialized actors is not a solution. For a blockchain to be decentralized, it is critical that ordinary users can run nodes and that there is a culture in which running nodes is a common activity.

Fundamental technical improvements, on the other hand, do work. Currently, Ethereum's main bottleneck is storage size; statelessness and state expiry can solve this and allow an increase of perhaps about 3x, but not more, because we also want running a node to become easier than it is today. Sharded blockchains can scale much further, because no single node in a sharded blockchain needs to process every transaction.

But even then, there are limits to capacity: as capacity increases, the minimum safe node count rises, and so does the cost of archiving the chain (along with the risk that data is lost if nobody bothers to archive it).

But we don't have to worry too much: these limits are high enough that we can process over a million transactions per second with the blockchain fully secure. But it will take work to do this without sacrificing the decentralization of the blockchain.
