Vitalik: How to build a secure centralized exchange platform?
ECN以太坊中国
2022-12-29 10:00
本文约7472字,阅读全文需要约30分钟
Explore more possibilities between CEX and DEX at both ends of the asset custody spectrum.

Original title: "Having a safe CEX: proof of solvency and beyond

Original title: "

Original Author: Vitalik Buterin

Compilation of the original text: Double Spending (@doublespending)

Special thanks to Balaji Srinivasan and the teams at Coinbase, Kraken and Binance for their discussions.

Whenever a large centralized exchange crashes, a question that is often asked is whether we can use encryption to solve this problem. The exchange can prove that the funds held on its chain are sufficient to repay users by creating cryptographic proofs, rather than relying solely on government licenses, auditors, investigations of corporate governance, and “legal currency” programs such as legal person backtracking.

More ambitiously, an exchange could create a system where depositor funds cannot be withdrawn without their consent. We can try to explore the boundary between the professional CEX that "does not do evil" and the inefficient on-chain DEX that "cannot do evil" but leaks privacy. This post will delve into historical attempts to make CEXs more trustless, the limitations of their technology, and some powerful means of relying on advanced techniques like ZK-SNARKs.

Balance Tables and Merkle Trees: Traditional Proofs of Repayment

The earliest attempts by exchanges to use cryptography to prove they were not deceiving users dates back a long way. In 2011, MtGox, the largest bitcoin exchange at the time, proved they owned the funds by sending a transaction that moved 424,242 BTC to a pre-published address. In 2013, discussions began on how to solve the other side of the problem: proving the total size of user deposits. If you prove that the user's deposit is equal to X (proof of liabilities), and prove that you have private keys for X tokens (proof of assets), then you provide a proof of solvency: you prove The exchange has enough funds to repay depositors.

The easiest way to provide proof of deposit is to publish a listing. Every user can check their balance in the list, and anyone can check the complete list: (i) each balance is non-negative; (ii) the total is the declared amount.Of course, this breaks privacy, so we can change the scheme a bit: publish a

list, and send the salt value privately to the user. But even this leaks the balance and its distribution. In order to protect privacy, we adopted a follow-up technology: Merkle tree technology.

Green: Charlie's node. Blue: Charlie received a node for attestation. Yellow: root node, announced to everyone

The Merkle tree technology will put the user balance table into the Merkle sum tree. In a Merkle sum tree, each node is a pair. The underlying leaf nodes represent individual user balances and salted hashes of usernames. In each higher-level node, the balance is the sum of the balances of the two nodes below, and the hash is the hash of the two nodes below. The Merkle sum proof, like the Merkle proof, is a "branch" consisting of all sister nodes on the path from the leaf node to the root node.

# The function for computing a parent node given two child nodes

def combine_tree_nodes(L, R):

L_hash, L_balance = L

R_hash, R_balance = R

assert L_balance >= 0 and R_balance >= 0 

new_node_hash = hash(

L_hash + L_balance.to_bytes( 32, 'big') +

R_hash + R_balance.to_bytes( 32, 'big')

)

return (new_node_hash, L_balance + R_balance)

# Builds a full Merkle tree. Stored in flattened form where

# node i is the parent of nodes 2 i and 2 i+ 1 

def build_merkle_sum_tree(user_table: "List[(username, salt, balance)]"):

tree_size = get_next_power_of_ 2(len(user_table))

tree = (

[None] * tree_size +

[userdata_to_leaf(*user) for user in user_table] +

[EMPTY_LEAF for _ in range(tree_size - len(user_table))]

)

for i in range(tree_size - 1, 0, -1):

tree[i] = combine_tree_nodes(tree[i* 2 ], tree[i* 2+ 1 ])

return tree

# Root of a tree is stored at index 1 in the flattened form

def get_root(tree):

return tree[ 1 ]

# Gets a proof for a node at a particular index

def get_proof(tree, index):

branch_length = log 2(len(tree)) - 1 

# ^ = bitwise xor, x ^ 1 = sister node of x

index_in_tree = index + len(tree) // 2 

return [tree[(index_in_tree // 2**i) ^ 1 ] for i in range(branch_length)]

# Verifies a proof (duh)

def verify_proof(username, salt, balance, index, user_table_size, root, proof):

leaf = userdata_to_leaf(username, salt, balance)

branch_length = log 2(get_next_power_of_ 2(user_table_size)) - 1 

for i in range(branch_length):

if index & ( 2**i):

leaf = combine_tree_nodes(proof[i], leaf)

else:

leaf = combine_tree_nodes(leaf, proof[i])

return leaf == root

First, the exchange sends each user a Merkle sum proof of their balance. The user can then be sure that his balance was correctly included as part of the total. Simple sample code can be found here.

The privacy leakage under this design is much lower than the disclosure of the complete balance sheet, and each branch can be disrupted every time Merkle root is released to further reduce the risk of privacy leakage, but there are still some privacy leakage problems: Charlie knows that a certain One person has a balance of 164 ETH, the sum of two user balances is 70 ETH, etc. An attacker who controls multiple accounts could still learn a great deal about an exchange's users.

An important subtlety of this scheme is the possibility of negative balances: if an exchange with a user balance of 1390 ETH but only 890 ETH in reserves tries to make up by adding a balance of -500 ETH under a fake account somewhere in the tree The difference, what should I do? This possibility doesn't actually break the scheme, which is why we purposely use Merkle sum trees instead of regular Merkle trees. Suppose Henry is a fake account controlled by the exchange, and the exchange puts -500 ETH on it:

Greta's verification will not pass: when the exchange will have to give her Henry's node with a balance of -500 ETH, she will reject the invalid node. Eve and Fred will also fail the verification, because the intermediate node above Henry has a balance of -230 ETH, so this node is also invalid! In order for the misappropriation to go undetected, the exchange can only hope that the right half of the tree is not checked for its proof of balance.

If the exchange can single out users with 500 ETH who don't bother to check the proof of balance, or who don't believe them when they complain about not receiving the proof of balance, then the exchange can get away with it. However, exchanges can also achieve the same effect by excluding these users from the Merkle sum tree. Therefore, if only in terms of proof of liabilities, the Merkle tree technology basically meets the needs. But its privacy features are still not ideal. You can use the Merkle tree more subtly to improve, such as using satoshi or wei as an independent leaf node. However, by using more advanced technology, it can be done even better.

Using ZK-SNARKs to improve privacy and robustness

ZK-SNARKs are a powerful technology. What ZK-SNARKs do to cryptography is analogous to artificial intelligence: a general-purpose technology that can overwhelm a range of specialized techniques developed decades ago to solve a range of problems. So of course we can use ZK-SNARKs to greatly simplify and improve privacy in proof-of-debt protocols.

We can simply put all users' deposits into a Merkle tree (or a simpler KZG commitment) and use ZK-SNARKs to prove that all balances in the tree are non-negative and add up to some claimed value. If we add a layer of hashing for privacy, the Merkle fork (or KZG proof) sent to each user will not reveal any other user's balance.

Using KZG commitment is a way to avoid privacy leakage, because it does not need to provide "sister nodes" as proof, and can use simple ZK-SNARK to prove the sum of balances, and each balance is non-negative.

We can prove the sum of the balances in the above KZG and its non-negativity through a dedicated ZK-SNARK. Here is a simple example. We introduce an auxiliary polynomial I(x) that "constructs every bit of the user's balance" (for the sake of the example, let's assume the balance is below 2 15 ), where every 16th position tracks the difference guaranteed that only when the actual total is equal to Claims that the total equals the value will be 0. If z is a primitive root of order 128, we can prove that the equation holds:

[ 1 ] Translator's Note: Interpretation of this polynomial equation.

How to convert these equations into polynomial corrections and then convert them into ZK-SNARKs can refer to my article on ZK-SNARKs here and another place. This is not an optimal protocol, but makes these cryptographic proofs understandable!

With only a few additional equations, the constraint system can be adapted to more complex settings. For example, in a leveraged trading system, it is acceptable for individual users to have negative balances, but only if they have enough collateral assets to cover their liabilities. SNARKs can be used to prove this more complex constraint, assuring users that an exchange cannot secretly exempt certain users from the rules, thereby endangering user assets.In the long run, the usefulness of this ZK proof of liability is not limited to user deposits in exchanges, but can also be used in a wider range of lending scenarios. Anyone who takes out a loan puts the record into a polynomial or a tree containing that loan, and the root is published on-chain. This would allow anyone seeking a loan to provide a zero-knowledge proof to the lender that it has not taken out too many other loans. Eventually, legal innovations could even allow loans committed in this way to have higher priority than uncommitted loans. this is with us in"Decentralized Society: Finding the Soul of Web3"

An idea that coincides with that discussed in : the concept of on-chain negative reputation is made possible through some form of "soul-bound tokens".

Proof of assets

The simplest version of Proof of Assets is the protocol we saw above: to prove that you hold X tokens, you simply move X tokens at a predetermined time or carry the message "these funds belong to Binance" in the transaction. To avoid paying transaction fees, you can sign an off-chain message. Both Bitcoin and Ethereum have off-chain signature message standards.

  • There are two practical problems with this simple proof-of-asset technique:

  • cold wallet handling

Collateral reuse

For security reasons, most exchanges store the majority of user funds in cold wallets: on computers that are offline, transactions need to be manually signed and carried to the internet. This approach is common: the cold wallet I used to store my private funds on a permanently offline computer generates QR codes containing signed transactions, which I then scan with my phone . Due to the huge amount of funds, the security protocol used by the exchange will be more complex, often involving multi-party calculations between multiple devices, to further reduce the possibility of key leakage caused by a single device being hacked. In this context, even creating an additional message to prove control of an address is an expensive operation!

Exchanges can use the following methods:

● Maintain some long-term public addresses. The exchange generates several addresses, issues proof of ownership of each address only once, and reuses these addresses. This is by far the easiest solution, although it adds some limitations in terms of security and privacy.

● Hold many addresses, and then randomly prove a few addresses. Exchanges hold many addresses, and may even use each address only once and not use it after a single transaction. In this case, the exchange would need to have an agreement that every now and then some addresses would be chosen randomly, which the exchange would have to "open" to prove ownership. Some exchanges already use auditors for similar operations, but in principle this technique could be turned into a fully automated procedure.

● A more complex ZKP approach. For example, an exchange can set up 1/2 multi-signature for all its addresses, one of these addresses has different keys, and the other has the same key in some complex but secure way (eg, 12/ 16 multi-signature) to store an important emergency backup blind version. To preserve privacy and avoid revealing their full addresses, an exchange can even run a zero-knowledge proof on the blockchain to prove the total balance of addresses on-chain in that format.

Another major concern is preventing collateral reuse. Transferring collateral back and forth between each other to prove reserves is often easy for exchanges, allowing one to get away with effectively being insolvent. Ideally, proofs of payability should be done in real-time, with proofs updated after each block. If that's not practical, then the next best thing is to co-ordinate a fixed time between exchanges for attestation, such as attesting reserves every Tuesday at 2pm UTC.

The last question is: Can assets be certified on legal tender? Exchanges not only hold cryptocurrencies, but also fiat currencies within the banking system. In that regard, the answer is yes, but such a program would inevitably rely on a "fiat currency" trust model: banks themselves could certify balances, auditors could certify balance sheets, etc. Given that fiat currencies cannot be cryptographically verified, this is the best solution within this framework and still worth doing.

Another approach is to separate Entity A from Entity B, where A runs the exchange and handles USDC, a stablecoin backed by an asset; The outflow process, in this case B is USDC itself. Since the "liability" of USDC is only the ERC 20 token on the chain, the proof of liability can be obtained "easily", and we only need to deal with the issue of proof of assets.

Plasma and validiums: can we achieve unmanaged CEX?

Suppose we want to go a step further: we don't want to just prove that the exchange has enough funds to pay back its users. Instead, we want to completely prevent transactions from stealing users' funds.

One of the first early adopters here is Plasma, a scaling solution that became popular in the Ethereum research community in 2017 and 2018. Plasma works by splitting the balance into a set of independent "tokens", each assigned an index and placed in a specific position in the Merkle tree of a Plasma block. For a valid token transfer to take place, the transaction needs to be placed in the correct place in the tree, and the root of the tree is published to the chain.

A minimalist diagram of a version of Plasma. Tokens are held in a smart contract that enforces the rules of the Plasma protocol when withdrawing.

OmiseGo tried to create a decentralized exchange based on this protocol, but since then they have moved on to other things - and for that matter, the Plasma Group, with their optimistic rollup project Optimism.

Discussions in 2018 about the limitations of Plasma (eg, proof token defragmentation) cast fundamental doubts on the viability of Plasma. Since the discussion on Plasma reached its peak in 2018, ZK-SNARKs have become more and more feasible for expansion-related use cases. As we said above, ZK-SNARKs have changed everything.

The newer version of Plasma is what Starkware calls validium: basically the same as ZK-rollup, except the data is kept off-chain. This construct is applicable to many use cases, and it could conceivably be applicable to any scenario where a centralized server needs to prove that it executed code correctly. In validium, the operator cannot steal funds, but depending on the specific implementation details, some user funds may be stuck if the operator disappears.

Now it looks great: CEXs and DEXs are far from either, it turns out that there is a range of options, including various forms of hybrid centralization, where you get some benefits, such as efficiency, but there are still many cryptos Academic guarantees can prevent most forms of malicious behavior by centralized operators.

However, it's also worth thinking about the remaining fundamental question: how to handle user errors. By far the most important type of error is: What if a user forgets their password, loses their device, gets hacked, or loses access to their account?

Exchanges can solve this problem by leveraging email recovery first, and if that fails, more complex recovery via KYC. But to solve these problems, exchanges need to actually control these tokens. In order to be able to reasonably recover user funds, exchanges need to have the same powers that can be used to steal user funds without reason. This is an unavoidable tradeoff.

The ideal long-term solution is to rely on self-custody, and users can easily use technologies such as multi-sig and social recovery wallets in the future to help deal with emergencies. In the short term, there are two obvious alternatives, with different costs and benefits:

Another important issue is cross-chain support: exchanges need to support many different chains, systems such as Plasma and validiums need to be coded in different languages ​​to support different platforms, and in their current form cannot be used on some platforms (especially Bitcoin), which is expected to be resolved through technical upgrades and standardization; however, in the short term, this is another reason why custodial exchanges remain in custody today.

Conclusion: Looking to a better exchange in the future

In the short term, there are two clear categories of exchanges: custodial exchanges and non-custodial exchanges. Today, the latter category is a DEX like Uniswap, and in the future we may also see CEXs constrained by cryptography, where user funds will be held in a way similar to validium smart contracts. We may also see semi-custodial exchanges where we trust their handling of fiat currencies rather than cryptocurrencies.

Both types of exchanges are here to stay, and the easiest way to improve backwards compatibility for the security of custodial exchanges is to add proofs of reserve. This includes a combination of proof of assets and proof of liabilities. There are still some challenges to designing good protocols for both, but we can and should push both technologies to go hand in hand, and open source software and programs as much as possible so that all exchanges can benefit.

In the long run, I hope we move towards a direction where all exchanges are non-custodial, at least when it comes to cryptocurrencies. Wallet recovery will exist and may require highly centralized recovery options for new users with small funds and institutions requiring such arrangements for legal reasons, but this can be done at the wallet layer rather than within exchanges. In terms of legal currency, the movement between the traditional banking system and the cryptocurrency ecosystem can be completed through the native fund in and out process of asset-backed stablecoins such as USDC. However, we still have a long way to go.

[1] Translator's Note:

➤ Every 16 digits represents a user (the first 15 digits represent the user's balance, and the last digit represents the difference between the sum of the user's balance and the statement so far). We can see that the above example represents two users (readers need insight here)

➤ Claimed Average User Balance: 185

➤ User 1's balance: 20 -> 000 0000 0001 0100

Difference: 20 - 185 = -165

➤ User 2's balance: 50 -> 000 0101 0011 0010

Difference: -165 + 50 -185 = -300

➤ After traversing all users, the balance requirement of the last user is 0

➤ Explanation of the four equations

Equation 1: The initial value of the recursion is 0

Equation 2: Each user balance needs to correspond to KZG commitmentEquation 3: The recursive formula for each user's balance, the constraint balance is >= 0 and< 2 14 ,

< 2 14 (It is said that the balance < 2 15 should be a clerical error, because according to Equation 3, the recursive formula has only 14 values, I(zi)

Original link

Original link

ECN以太坊中国
作者文库