
Editor | Carol
A question comes up often: "How do I implement read permissions for on-chain data inside a contract?"
Behind this demand is a developer who wants to put some data on the chain and let a smart contract manage and compute over it to reach business consensus, but who does not want the data to be publicly visible, so that unauthorized participants on the chain cannot read it and cause an information leak.
The most intuitive implementation is to write filtering logic in the contract code: return the data only if the caller meets certain conditions (for example, is on a whitelist), and reject the request otherwise.
Let us set up a case. There is a points consortium chain whose participants include Alice, Bob, Carl, Dave, and others, along with their family members. We want each person's point balance to be visible only to that person and their family, and not to any other participant.
The client sends a request to some node through the blockchain's application-level interface and calls the smart contract's get method to check Bob's points; the smart contract contains permission control logic that denies unauthorized access.
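A minimal Python sketch of this filtering idea follows. It simulates the contract in plain Python rather than in a real contract language; the class name PointsContract, the family member Betty, the whitelist layout, and Alice's balance are all hypothetical and chosen only for illustration.

```python
class PointsContract:
    """Toy stand-in for a points contract with a read filter (hypothetical)."""

    def __init__(self):
        # Contract state: point balances and, per account, who may read them.
        self.balances = {"Alice": 500, "Bob": 300}
        self.family = {"Alice": {"Alice"}, "Bob": {"Bob", "Betty"}}

    def get(self, caller, account):
        # Filtering logic: only the owner and listed family members may read.
        if caller not in self.family.get(account, set()):
            raise PermissionError(f"{caller} may not read {account}'s points")
        return self.balances[account]


contract = PointsContract()
print(contract.get("Betty", "Bob"))  # family member: allowed

try:
    contract.get("Dave", "Bob")      # outsider: rejected by the contract
except PermissionError as err:
    print("denied:", err)
```

Every node runs the same check, so the answer is the same whichever node receives the request.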
Because the running logic of the smart contract on each node is consistent, no matter which node the request is sent to, the result is the same. This may seem like a good idea, but is it actually the case?
Here is the conclusion up front: this approach treats the symptoms, not the root cause, and it does not guarantee that data will not be leaked.
Let us now re-examine this case with a "multi-center, trustless" mindset.
Let's analyze first: how is the data stored on the chain? Under what circumstances will it be leaked?
Blockchain network nodes are deployed in the environments of different participants. Because a blockchain keeps data consistent across nodes, each node holds a complete copy of the data. Whether the storage backend is a file-based database such as LevelDB or RocksDB or a relational database such as MySQL, the data ends up in every node's database instance.
In other words, Bob's point balance is saved on the hard disk of every node, and anyone who opens a node's MySQL database with an ordinary database tool can see it in plaintext.
Suppose one participant on the chain has some blockchain experience and (with small probability) secretly harbors malicious intent, which makes him what is known as a Byzantine player. He can use such tools to open his local database and query Bob's balance directly. The contract's control logic for preventing data leaks is then bypassed completely. It is that simple.
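The bypass can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the storage layout of any real blockchain: sqlite3 stands in for the node's MySQL instance so that the sketch runs without a server, and the table name points_state, its columns, and Alice's balance are hypothetical.

```python
import sqlite3

# Stand-in for one node's local state database (hypothetical schema).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE points_state (account TEXT PRIMARY KEY, balance INTEGER)")
db.executemany(
    "INSERT INTO points_state VALUES (?, ?)", [("Alice", 500), ("Bob", 300)]
)
db.commit()


def contract_get(caller, account):
    """Contract-level read guarded by a permission check."""
    if caller != account:
        raise PermissionError("caller is not authorized")
    row = db.execute(
        "SELECT balance FROM points_state WHERE account = ?", (account,)
    ).fetchone()
    return row[0]


def byzantine_read(account):
    """A node operator skips the contract and queries local storage directly."""
    row = db.execute(
        "SELECT balance FROM points_state WHERE account = ?", (account,)
    ).fetchone()
    return row[0]


try:
    contract_get("Dave", "Bob")
except PermissionError:
    print("contract path: denied")

print("direct database read:", byzantine_read("Bob"))
```

The permission check lives only in the contract path; the data underneath it is stored in the clear on every node.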
In addition, blockchain data is not only related to contracts, but also closely related to transaction records.
When sending a transaction, the transaction parameters will contain part or all of the data (for example, Alice transfers 100 to Bob), and the transaction will be packaged into a block and finally written into the node database.
Queries on block and transaction data are generally not implemented with contract logic, so simply writing filtering logic in the contract cannot prevent the reading of these data. Byzantine players can traverse the block data in the local database, obtain the transaction history details, replay the transaction flow from beginning to end, and know that Bob's current balance is 300.
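As a rough sketch of such a replay, assume the blocks in the local database decode to simple transfer records. The record fields, the individual amounts, and Alice's starting balance below are hypothetical; they are chosen only so that the replay arrives at the balance of 300 used in this case.

```python
from collections import defaultdict

# Hypothetical decoded block data: each block is a list of transfer records.
blocks = [
    [{"from": "Alice", "to": "Bob", "amount": 100}],
    [{"from": "Alice", "to": "Carl", "amount": 40}],
    [{"from": "Alice", "to": "Bob", "amount": 200}],
]


def replay(blocks, initial=None):
    """Rebuild every balance by applying all transactions in block order."""
    balances = defaultdict(int, initial or {})
    for block in blocks:
        for tx in block:
            balances[tx["from"]] -= tx["amount"]
            balances[tx["to"]] += tx["amount"]
    return dict(balances)


balances = replay(blocks, initial={"Alice": 1000})
print(balances["Bob"])  # prints 300
```

No contract method is called at any point, so no contract-level filter has a chance to intervene.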
Across the whole technology stack, it is trivial for a Byzantine player to use tools to access local data and traverse blocks and transactions. He can even modify the code of the blockchain system itself and cut in at the network interface, in program memory, or in the smart contract engine to sniff and intercept plaintext data from protocol packets, blocks, transaction flows, contract context, state data, and so on. Even if the data is encrypted, as long as the key is in the hands of the node holder, he can still decrypt it.
Therefore, controlling read permission in the underlying blockchain code is also useless. After all, anyone can change open source code. A determined bad actor will find a way in and is hard to guard against.
In conclusion, a blockchain emphasizes properties such as "consistency": once plaintext data is broadcast on the chain, there are countless ways for others to obtain it. Whether at the contract layer or in the underlying code, almost all read control logic is like window paper that tears at the first poke, or like the Maginot Line, which was simply bypassed.
Seeing this, someone may ask: If reading data is so undefended, is the "write" permission on the blockchain still meaningful? The answer is: yes.
Going back to the points example, we make Alice the points administrator, so that only she can initiate a transaction that transfers points, and Bob only accepts points from Alice. A transfer transaction must go through network-wide consensus. Every consensus node checks the rules written in the contract and refuses to sign a transaction that does not satisfy them. A transaction that exceeds its authority cannot reach consensus, so the data is not modified.
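A toy model of this rule check is sketched below. It is not the consensus algorithm of any particular chain: the two-thirds signing threshold is an assumption borrowed from common Byzantine fault tolerant designs, and the function names and transaction fields are hypothetical.

```python
def honest_check(tx):
    """Contract rule: only the administrator Alice may transfer points."""
    return tx["from"] == "Alice" and tx["amount"] > 0


def byzantine_check(tx):
    """A tampered node that signs anything."""
    return True


def reaches_consensus(tx, validators):
    """Accept the transaction only if more than two thirds of the nodes sign
    (assumed threshold)."""
    votes = sum(1 for check in validators if check(tx))
    return 3 * votes > 2 * len(validators)


# Three honest nodes and one Byzantine node.
validators = [honest_check, honest_check, honest_check, byzantine_check]

valid_tx = {"from": "Alice", "to": "Bob", "amount": 100}
forged_tx = {"from": "Dave", "to": "Dave", "amount": 10_000}

print(reaches_consensus(valid_tx, validators))   # True: all four nodes sign
print(reaches_consensus(forged_tx, validators))  # False: only the tampered node signs
```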
In this situation, even if there are a small number of Byzantine nodes, no matter how they tamper with their own local nodes, they cannot alter the data of the entire network.
"Write" operations pursue consensus. When the client sends a transaction (sendTransaction or sendRawTransaction), it must be digitally signed, and the blockchain system verifies the signature to confirm which external account sent it, so every write can be strictly verified and accurately traced.
"Read" operations place more emphasis on sharing. Reading data does not go through the consensus process; the node simply looks up the data it holds locally. Usually, the blockchain system does not require the read interface (call) to carry an accurate sender, nor does it require a digital signature. Checking the external account inside a contract's read method is therefore unreliable.
Based on the above analysis, implementing read control on the chain is clearly not a simple matter.
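The weakness of checking the sender inside a read method can be sketched as follows. This is a toy model rather than the interface of any real client library: the call helper, its sender parameter, and the PointsContract class are hypothetical.

```python
class PointsContract:
    """Toy contract whose read method trusts the sender it is given."""

    def __init__(self):
        self.balances = {"Bob": 300}

    def get(self, msg_sender, account):
        if msg_sender != account:
            raise PermissionError("not authorized")
        return self.balances[account]


def call(contract, method, args, sender=""):
    """Toy read interface: runs locally, no signature, sender is just a field."""
    return getattr(contract, method)(sender, *args)


contract = PointsContract()

# Dave simply claims to be Bob; nothing in the read path verifies the claim.
print(call(contract, "get", ["Bob"], sender="Bob"))  # prints 300
```

Because no signature accompanies the read, the contract has no way to tell Dave's request from one that Bob actually made.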
If the read control logic is not thought through, the result is this: you test and verify reads against your own node, and everything looks fine. You think all is calm, unaware that a Byzantine player has already turned your data inside out.
Given the absence of trust in multi-party collaboration and the pursuit of data sharing, openness, and transparency, the general rule is: think very carefully before putting critical, sensitive data that must not leak on the chain. What goes on chain should be the "greatest common divisor" of data that all parties can share.
In fact, transactions and balances in many blockchain systems are visible to the entire network. Their so-called anonymity or privacy merely replaces plaintext accounts with public/private keys and an address system. That model is not suitable for fields such as finance and government affairs, where the models are complex and comprehensive privacy is required.
So what other methods do we have to properly control data visibility while taking into account sharing, transparency, and openness?
The first idea is to combine on-chain logic with off-chain governance and agree on the boundaries of responsibilities and rights. I design and implement permissions properly at the contract and interface levels to ensure that no data leaks from my own business system, and I make sure that my blockchain application layer, display interfaces, reports, logs, databases, and other links cannot be accessed by unauthorized persons, which eliminates my internal operational risk.
As for other people's nodes, I do not manage them; that is their responsibility, and whoever leaks or misuses data will be severely punished (although obtaining evidence is actually quite difficult). This logic amounts to "everyone sweeps the snow from their own doorstep". In this mode, I still cannot put my sensitive data on a chain where others hold a copy.
The second idea is to introduce cryptography. Here are a few examples.
Asymmetric encryption: The data on the chain is encrypted with the recipient's public key, and only the recipient can decrypt it with the matching private key.
Password envelope: The on-chain data is encrypted with a password, the password is given to the recipient through an off-chain channel, and only a recipient who knows the password can decrypt it.
Attribute-based encryption: The data is encrypted with an attribute-based encryption algorithm, and only parties holding the specified attributes (such as an administrator attribute) can decrypt it.
The trade-off with these solutions is higher computation, transmission, and storage overhead. In addition, encrypted data does not support plaintext computation, which makes complex business contract logic difficult to implement. Note also that even when encrypted, all of the information in the data is essentially still on the chain. As time passes and computing power and algorithms evolve (quantum computing, for example), the ciphertext may be brute-forced, or the key may be leaked or prove too easy to guess. Because data on the chain cannot be withdrawn, there is a risk that it will eventually be exposed to the world.
The third idea is to put only a digest on the chain and keep the plaintext off the chain entirely.
In fact, the role of the blockchain is not necessarily to hold all the data and execute complex business rules. It can instead rely on the credibility of multiple witnesses to verify the accuracy and integrity of data, serving as evidence and providing traceability. At this stage, many blockchain systems work mainly on this logic, and they can objectively serve as an anchor of trust.
If you need the plaintext, use the addressing information stored with the digest to fetch the data from the off-chain system, apply fine-grained permission control at that step, and verify the result against the on-chain digest.
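The digest approach can be sketched as follows, under stated assumptions: two dictionaries stand in for the chain and for the off-chain system, SHA-256 over canonical JSON is one possible digest, and the record id, the locator format, the reader list, and the family member Betty are hypothetical.

```python
import hashlib
import json

chain = {}      # stand-in for on-chain storage: record id -> digest and locator
off_chain = {}  # stand-in for a private off-chain database
readers = {"bob-points": {"Bob", "Betty"}}  # off-chain access-control list


def digest(record):
    # Canonical JSON so the same record always hashes to the same value.
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()


def publish(record_id, record):
    """Keep the plaintext off chain; put only its digest and address on chain."""
    off_chain[record_id] = record
    chain[record_id] = {
        "digest": digest(record),
        "locator": f"offchain://{record_id}",
    }


def fetch(record_id, caller):
    """Off-chain read with access control, then verification against the chain."""
    if caller not in readers.get(record_id, set()):
        raise PermissionError("not authorized")
    record = off_chain[record_id]
    if digest(record) != chain[record_id]["digest"]:
        raise ValueError("off-chain data does not match the on-chain digest")
    return record


publish("bob-points", {"account": "Bob", "balance": 300})
print(fetch("bob-points", "Bob"))
```

One caveat on this sketch: for low-entropy values such as a small balance, a bare hash can be guessed by enumerating candidates, so a practical design would probably mix in a random salt that stays off chain.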
Still, keeping the data off the chain feels a little unsatisfying. How can an idea as innovative as the blockchain and a capability as powerful as the smart contract be put to full use?
This is where privacy computing comes in. Privacy computing is a family of heavy weapons, including but not limited to zero-knowledge proofs, homomorphic encryption, secure multi-party computation, and federated learning. These techniques can perform addition, subtraction, multiplication, division, logical operations, sorting, and statistical analysis without exposing the plaintext, and can further achieve the effect of "anonymous at the front desk, auditable in the back office" so as to meet regulatory compliance requirements. This is the ultimate meaning of "usable but invisible" on the blockchain.
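To give a feel for computing on ciphertext, here is a toy sketch of additive homomorphic encryption using the textbook Paillier scheme. Paillier is chosen here only for illustration and is not claimed to be what any particular product uses; the primes are far too small to be secure, and the function names are hypothetical.

```python
import math
import random

# Toy key generation with small, well-known primes (insecure, illustration only).
p, q = 104729, 1299709
n = p * q
n_sq = n * n
g = n + 1
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)


def l_func(x):
    return (x - 1) // n


mu = pow(l_func(pow(g, lam, n_sq)), -1, n)  # modular inverse, Python 3.8+


def encrypt(m):
    # Pick a random r that is coprime with n.
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq


def decrypt(c):
    return (l_func(pow(c, lam, n_sq)) * mu) % n


def add_encrypted(c1, c2):
    """Multiplying two ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % n_sq


# A party without the private key can sum two encrypted point amounts.
total = add_encrypted(encrypt(100), encrypt(200))
print(decrypt(total))  # prints 300
```

Only the holder of the private values lam and mu can decrypt; everyone else sees ciphertexts, yet can still combine them.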
Due to space limitations, the details of privacy computing are not discussed here. You can refer to the open source scenario solutions for WeDPR privacy protection, in particular scenarios such as VCL, the blockchain verifiable ciphertext ledger, which can be used to address some of the privacy issues in the points case above.
WeDPR privacy protection related open source scenario solutions:
https://fintech.webank.com/wedpr/VCL
https://sandbox.webank.com/wedpr/confidentialpayment/#/start
Epilogue
Originally, I just wanted to talk about such a small issue as "how to write contract read permissions", but it turned into a long article.
In fact, blockchain development cannot be approached the way you would approach stand-alone or cluster software. You need to fully consider the collaborative relationships in a multi-party, trustless environment and start from the basic philosophy of sharing, transparency, and traceability. First, identify the privacy protection requirements and weigh the importance and sensitivity of the data. Then go deep into the technology stack, consider the effectiveness and cost of the various algorithms, and balance current and future risks and benefits to choose an appropriate strategy. Only then can you fully protect data and privacy, develop your business safely, and safeguard your own rights and interests and those of your users.