
Editor's note: Blockchain draws on many technologies, from the lowest layers of the Internet to cryptography that few people understand, yet many people watch coin prices and few study the technology. In a bull market everyone works hard at speculating in coins; in a bear market there is not much else to do, so I think it is a good time to study. As a liberal arts student, I naturally have many questions that science students would find idiotic. As a reporter, though, it is not hard for me to find people who understand the industry and will explain things to me in plain language, and they are usually too polite to look down on me to my face.
I am writing up my study notes to share with everyone. This time I am studying virtual machines. I also plan to study cross-chain technology, VRF, the history and culture of open source, a comparison of network architecture with the layered structure of blockchains, and "If the currency circle were a country, what would its currency be going through?". If you have any other interesting questions, please get in touch.
This note turned out much longer than I expected.
Smart contracts are a major direction in which today's public chains are being improved.
Ethereum's smart contracts have had many vulnerabilities. The industry generally attributes this to smart contracts being too flexible, that is, to Ethereum's decision to make them Turing complete. When I previously interviewed ArcBlock CEO Mao Zhihong, he summed up the causes of Ethereum's smart contract vulnerabilities as follows: first, the contracts are too flexible; second, the virtual machine and the language are new and immature. "The overflow problem that surfaced a while ago (the Meitu currency problem was caused by an integer overflow) should have been solved at the language level."
Because of these two problems, projects that aim to build "a better Ethereum" or "a vertical version of Ethereum" either improve on the original design or invent their own smart contract languages. Examples I have reported on include Zilliqa, a public chain built around sharding, and the e-commerce public chain CyberMiles.
CyberMiles recently released its virtual machine (CVM) and its high-level programming language, Lity. I took the opportunity to ask its Chief Scientist, Michael Yuan, about the basics of virtual machines, and this article organizes our conversation.
As far as I remember, I first came across virtual machines as a network administrator at my school's network center (10,000 words on the tragic story of a liberal arts student being crushed in the network center are omitted here): because my permissions were fairly high, I had an account on a virtual machine in the school data center. Later, when I was an intern at Caixin, a colleague installed a Windows virtual machine in order to run Wonder on a Mac. Then, when I reported on the cooperation between Alibaba Cloud and Docker, I learned the difference between virtual machines and Docker, since both are virtualization technologies. Finally, I heard about the Java virtual machine and the Ethereum Virtual Machine (EVM).
What is a virtual machine?
As the name suggests, a virtual machine is defined relative to a physical machine. For example, several of you may share one computer while each of you feels you have a computer to yourself; or you may be using many computers while feeling that you are using just one. Of course, a definition is easy to find on Baidu, but it often leaves you only half understanding: a virtual machine is a complete computer system, simulated in software, that has the functions of a complete hardware system and runs in a fully isolated environment. But a virtual machine that sits on top of an operating system does not seem to fit this definition.
Michael's plain-language explanations deepened my understanding of virtual machines. He told me that virtual machines appeared before the Internet and before operating systems such as Windows, and that they were originally meant to solve the problem of time-sharing a computer among users.
Xiao Ming: What exactly does a virtual machine do? When did it first appear?
Michael: "In the beginning, a large computer was shared by many users: you use it for ten seconds, then I use it for ten seconds. You think the machine has been yours all along, and I think it has been mine all along, but in fact each of us spends most of the time waiting for our ten seconds. When the computer switches from me back to you, it has to restore everything you had just computed and show it to you again. Then, ten seconds later, it puts your things away and brings all of mine back for me."
"We use the same hardware, but there are different virtual machines on it. Everyone thinks the hardware is their own, but this is actually achieved in software. That is how virtual machines were implemented on mainframes. Later, on personal computers, virtual machines had little role to play, so there was no need for them. It was not until Java emerged in 1997 that the virtual machine became especially important, because by then there were various operating systems such as Windows and Unix. Java was the first to say: I will use a virtual machine to unify all of these, so you don't have to write one version for Windows and another for Unix. You write it in Java, and I provide a virtual machine on each of them."
"The virtual machine is an abstraction. My virtual machine has its own language, and you write things for the virtual machine, but underneath it there can be different hardware and software, or even the same hardware divided into different time slices. Why is a virtual machine called a virtual machine? Because it abstracts away what is underneath, so you don't need to care what is below you."
The earliest virtual machine was invented by IBM around 1965: under the IBM model, a single computer system can simulate multiple computers running different operating systems. This lets users create and manage multiple different virtual machine instances on a single computer. A key feature of the IBM model is that it lets programs share the same hardware by partitioning the computer's resources and completely isolating the programs from one another. (Excerpted from "Virtual machine summary")
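The time-sharing arrangement described above, where one machine takes turns serving several users and keeps each user's work separate, can be sketched in a few lines. This is a toy model, not how any mainframe actually did it: the `UserSession` class, the `run_time_shared` function, and the counter standing in for registers and memory are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class UserSession:
    """Everything the machine must keep separate for one user."""
    name: str
    counter: int = 0                      # stands in for registers and memory
    log: list = field(default_factory=list)


def run_time_shared(sessions, slices, steps_per_slice):
    """Give each session a turn in round-robin order.

    Between turns a session's state stays in its own object, which plays
    the role of saving the user's work and restoring it on the next turn.
    """
    for turn in range(slices):
        current = sessions[turn % len(sessions)]   # "restore" this user's state
        for _ in range(steps_per_slice):
            current.counter += 1                   # the user's computation
        current.log.append(current.counter)        # state as saved at end of turn


alice = UserSession("alice")
bob = UserSession("bob")
run_time_shared([alice, bob], slices=4, steps_per_slice=10)
print(alice.log, bob.log)   # each user sees only their own uninterrupted progress
```

Each user's log advances as if the machine had been theirs alone, even though the two sessions alternated on the same "hardware".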
In the 1990s the Java programming language became popular and opened a high point for virtual machine applications. At the time, the network spanned different operating systems, browsers, and devices, which made it very hard to run the same application everywhere. Java met exactly this need (the main feature of the Java virtual machine is that a program is written once and can then run on any platform that has a Java virtual machine), so it caught on. The slogan at the time was "Write once, run anywhere." (Excerpted from "VM history")
In 2005 Google acquired the Android system, and it released the first open source version in 2007. The system includes a virtual machine called Dalvik, which resembles the Java virtual machine but differs from it, because it is designed mainly to run on phones and tablets, machines with limited memory and low CPU clock speeds. Countless mobile applications now run on this virtual machine. Setting aside how quickly Android has conquered the market, one could say that the first 30 years of the IT industry can be summed up as "Intel Inside" and the next 30 years as "Android Inside". The Java virtual machine was originally designed by Sun to run on embedded devices; only with today's Dalvik virtual machine has that original ambition been realized, with a real chance of a unified platform across all devices. (Excerpted from a reference article.)
Why does the blockchain need a virtual machine?
Now that I understood the "unifying" role of the virtual machine, my prior understanding told me that a virtual machine can never perform as well as calling the physical machine directly: there is always some loss, and it stands to reason that the more layers there are, the greater the loss. So why does a blockchain add a virtual machine layer, and what does it need to unify?
Put simply: a blockchain has a consensus mechanism, so every node must arrive at the same computation result. Traditional virtual machines do not guarantee this, since they may produce different results on different underlying hardware, so Vitalik decided to write a new one. In my earlier interview with Lao Mao (Mao Zhihong), he also said: "The reason Ethereum chose to write a new language, Solidity, was to achieve consistency in smart contract code."
Xiao Ming: How is a blockchain virtual machine different from the virtual machines we traditionally encounter? Why does a blockchain need one?
Michael: The mainframe virtual machine sits below the operating system, while the Java virtual machine sits above it, so they are different kinds of virtual machine. What exactly does a blockchain virtual machine unify? It unifies the different nodes, since every node is different.
The underlying layer is built in, say, C++, and the language on top of the virtual machine is Vyper or Solidity; in other words, the virtual machine has its own language. First-generation blockchains have no virtual machine; Bitcoin has none. Bitcoin is about transactions, sending money from A to B. The stroke of genius in Ethereum was to ask: why must a transaction only send money from A to B? Why can't the process of getting from A to B be a piece of code? But to execute that code, there has to be a layer on top that runs it, and that layer is the virtual machine.
At this point the blockchain's job changes from sending money without mistakes to executing code without mistakes, which requires all nodes to run the same code and get the same result.
Having every node get the same result from the executed code is a very demanding requirement, and traditional virtual machines such as the Java virtual machine cannot meet it. Some machines are 64-bit and some are 32-bit, so a number like 10 to the power of minus 10 may be represented differently on machine A than on machine B. If such values enter the computation, it is hard for the nodes to reach consensus, so you have to write your own virtual machine. That is why Vitalik wrote the first version of the Ethereum Virtual Machine.
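To make the "different representation on different machines" worry concrete, here is a small sketch that stores the same value, 10 to the power of minus 10, in a 64-bit and in a 32-bit floating-point format. The choice of these two formats is an assumption made for illustration; it is not a claim about how any particular Java virtual machine or Ethereum node behaves. The helper name `as_float32` is made up.

```python
import struct


def as_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


value = 1e-10
wide = value                  # what a node using 64-bit floats would hold
narrow = as_float32(value)    # what a node using 32-bit floats would hold

same = (wide == narrow)
print("identical bits of information?", same)
print("difference:", abs(wide - narrow))

# Integers have exactly one representation, so nodes cannot disagree on them.
# Here one whole unit is expressed as 10**8 indivisible sub-units (assumed scale).
scaled = 1 * 10**8
print("integer amount:", scaled)
```

The two nodes would hold slightly different numbers for "the same" value, and a hash over their results would no longer match, which is exactly what a consensus system cannot tolerate.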
This virtual machine also has to defend against attacks. It cannot simply execute any code at all, or someone will write code that runs in an infinite loop. So the virtual machine has gas: it needs to know how much storage something takes up (the code of a smart contract is stored on the blockchain, and the longer the code, the higher the gas fee), and on that basis it charges you through transaction fees. This is how security is ensured on the virtual machine. So the requirements for a blockchain virtual machine differ from those for a general-purpose virtual machine, but the function is much the same: executing a piece of code.
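A minimal sketch of how a gas budget stops an infinite loop. It assumes a flat cost of one unit per execution step, which is a simplification: the real EVM prices each operation and storage use differently. The names `run_with_gas`, `OutOfGas`, `infinite_loop`, and `short_job` are hypothetical.

```python
class OutOfGas(Exception):
    """Raised when a program needs more gas than the sender paid for."""


def run_with_gas(program, gas_limit, cost_per_step=1):
    """Run `program` (a generator that yields once per step) under a gas budget.

    Returns (steps_executed, gas_left) if the program finishes in time.
    """
    gas_left = gas_limit
    steps = 0
    for _ in program:
        if gas_left < cost_per_step:
            raise OutOfGas(f"stopped after {steps} steps")
        gas_left -= cost_per_step
        steps += 1
    return steps, gas_left


def infinite_loop():
    while True:
        yield


def short_job(n):
    for _ in range(n):
        yield


print(run_with_gas(short_job(5), gas_limit=10))

try:
    run_with_gas(infinite_loop(), gas_limit=100)
except OutOfGas as err:
    print("infinite loop halted:", err)
```

Because every step costs something and the budget is finite, no program can run forever, and the attacker pays for whatever work the nodes did before the cut-off.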
Xiao Ming: So what did the Ethereum Virtual Machine change? Is there any downside to doing this?
Michael: It sacrificed performance. The Java virtual machine's performance improved enormously thanks to so-called just-in-time (JIT) compilation, in which the virtual machine predicts what the next piece of code will be based on the current one. The Java virtual machine is very, very smart, because a great deal of money has been spent on it and a great deal of optimization has been done. The Ethereum Virtual Machine never did any of this to begin with, because before 2016 nobody had put money into it and everyone thought of it as a kid's project. That is the first reason.
Second, the optimization process itself is very complicated. With Java, a small amount of difference between the results from different virtual machines is tolerable. That is not the case in a blockchain, where all results must be identical.
Today there are really three schools of blockchain virtual machine. One is adapted from Ethereum. Then there is EOS, which adapted WebAssembly, a JavaScript virtual machine that existed before EOS, and removed the parts that do not suit a blockchain. And now I see some that adapt Lua, a scripting language that is often used in IoT.
Why do some chains need to invent a new language?
As I said at the beginning, some projects develop new virtual machines, often accompanied by new languages for them. What does this mean?
A brief digression on the relationship between virtual machines, virtual machine languages (high-level languages), and compilers.
Simply put, the virtual machine determines the virtual machine language, and the compiler is the tool that translates programs written in that language.
As for the compiler, its usual role is a bit like translating a language that people can understand into a language that computers can understand.
There are three kinds of computer language: machine language, assembly language, and high-level language (recommended reading: "In-depth understanding of computer systems (3.1)------assembly language and machine language"). Only the first can be recognized and executed directly by the computer. It is expressed in binary code, which is hard for people to remember. Hence assembly language, which abstracts instructions into a form that people can remember and understand.
However, both of these face the hardware/processor directly and are tied to a particular type of processor. High-level languages such as Java, C, C++, C#, and Python are different: they are highly encapsulated programming languages. Being easier to learn also means being less flexible. Source code written in them can be translated by a compiler into the language of different processors.
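To show what "translating" means here, the following toy compiler turns a human-readable arithmetic expression into a flat list of instructions for an imaginary stack machine. The instruction names (`PUSH`, `ADD`, `SUB`, `MUL`) and the function `compile_expr` are invented for this sketch; it borrows Python's own `ast` parser rather than implementing one, and it is not how the Solidity or Lity compilers work.

```python
import ast


def compile_expr(source):
    """Translate an arithmetic expression into stack-machine instructions."""
    ops = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}

    def emit(node):
        # A literal number becomes a single PUSH instruction.
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return [("PUSH", node.value)]
        # A binary operation: code for the left side, then the right side,
        # then the instruction that combines the two values on the stack.
        if isinstance(node, ast.BinOp) and type(node.op) in ops:
            return emit(node.left) + emit(node.right) + [(ops[type(node.op)], None)]
        raise ValueError("unsupported syntax")

    return emit(ast.parse(source, mode="eval").body)


for instruction in compile_expr("2 + 3 * 4"):
    print(instruction)
```

The person writes `2 + 3 * 4`; the machine receives five simple instructions in the order it must execute them, with the multiplication scheduled before the addition.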
So a "low-level" language may in a sense be the more "advanced" one: it is more flexible (the lower the level, the more powerful the language), harder to master, and understood by fewer people.
Back to the topic: developing a new language really means creating a new virtual machine, in order to do what the Ethereum Virtual Machine does not.
CyberMiles is optimized mainly for vertical scenarios. The CyberMiles virtual machine (CVM) is a modification of the Ethereum Virtual Machine, so it is compatible with Solidity, the language of the Ethereum Virtual Machine, and also supports some features that the Ethereum Virtual Machine does not. This means that if you already know Solidity you can keep writing in Solidity, and if you want to use CVM-specific features you can write in Lity; CVM "understands" both.
Michael: The most important thing about Lity and CVM for CyberMiles is that they let us do things that Ethereum cannot. We are in fact part of the Ethereum community, and we don't see ourselves as competing with Ethereum; we simply have a different philosophy. They are pursuing a different goal, the world computer, and we are not. We accept that there are 90% of things we cannot do, and we do the 10% that we can. So with Lity and CVM we take Ethereum's Turing-complete system and optimize it for specific scenarios.
Vitalik knew about our approach back in 2015 but was unwilling to take it, because he did not want to turn Ethereum into a virtual machine optimized for e-commerce or for cross-chain use; he hoped to find a general solution. This is a matter of worldview. Ethereum talks about being the world computer. We don't want to be the world computer, and we don't think anyone can be. A blockchain done well is a vertical one.
What optimizations has CyberMiles made based on EVM?
So where does CyberMiles' virtual machine, CVM, improve on the EVM? Some of CyberMiles' optimizations also help us understand the characteristics of Ethereum.
First, it can be extended dynamically. The language developed by CyberMiles can be optimized for different business scenarios to improve performance.
According to CyberMiles' official introduction: "Lity's flexibility comes from its new language OTA architecture. In this version, the LibENI function is the highlight. Through the new OPCODE in the CyberMiles virtual machine, users can now add more functions to the Lity language. This extension can be dynamically deployed to the CyberMiles blockchain through super node consensus without stopping or upgrading the blockchain software."
Some notes on terms in the official introduction that may be unfamiliar (see the reference articles at the end for details):
OTA stands for "over the air": upgrading the system's own software dynamically, without restarting the virtual machine. A similar technique is used for wireless system upgrades on mobile phones.
OPCODE stands for operation code; the opcodes together make up a machine-language instruction set. Put simply, an opcode is an instruction number that tells the CPU which instruction to execute. CVM adds opcodes that the EVM does not have in order to implement its own features.
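The idea of an opcode as "an instruction number", and of an extended machine that understands numbers the base machine does not, can be sketched with a tiny stack machine. All opcode numbers and function names below are invented for the example; they are not real EVM or CVM opcodes.

```python
def execute(code, opcodes):
    """Run (opcode_number, argument) pairs on a tiny stack machine."""
    stack = []
    for number, arg in code:
        if number not in opcodes:
            raise ValueError(f"unknown opcode {number:#04x}")
        opcodes[number](stack, arg)
    return stack


def op_push(stack, arg):
    stack.append(arg)


def op_add(stack, _):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)


def op_double(stack, _):
    stack.append(stack.pop() * 2)


# Base instruction set (numbers are made up for this sketch).
BASE_OPCODES = {0x01: op_push, 0x02: op_add}

# An "extended" machine adds one opcode the base machine does not have.
EXTENDED_OPCODES = {**BASE_OPCODES, 0xF0: op_double}

program = [(0x01, 2), (0x01, 3), (0x02, None), (0xF0, None)]

print(execute(program, EXTENDED_OPCODES))   # the extended machine runs it

try:
    execute(program, BASE_OPCODES)          # the base machine rejects 0xf0
except ValueError as err:
    print("base machine:", err)
```

Programs that use only the shared opcodes run on both machines, which mirrors why a modified virtual machine can stay compatible with existing contracts while offering extra features.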
Michael: It is very important that I can add new functions and features while the chain is running. For example, say I have a new encryption algorithm that would be very, very expensive to implement on Ethereum. I can add it in C++, and I can add it without taking my machine down.
The virtual machine and language we have developed as an e-commerce public chain are extensible. If someone else wants to build a food delivery public chain or a financial public chain, they can use them as they are, which is why we open-source and share our work.
Second, code checking is built into the Lity compiler and CVM. In brief, the CyberMiles team classified the known security problems in Ethereum Solidity smart contracts and extracted common patterns. Through compiler and virtual machine changes, Lity can proactively prevent common Solidity security problems, such as integer overflows and tokens accidentally being transferred to contract addresses that do not support them.
Michael: We have a built-in feature for this. Take the BEC security vulnerability we just talked about: the author forgot to add SafeMath, so there was an integer overflow, and hundreds of millions of dollars were lost. The BEC contract can be compiled for CyberMiles and run on our virtual machine, but the same problem will not occur there, because our virtual machine checks for integer overflow at runtime and reports an error. I think this reflects the maturity of the EVM. The same thing happened with Java: when Java started in 1997 it had no such check, and it was added later.
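The difference between an unchecked and a checked multiplication on 256-bit unsigned integers can be sketched as follows. This only illustrates the general pattern of a runtime overflow check; it is not the BEC contract code or the CVM implementation, and the function names and the numbers are chosen for the example.

```python
UINT256_MAX = 2**256 - 1


class IntegerOverflow(Exception):
    """Raised when a result does not fit in 256 bits."""


def wrapping_mul(a, b):
    """Unchecked 256-bit multiply: the result silently wraps around."""
    return (a * b) & UINT256_MAX


def checked_mul(a, b):
    """Checked multiply: refuse to continue when the result does not fit."""
    result = a * b
    if result > UINT256_MAX:
        raise IntegerOverflow(f"{a} * {b} does not fit in 256 bits")
    return result


receivers = 2
value_each = 2**255        # chosen so that receivers * value_each == 2**256

total_unchecked = wrapping_mul(receivers, value_each)
print("unchecked total:", total_unchecked)   # wraps to 0, so a balance check passes

try:
    checked_mul(receivers, value_each)
except IntegerOverflow as err:
    print("checked multiply refused to continue")
```

With the unchecked version a transfer of an enormous amount appears to cost nothing; with the checked version execution stops with an error instead.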
Third, clock-based long-running smart contracts. Lity can schedule something to be executed in the future.
Michael: There is another big problem in Ethereum: a contract cannot start itself. You cannot set an alarm clock for a contract to start on some future day. Yet this need arises all the time in e-commerce, for example with payment and delivery. Ethereum cannot do this by itself. There is a separate thing called the Ethereum Alarm Clock, an off-chain organization that charges a fee and triggers Ethereum contracts for you at a given time. We built this into the virtual machine.
It sounds like a small thing. But if you look at the history of Java, the same thing happened there, because in an enterprise application, without this capability, you cannot support payment in installments.
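One way to picture an "alarm clock" inside the chain is a queue of calls that become due at a given block height, checked every time a new block arrives. This is a sketch under that assumption only; the source does not describe how CVM implements scheduling, and the `Scheduler` class and its methods are hypothetical.

```python
import heapq


class Scheduler:
    """Queue of calls that become due at a given block height."""

    def __init__(self):
        self._queue = []      # heap of (due_height, sequence, callback)
        self._sequence = 0    # tie-breaker so execution order is deterministic

    def schedule(self, due_height, callback):
        heapq.heappush(self._queue, (due_height, self._sequence, callback))
        self._sequence += 1

    def on_new_block(self, height):
        """Run every call whose due height has been reached; return the results."""
        results = []
        while self._queue and self._queue[0][0] <= height:
            _, _, callback = heapq.heappop(self._queue)
            results.append(callback())
        return results


scheduler = Scheduler()
scheduler.schedule(105, lambda: "release payment to seller")
scheduler.schedule(103, lambda: "send shipping reminder")

for height in (102, 103, 110):
    print(height, scheduler.on_new_block(height))
```

Nothing outside the chain has to wake the contract up: the same rule, evaluated by every node at every block, decides when the scheduled call runs.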
As another example, Solidity today cannot do decimal arithmetic, only integers. Why? Because the results of decimal operations are not the same on different machines, so there is no way to reach consensus. We have changed this, and our library of mathematical operations is more extensive. If you are dealing with money, you still need decimal arithmetic. There are of course other features too, such as the rule engine mentioned in the white paper.
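A common way to get decimal arithmetic that every machine agrees on is fixed-point: store amounts as integers counted in small sub-units. The sketch below assumes 18 decimal places and non-negative values; it is one general technique, not a description of how Lity implements decimals, and `to_fixed` and `fixed_mul` are made-up helper names.

```python
SCALE = 10**18  # assumed number of sub-units per whole unit


def to_fixed(text):
    """Parse a non-negative decimal string such as '0.1' into integer sub-units."""
    whole, _, frac = text.partition(".")
    frac = (frac + "0" * 18)[:18]        # pad or truncate to 18 decimal places
    return int(whole) * SCALE + int(frac)


def fixed_mul(a, b):
    """Multiply two non-negative fixed-point numbers, rounding down."""
    return a * b // SCALE


# Binary floating point cannot represent 0.1 or 0.2 exactly.
float_sum_ok = (0.1 + 0.2 == 0.3)

# The same sum in fixed-point is plain integer addition and is exact.
fixed_sum_ok = (to_fixed("0.1") + to_fixed("0.2") == to_fixed("0.3"))

print("float: ", float_sum_ok)
print("fixed: ", fixed_sum_ok)
print("1.5 * 2 =", fixed_mul(to_fixed("1.5"), to_fixed("2")) / SCALE)
```

Because only integer operations are involved, every node computes bit-for-bit identical results, which is the property consensus needs.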
What are the costs of these "optimizations"?
After talking with Michael, you realize that many of CVM's optimizations are for vertical industries, and that Ethereum's position really is "not that it can't, but that it won't". A general-purpose computing public chain might have to sacrifice certain things to implement these features, most directly in its governance mechanism. To achieve high performance, CVM has to talk to the underlying operating system to some extent. Ethereum cannot require nodes joining the network to use a particular system, but CyberMiles, with DPoS, can. Modularity and extensibility at the virtual machine level likewise require the unanimous consent of the DPoS super nodes.
In other words, CyberMiles' language is inseparable from its adoption of the DPoS consensus mechanism, which Ethereum is unwilling to adopt.
About virtual machine performance optimization
Xiao Ming: Of the features in this release, LibENI and the code checking you just mentioned, I feel they mainly live in the virtual machine itself, because the language by itself cannot decide whether my code gets checked. Is that right?
Michael: A language includes its grammar and its compiler. The checks are placed in the compiler, which is part of the language system. LibENI, however, has to be implemented in the language domain. In Ethereum there is no notion of functions that can be added later; the Ethereum white paper spells this out. So when I want to add opcodes, that is a change to the language itself. The alarm clock is similar: it needs keywords in the language so that I can say "execute this in three days" or "in ten days".
Xiao Ming: Going back to the question I asked at the beginning, my feeling is that the higher up a virtual machine sits, the greater the performance loss.
Michael: Yes, that is one of the reasons the EVM is inefficient: it sits on top of the OS. There are two Ethereum clients, Geth and Parity. To run an Ethereum node you run either Geth or Parity and download a large data set. When a new transaction comes in and a block needs to be produced, the Ethereum Virtual Machine on your computer has to execute all of it. Your results have to match those of the other nodes on the network before you can continue; otherwise you are no longer on this network.
Xiao Ming: So is our virtual machine the same?
Dr. Michael Yuan: The same, except that our virtual machine has LibENI for things like encryption algorithms. Why is it called LibENI? ENI stands for Ethereum Native Interface. Everything above it is something that can be proved mathematically; the underlying operating system is reached by way of C++. We made a C++ interface on top of it, so when we want to do encryption or decryption, the call goes through that interface and down to the operating system level. The performance is of course much faster that way.
Xiao Ming: In that case, if the underlying operating systems differ, or the underlying hardware differs, won't that cause the inconsistency you just described?
Michael: Yes, and LibENI has an internal mechanism for this: when a LibENI module is produced, builds have to be produced for several systems, for example Ubuntu Linux, Windows, and so on. But we have an advantage: our super nodes are controllable. There is a fixed number of them, and we can require them to use these operating systems.
Xiao Ming: Can you summarize LibENI?
Michael: It consists of two pieces. One is at the Lity level: what is visible in the virtual machine language has to be defined. The other is at the C++ level, where the implementation is brought in. Then the super nodes have to agree. During this process each super node checks whether the module supports its own operating system; if not, it cannot adopt it.
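The approval step described above, where each super node checks whether a module has a build for its own operating system before agreeing, can be sketched like this. The unanimity rule, the class names `Module` and `SuperNode`, and the platform strings are all assumptions for illustration; the actual LibENI deployment and voting procedure is not specified in this interview.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    name: str
    platforms: frozenset      # operating systems the module has builds for


@dataclass(frozen=True)
class SuperNode:
    name: str
    platform: str


def votes_for(module, nodes):
    """A node agrees only if the module has a build for its own platform."""
    return [n.name for n in nodes if n.platform in module.platforms]


def is_approved(module, nodes):
    """Approve only if every super node can run the module (assumed rule)."""
    return len(nodes) > 0 and len(votes_for(module, nodes)) == len(nodes)


nodes = [
    SuperNode("a", "ubuntu"),
    SuperNode("b", "ubuntu"),
    SuperNode("c", "windows"),
]
linux_only = Module("des", frozenset({"ubuntu"}))
both = Module("des", frozenset({"ubuntu", "windows"}))

print(is_approved(linux_only, nodes))   # node "c" cannot run it
print(is_approved(both, nodes))         # every node has a matching build
```

Requiring a build per platform is what keeps native code from reintroducing the inconsistency problem: a module goes live only when all participating nodes can execute the same function.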
Xiao Ming: So what makes things fast is determined by LibENI?
Michael: LibENI is what adds things to the virtual machine. DApps are written in Lity and run on the CVM. Changing the virtual machine is a different matter, and that is what LibENI is for. LibENI is an extension of Lity that uses C++. It amounts to modifying the virtual machine itself: the virtual machine is written in C++, so this is like packaging C++ code into a module and putting it on the shelf.
Xiao Ming: Since this is an interface to the underlying C++, why can't everything be done by calling functions through it?
Michael: You mean modularizing the entire virtual machine this way? Not everything can be expressed as a function. There are rules and other structures in the virtual machine. Where a problem can be solved with a function, we can do it. What is a function? Something with an input and an output. For example, a private key and a text go in, and an encrypted text comes out.
Xiao Ming: Can you give an example that ordinary people can grasp, something that cannot be done with a function?
Michael: The rules I just mentioned, for example, are about control flow, like a loop from one to ten. The rule engine is really saying: after this is done, do that. Each of the things to be done is a function, but the overall structure is not.
Xiao Ming: So a function usually does a single thing.
Michael: A single thing means: given an input, it produces an output.
The relationship between language extensibility and governance mechanisms
Xiao Ming: CyberMiles stresses that it is customized for e-commerce, but it also stresses that it is highly extensible. I don't quite understand how these fit together.
Michael: On extensibility: the CyberMiles chain itself, and the functions we have added to CVM, are all about e-commerce. But I can use LibENI and CVM to create another public chain, one for food delivery, ride hailing, or security token issuance. Technically we provide the means, and you can extend CVM to suit your own needs.
This goes beyond CyberMiles itself, because CyberMiles wants to operate an e-commerce public chain and ecosystem. The software we develop, however, is more general than that. As I said just now, we very much hope everyone will copy our work. If one day people are copying our code rather than someone else's, I think that will be a mark of our success.
Xiao Ming: You said our language is highly extensible. What kind of language is not very extensible? What are the criteria for judging how extensible a language is?
Michael: The EVM and Solidity are not extensible, meaning you cannot change them. If you need a change, the entire software system has to be updated. The EVM software is updated roughly once a year. To change something in the EVM you basically have to know Vitalik personally; he does all the core development.
We turned the virtual machine into something anyone can change. For example, we opened an interface in Solidity through which you can plug in C++ directly, so the work is done in C++. There is no way to do this on the Ethereum Virtual Machine, and it cannot be done under PoW, because you are updating the virtual machine, a core component of the software, without updating the software itself, and that requires consensus among all the nodes in the network. Getting all the miners on Ethereum to reach consensus is very hard, but we use DPoS, so we can do it through super node consensus.
Xiao Ming: This is what you mentioned earlier: developers can propose new functions and have the nodes accept them.
Michael: We provide a way to address your pain points with packaged tools, handled inside our virtual machine. For example, suppose my virtual machine could already do RSA encryption and now I want DES encryption, which is another function. You can build it into a package and have the super nodes agree to it, and once they agree you can use it to encrypt. The plane changes its engine while flying.
Xiao Ming: So in fact our extensibility is determined not by the language but by the governance mechanism?
Michael: Our language, virtual machine, and governance mechanism work together and are inseparable. Because we can add this kind of keyword to our language, I have a modular way to add the keyword's functionality directly to the virtual machine. Ethereum currently cannot do this, because it has no support for it.
The Agility vs. Security Debate About Virtual Machines
Xiao Ming: Many people criticize Ethereum smart contracts for having many vulnerabilities, including integer overflow. Is that because the virtual machine itself is not well designed?
Michael: I think so. People who write virtual machines or languages will say it is the fault of the people writing the smart contracts: your programmers are not good enough, and that is why there is a problem. But if everyone makes the same mistake, it means the design is not good, and you have to find a way to change it. We ran into the same question back in the Java days. Is this bug the programmer's fault? It is the programmer's fault, but it is also because the language did not prevent it, and that is how I have always seen it.
This is the difference between an engineering solution and a scientific one, or between engineering thinking and academic thinking.
I saw this very clearly when I was at university. The academic way of thinking says that what we design is Turing complete and mathematically provable. Take integer overflow: why call it a bug? My language allows it to be written that way, and the fact that the programmer wrote it that way shows that he meant to. In real life, however, 99.99% of the time the programmer did not mean to; he simply wrote it wrong.
We did what Java did before us: once an integer overflows, execution cannot continue, and our virtual machine reports an error.
Xiao Ming: Did the Java virtual machine used to allow this?
Michael: Every virtual machine goes through this debate: is it a bug or a feature? We allow everyone to do this.
Xiao Ming: That is the "downside" of flexibility. Since the virtual machine sits on top of the operating system, and there may in fact be another virtual machine below it, can the lower layer know that this has happened? Can't it stop it?
Michael: Would the lower-level virtual machine be willing to do that? The lower a virtual machine sits, the more likely it is to allow this. What looks like an integer overflow to you may not be an integer overflow underneath. So the further down you go, the more freedom you have to give people, and the lower the level of the software, the more it demands of programmers. If you write in machine language, for example, it is very, very error-prone, but you have the greatest flexibility and the greatest room for optimization, and you can use all the resources of the whole system. Once you are at the Java layer or the EVM layer, much of the system is simply invisible to you. You don't even know where the memory is, because it has been abstracted away.
Xiao Ming: Suppose the lower-level virtual machine does not allow a certain behavior but the upper-level one does, and someone writes it anyway. Will the lower level block it?
I am Lu Xiaoming, an editor at Odaily, exploring the real blockchain. For tips and discussion, please add me on WeChat at lohiuming, and note your name, organization, position, and reason.
Reference article:
Difference between Docker and virtual machine
How are virtual machines different from Docker?
Introduction to Virtual Machines (9) - Summary of Virtual Machines: A brief introduction to the four virtual machine conceptual models in use today
Introduction to virtual machines (4) - the history of virtual machines: Appendix of Major Historical Events of Virtual Machines
History of Virtual Machines: simple virtual machine history
In-depth understanding of computer systems (3.1)------assembly language and machine language
Wikipedia: Instruction Set Architecture
OTA essence and implementation process analysis
OTA production and upgrade process notes
Detailed Explanation of OPCode and Principles of Assembly and Disassembly
Let’s also talk about the defects and deficiencies of the Ethereum virtual machine EVM