Smart contract deconstruction (2): code analysis at creation and runtime
Cheetah Blockchain Security
2018-12-10 03:32
Let's continue to dive into the underlying code of the smart contract


This article is the second part of a series of articles.

If you haven't read the introduction to this series, please do so first: Part 1, Introduction.

We are deconstructing the EVM bytecode of a simple solidity smart contract.

Today, let's start using the "divide and conquer" strategy to disassemble the complex code of the smart contract. As I said in the introductory preface, this disassembled code is actually very low-level, but will be relatively readable compared to the original bytecode.

Please make sure that you have followed the steps I described in the introduction and deployed the BasicToken code in Remix.

Now, let's focus on the JUMP, JUMPI, JUMPDEST, RETURN and STOP opcodes, and ignore all the others. Whenever we find an opcode that isn't one of these, we just skip to the next instruction without worrying about it.

When the EVM executes code, it does so from top to bottom: there are no other entry points in the code, and execution always starts at the top. JUMP and JUMPI can make execution jump. JUMP takes the topmost value on the stack and moves execution to that location. The target location must contain a JUMPDEST opcode, otherwise execution fails; that is the sole purpose of JUMPDEST, to mark a location as a valid jump target. JUMPI works the same way, except that the second position of the stack must not hold a 0, otherwise there is no jump. So it is a conditional jump. STOP halts execution of the smart contract completely, and RETURN halts execution too, but returns a portion of the EVM's memory, which is very handy.
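The jump rules described above can be sketched in a few lines of Python. This is a minimal model, not how any real EVM client is written: the function name `next_pc` and the three-byte program are made up for illustration, the top of the stack is the end of the Python list, and the sketch does not check whether a 0x5b byte is actually PUSH data.

```python
JUMP, JUMPI, JUMPDEST = 0x56, 0x57, 0x5B  # opcode values

def next_pc(code: bytes, pc: int, stack: list) -> int:
    """Return the position executed after the instruction at `pc`."""
    op = code[pc]
    if op == JUMP:
        dest = stack.pop()            # topmost value: where to jump
    elif op == JUMPI:
        dest = stack.pop()            # first item: where to jump
        condition = stack.pop()       # second item: jump only if non-zero
        if condition == 0:
            return pc + 1             # no jump, fall through
    else:
        return pc + 1                 # every other opcode is ignored here
    if dest >= len(code) or code[dest] != JUMPDEST:
        raise ValueError(f"invalid jump destination {dest}")
    return dest

program = bytes([JUMPI, 0x00, JUMPDEST])   # hypothetical 3-byte program
taken = next_pc(program, 0, [1, 2])        # condition 1, destination 2
not_taken = next_pc(program, 0, [0, 2])    # condition 0, destination 2
```

With a non-zero condition the jump lands on the JUMPDEST at position 2; with a zero condition execution simply moves on to position 1.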

So, let's start going through the code with all of this in mind. In Remix's debugger, slide the "transaction" slider all the way to the left. You can use the Step Into button (it looks like a small down arrow) to walk through the instructions.


The first instructions can be ignored for now. At instruction 11 we find the first JUMPI. If it doesn't jump, execution continues through instructions 12 to 15 and ends in a REVERT, which stops execution. If it does jump, it skips those instructions and lands on location 16 (hex 0x0010, which was pushed onto the stack at instruction 8). Instruction 16 is a JUMPDEST.

Continue stepping through the opcodes until the "transaction" slider is all the way to the right. A lot just happened, but only at position 68 do we find a RETURN opcode (followed by a STOP at instruction 69, just in case). This is rather curious. If you think about it, the control flow of this smart contract will always end at instruction 15 or 68. We've just walked through it and determined that there are no other possible flows, so what are the remaining instructions for? (If you scroll down the Instructions panel, you'll see that the code ends at position 566.)

The set of instructions we just walked through (0 to 69) is what is called the "creation code" of the contract. It never becomes part of the smart contract's code itself; it is executed only once by the EVM, during the transaction that creates the smart contract. As we will soon discover, this code is responsible for setting up the initial state of the created contract and returning a copy of its runtime code. The remaining 497 instructions (70 to 566), which as we saw are never reached by the execution flow, are precisely the code that will become part of the deployed smart contract.



The creation part

Now, we'll dig into the creation part of the code.

Figure 1. Deconstruction of BasicToken.sol's creation-time EVM bytecode


This is the most important concept to understand in this article. The creation code is executed in a transaction and returns a copy of the runtime code, which is the actual code of the smart contract. As we'll see, the constructor is part of the creation code, not the runtime code; once the contract is deployed, the constructor no longer appears in its code.

How does this magic happen? This is what we will analyze step by step now.

OK. So now our problem is reduced to understanding the 70 instructions that make up the creation-time code.

Let's go back to the top-down approach, this time making sense of every instruction instead of skipping any. First, let's focus on instructions 0 to 2, which use the PUSH1 and MSTORE opcodes.

Figure 2. Free memory pointer EVM bytecode structure


PUSH1 simply pushes one byte onto the top of the stack, while MSTORE takes the top two items from the stack and stores one of them in memory, at the location given by the other:

mstore(0x40, 0x80)
         |     |
         |     What to store.
         Where to store.
         (in memory)

NOTE: The code snippet above is Yul-ish code. Notice how it consumes elements from the stack from left to right, always consuming the element at the top of the stack first.

This stores the number 0x80 (128 decimal) at memory location 0x40 (64 decimal).

Why? For now, just take it as given; I will explain the reason later.

Now, open the Stack and Memory panels in Remix's Debugger tab so that you can visualize these instructions as you step through them.

You might be wondering: what happened to instructions 1 and 3? PUSH instructions are the only EVM instructions that occupy two or more bytes, so PUSH1 80 really takes up two positions. The mystery is solved: instruction 1 is 0x80, and instruction 3 is 0x40.
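To see why the instruction numbers skip, here is a small disassembler sketch in Python. It is only an illustration: the opcode table covers just the handful of opcodes mentioned in this article, and the five bytes fed to it are the hex form of PUSH1 80, PUSH1 40, MSTORE as I read them from the listing.

```python
# Only the opcodes discussed in this article; anything else prints as UNKNOWN.
OPCODES = {
    0x15: "ISZERO", 0x34: "CALLVALUE", 0x52: "MSTORE", 0x57: "JUMPI",
    0x5B: "JUMPDEST", 0x80: "DUP1", 0xFD: "REVERT",
}

def disassemble(code: bytes):
    """Return (position, text) pairs, one per instruction."""
    listing = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:                 # PUSH1 .. PUSH32
            size = op - 0x5F                   # number of data bytes that follow
            data = code[pc + 1 : pc + 1 + size]
            listing.append((pc, f"PUSH{size} {data.hex()}"))
            pc += 1 + size                     # skip the opcode AND its data
        else:
            listing.append((pc, OPCODES.get(op, f"UNKNOWN_{op:02x}")))
            pc += 1
    return listing

listing = disassemble(bytes.fromhex("6080604052"))
for position, text in listing:
    print(position, text)
```

Each PUSH1 advances the position by two, which is why the listing goes 0, 2, 4 and positions 1 and 3 never show up as instructions of their own.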

Next I will explain the instructions from 5 to 15.

Figure 3. Non-payable check EVM bytecode structure.


Here, again, are a bunch of new opcodes: CALLVALUE, DUP1, ISZERO, PUSH2, and REVERT. CALLVALUE pushes the amount of wei sent with the creation transaction, DUP1 duplicates the top element of the stack, ISZERO replaces the top value of the stack with 1 if it is zero and with 0 otherwise, PUSH2 is like PUSH1 but pushes two bytes onto the stack, and REVERT halts execution and undoes any state changes.

So what's going on here? In Solidity, we can write this assembly like this:

if (msg.value != 0) revert();

This code isn't actually part of our original Solidity source; the compiler injected it because we didn't declare the constructor as payable. In recent versions of Solidity, functions that are not explicitly declared payable cannot receive ether. Returning to the assembly code, the JUMPI at instruction 11 skips instructions 12 to 15 and jumps to 16 if no ether is involved. Otherwise, REVERT executes with both of its arguments set to 0 (meaning that no useful data is returned).
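The non-payable check can be replayed outside Remix with a toy interpreter. The sketch below is an assumption-laden model: the 17 bytes are reconstructed from the instruction positions described in this article (compare them with what Remix shows you), only the opcodes in that range are implemented, memory is a plain dictionary, and the name `run_prologue` is made up.

```python
# Instructions 0 to 16, reconstructed from the positions described above.
PROLOGUE = bytes.fromhex("608060405234801561001057600080fd5b")

def run_prologue(code: bytes, callvalue: int) -> str:
    """Run the code; return "reverted" or "continued" (reached the end)."""
    stack, memory, pc = [], {}, 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:                     # PUSHn: read n data bytes
            size = op - 0x5F
            stack.append(int.from_bytes(code[pc + 1 : pc + 1 + size], "big"))
            pc += size
        elif op == 0x52:                           # MSTORE(offset, value)
            offset, value = stack.pop(), stack.pop()
            memory[offset] = value
        elif op == 0x34:                           # CALLVALUE
            stack.append(callvalue)
        elif op == 0x80:                           # DUP1
            stack.append(stack[-1])
        elif op == 0x15:                           # ISZERO
            stack.append(1 if stack.pop() == 0 else 0)
        elif op == 0x57:                           # JUMPI(destination, condition)
            dest, condition = stack.pop(), stack.pop()
            if condition != 0:
                if code[dest] != 0x5B:
                    raise ValueError("invalid jump destination")
                pc = dest
                continue
        elif op == 0xFD:                           # REVERT(offset, length)
            return "reverted"
        elif op != 0x5B:                           # JUMPDEST does nothing
            raise NotImplementedError(hex(op))
        pc += 1
    return "continued"

without_ether = run_prologue(PROLOGUE, 0)
with_ether = run_prologue(PROLOGUE, 1)
```

Sending no wei takes the jump to position 16 and execution carries on, while sending even 1 wei falls through into the REVERT at position 15.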

OK! Let's take a break for coffee.

(The next part is going to be a little tricky, so it's best to take a few minutes break. Before you get back to focusing again, get yourself a good cup of coffee. Make sure you understand what we've seen so far, because the next part is a bit complicated.)

If you want another way to visualize what we just finished, try this simple tool I built: solmap. It allows you to compile Solidity code on the fly, then click on an EVM opcode to highlight the relevant Solidity code. Disassembly is a bit different than Remix, but you should be able to understand it by comparison.


It's coffee time!


Ready to move on? Next are instructions 16 to 37. Please keep following along in Remix's debugger. (Remember, Remix is your best friend ^^.)

Figure 4. EVM bytecode structure for retrieving constructor parameters from code appended at the end of the smart contract bytecode

The first four instructions (17 to 20) read whatever is in memory at location 0x40 and push it onto the stack. If you recall, that should be the number 0x80. The following instructions push 0x20 (decimal 32) onto the stack (instruction 21), duplicate that value (instruction 23), push 0x0217 (decimal 535) (instruction 24), and finally duplicate the fourth value on the stack (instruction 27), which should be 0x80.

When looking at EVM instructions like these, it's easy to lose track of what's going on for a moment. Don't worry, it will come back to you.

In instruction 28, CODECOPY is executed. It takes three arguments: the target memory location where the copied code is stored, the position in the code to copy from, and the number of bytes of code to copy. So, in this case, it copies 0x20 (32) bytes of code, starting at byte position 0x0217 (535) in the code, to memory location 0x80.
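The three arguments of CODECOPY map naturally onto a slice assignment. The following Python sketch is a simplified model with made-up names: memory is a growable bytearray (the real EVM expands memory in 32-byte words and charges gas for it), and reading past the end of the code yields zero bytes.

```python
def codecopy(memory: bytearray, code: bytes, dest: int, offset: int, length: int) -> None:
    """Copy `length` bytes of `code`, starting at `offset`, into `memory` at `dest`."""
    chunk = code[offset : offset + length]
    chunk = chunk.ljust(length, b"\x00")       # bytes past the end of the code read as zero
    if len(memory) < dest + length:
        memory.extend(bytes(dest + length - len(memory)))   # grow memory with zeros
    memory[dest : dest + length] = chunk

# Tiny hypothetical example: copy 4 bytes starting at code position 8 to memory position 2.
memory = bytearray()
code = bytes(range(10))                        # 00 01 02 ... 09
codecopy(memory, code, 2, 8, 4)
print(memory.hex())
```

In terms of this sketch, the call made by instruction 28 would be `codecopy(memory, code, 0x80, 0x0217, 0x20)`.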

If you look at the entire disassembly, it ends at position 566. Why is this code trying to copy the last 32 bytes of the code? In fact, when deploying a contract whose constructor takes parameters, the arguments are appended to the end of the code as raw hexadecimal data (scroll down the Instructions panel to see this). In this case the constructor takes a single uint256 parameter, so all this code does is copy that argument into memory from the value appended at the end of the code.

These 32 bytes make no sense as disassembled code, but they do as raw hex: 0x0000000000000000000000000...0000000000000000000002710. This is, of course, the decimal value 10000 that we passed to the constructor when deploying the smart contract!
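A quick way to convince yourself that those 32 bytes are the number 10000 is to do the conversion in both directions. The helper name below is made up; the only assumption is that a uint256 argument is encoded as a 32-byte big-endian integer, which is what the hex above shows.

```python
def encode_uint256(value: int) -> bytes:
    """Encode a uint256 the way it appears at the end of the creation code."""
    if not 0 <= value < 2**256:
        raise ValueError("value does not fit in a uint256")
    return value.to_bytes(32, "big")           # left-padded with zero bytes

encoded = encode_uint256(10000)
decoded = int.from_bytes(encoded, "big")
print(encoded.hex())   # 60 zeros followed by 2710
print(decoded)         # 10000
```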

You can repeat this part in Remix step by step, making sure you understand what just happened. The end result should be that you see the number 0x00..002710 at location 0x80 in memory.

Well, before moving on to the next part, I suggest taking a break with a whiskey.


Whiskey hour!

Why a glass of whiskey? Because from here on, it's all downhill.

The next set of instructions, 29 to 35, updates the value at memory address 0x40 from 0x80 to 0xa0; as you can see, they offset the value by 0x20 (32) bytes.

Now we can start to understand instructions 0 to 2. Solidity keeps track of something called a "free memory pointer": a place in memory where we can store something, with the guarantee that no one will overwrite it (unless we make a mistake). So, since we stored the number 10000 at the old free memory location, we update the free memory pointer by moving it forward by 32 bytes.

Even seasoned Solidity developers get confused when they see the term "free memory pointer" or the code mstore(0x40, 0x80). All it says is: "Whenever we write a new entry to memory, we will start writing at this point, and we will keep track of the offset."

Every function in Solidity, when compiled to EVM bytecode, will initialize this pointer.
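If the free memory pointer still feels abstract, it may help to see it as a tiny bump allocator. The class below is only a mental model with made-up names: memory is a dictionary keyed by location, and `allocate` does what instructions 29 to 35 do by hand.

```python
FREE_POINTER_LOCATION = 0x40

class Memory:
    def __init__(self):
        # Equivalent of mstore(0x40, 0x80): free memory starts at 0x80.
        self.words = {FREE_POINTER_LOCATION: 0x80}

    def allocate(self, size: int) -> int:
        """Reserve `size` bytes and return where they start."""
        start = self.words[FREE_POINTER_LOCATION]
        self.words[FREE_POINTER_LOCATION] = start + size   # move the pointer forward
        return start

memory = Memory()
location = memory.allocate(0x20)      # room for one 32-byte constructor argument
memory.words[location] = 10000        # the value copied in by CODECOPY
```

After this runs, the argument sits at 0x80 and the pointer stored at 0x40 reads 0xa0, matching what the Memory panel in Remix shows at this point.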

You may also be wondering what's in memory between 0x00 and 0x40. Nothing. It's a section of memory that Solidity reserves for computing hashes, which, as we will see shortly, are necessary for mappings and other dynamic data types.

Now, in instruction 37, MLOAD reads from memory and loads our value of 10000 onto the stack, where it will be fresh and ready for use in the next set of instructions. (By this point location 0x40 holds the updated pointer 0xa0; the value 10000 itself sits at the old free memory location, 0x80.)

This is a common pattern in the EVM bytecode generated by Solidity: before the body of a function is executed, the function's parameters are loaded onto the stack (whenever possible) so that the upcoming code can consume them, which is exactly what happens next.

Let's continue with instructions 38 to 55.

Figure 5. The main EVM code of the constructor.


These instructions are nothing more than the body of the constructor: that is, the Solidity code:

totalSupply_ = _initialSupply;

balances[msg.sender] = _initialSupply;

The first four instructions (38 to 42) are pretty straightforward. First, a 0 is pushed onto the stack. Then the second item on the stack is duplicated (this is our number 10000), and then the number 0 is duplicated and pushed onto the stack; this 0 is the slot of totalSupply_ in storage. Now SSTORE can consume these values while still keeping 10000 around for future use:

sstore(0x00, 0x2710)
         |     |
         |     What to store.
         Where to store.
         (in storage)

Look! We have stored the number 10000 in the variable totalSupply_. Isn't that amazing?


Be sure to look at this value in Remix's Debugger tab. You can find it in the "Storage completely loaded" panel.

The next set of instructions (43 to 54) is a bit trickier, but basically deals with storing 10000 in the balances mapping under the key msg.sender. Before proceeding, make sure you understand this part of the Solidity documentation, which explains how mappings are laid out in storage.

In a nutshell, it concatenates the slot of the mapping (in this case the number 1, since balances is the second variable declared in the smart contract) with the key used (in this case msg.sender, obtained with the CALLER opcode), hashes the result with the SHA3 opcode, and uses the digest as the target location in storage. In the end, storage is just a simple dictionary or hash table.

Continuing with instructions 43 to 45, the msg.sender address is stored in memory (this time at location 0x00), and then in instructions 46 to 50, the value 1 (the slot of the mapping) is stored at memory location 0x20. Finally, the SHA3 opcode computes the Keccak256 hash of whatever is in memory from location 0x00 to location 0x40, i.e. the key used followed by the slot of the mapping. This is exactly where the value 10000 will be stored in our mapping:

sstore(hash..., 0x2710)
          |        |
          |        What to store.
          Where to store.

At this point, the body of the constructor has been fully executed.
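The two writes performed by the constructor body can be modelled with a plain Python dictionary standing in for storage. Treat this as a sketch under loud assumptions: the function names and the sender value are made up, and the hash is passed in as a parameter because Python's standard library has no Keccak256 (`hashlib.sha3_256` uses different padding and gives different digests). The placeholder hash below only makes the sketch runnable; it does not reproduce the real storage location.

```python
import hashlib

def mapping_preimage(key: int, slot: int) -> bytes:
    """The 64 bytes placed in memory before SHA3: key at 0x00, slot at 0x20."""
    return key.to_bytes(32, "big") + slot.to_bytes(32, "big")

def run_constructor(storage: dict, sender: int, initial_supply: int, hash_fn) -> None:
    # totalSupply_ is the first state variable, so it lives in slot 0.
    storage[0] = initial_supply
    # balances is the second state variable (slot 1); each entry lives at
    # hash_fn(key followed by slot), read as a 256-bit storage location.
    location = int.from_bytes(hash_fn(mapping_preimage(sender, 1)), "big")
    storage[location] = initial_supply

def placeholder_hash(data: bytes) -> bytes:
    # NOT Keccak256: a stand-in so the sketch runs with the standard library only.
    return hashlib.sha256(data).digest()

storage = {}
sender = 0xABCD                      # hypothetical msg.sender
run_constructor(storage, sender, 10000, placeholder_hash)
preimage = mapping_preimage(sender, 1)
```

To get the location Remix actually shows, you would pass a genuine Keccak256 function from a third-party library in place of `placeholder_hash`.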

All of this can be a bit overwhelming at first, but it's a fundamental part of working with Solidity. If you don't get it, I suggest you go through it with Remix's debugger a few times, keeping the Stack and Memory panels open.

Also, feel free to leave your questions below. This pattern is commonly used in the EVM bytecode generated by Solidity, and you will quickly learn to recognize it. In the end, it just calculates where in storage to keep the value for a certain key of a mapping.

Figure 6. Runtime code replication structure


In instructions 56 to 65 we perform code duplication again. Only this time, we don't copy the last 32 bytes of the code into memory; we copy 0x01d1 (465 decimal) bytes starting at location 0x0046 (70 decimal) to memory at location 0. That's a huge chunk of code to replicate!

If you slide the slider all the way to the right again, you'll notice that position 70 is right after our creation-time EVM code, where execution stopped. The runtime bytecode is contained in those 465 bytes. This is the part of the code that will be saved on the blockchain as the runtime code of the smart contract, and it is the code that will be executed every time someone or something interacts with the smart contract. (We'll cover the runtime code in a later part of this series.)
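The numbers quoted so far fit together neatly, and a few lines of Python make the layout explicit. The offsets come from the article; the function name and the filler bytes used as stand-ins for real bytecode are made up.

```python
CREATION_END = 0x46       # 70: first byte after the creation code
RUNTIME_LENGTH = 0x01D1   # 465 bytes copied by the second CODECOPY
ARGS_OFFSET = 0x0217      # 535: where the first CODECOPY started reading
ARGS_LENGTH = 0x20        # 32 bytes: one uint256 constructor argument

def split_deployment_data(data: bytes):
    """Split the deployed bytes into creation code, runtime code and arguments."""
    creation = data[:CREATION_END]
    runtime = data[CREATION_END : CREATION_END + RUNTIME_LENGTH]
    arguments = data[ARGS_OFFSET : ARGS_OFFSET + ARGS_LENGTH]
    return creation, runtime, arguments

# Synthetic stand-in for the real bytes: 0x01 filler for creation code,
# 0x02 filler for runtime code, then the constructor argument 10000.
data = b"\x01" * 70 + b"\x02" * 465 + (10000).to_bytes(32, "big")
creation, runtime, arguments = split_deployment_data(data)
```

Positions 0 to 69 are creation code, 70 to 534 are runtime code, and 535 to 566 hold the constructor argument, which is why the listing ends at position 566.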

That's exactly what instructions 66 to 69 do: return the code we copied to memory.

Figure 7. The runtime code returns the EVM bytecode structure.


RETURN grabs the code copied to memory and hands it to the EVM. If this creation code is executed in the context of a transaction to the 0x0 address, the EVM will execute the code and store the return value as the runtime code of the created smart contract.

By now, our BasicToken code has created and deployed a smart contract instance, ready to use with its initial state and runtime code. If you take a step back and look at Figure 2, you will see that all of the EVM bytecode structures we analyzed are generic, except for the one highlighted in purple: that is, the Solidity compiler will always generate them for creation-time bytecode. One constructor differs from another only in the purple part, the actual body of the constructor. The structures that fetch the parameters embedded at the end of the bytecode, and that copy the runtime code and return it, can be thought of as boilerplate code and generic EVM opcode structures. You should now be able to look at any constructor and have a general idea of the components that make it up before stepping through it instruction by instruction.

In the next post in this series, we'll cover the actual runtime code, starting with how to interact with the smart contract's EVM code at different entry points. Now, give yourself a well-deserved pat on the back, because you just digested the most difficult part of the series. You should also now have a solid ability to read and debug EVM bytecode, understand common structures, and most importantly, know the difference between creation-time and runtime EVM bytecode. This is what makes contract constructors so special in Solidity.

Cheetah blockchain security is based on the technology of Kingsoft Internet Security, combined with artificial intelligence, nlp and other technologies, to provide blockchain users with ecological security services such as contract audit and sentiment analysis.


*This article was first published on medium by Alejandro Santander, translated and organized by Cheetah Blockchain*


Ratingtoken official website https://www.ratingtoken.net/?from=z

