Deconstructing Smart Contracts (3): Function Selector


Cheetah Blockchain Security

2018-12-17 06:15

This article is about 4,218 characters long; reading it in full takes about 17 minutes.

Let's open up the underlying code of the smart contract's "function selector" together.

Hi, and welcome back to deconstructing smart contracts with me. This article is the third part of a series, so if you haven't read the previous articles, please take a look:

(1) Preface: basic code and operation method;

(2) Creation and runtime code parsing;

We are deconstructing the EVM bytecode of a simple Solidity smart contract.

In the previous article, we determined that the bytecode of the smart contract is divided into two parts, creation and runtime, and we saw why. Having understood the creation part in depth, it is time to start exploring the runtime part. If you look at the deconstruction diagram, we can begin with the second big split block, called BasicToken.evm (runtime).

This might seem a bit scary, since the runtime code is at least four times the size of the creation code! But don't worry: the skills we've developed for reading EVM code in the previous articles, combined with the rock-solid "divide and conquer" strategy, will make this challenge more systematic and possibly even easier. We will simply keep identifying independent structures and breaking them down until only solvable problems remain.

First, let's go back to the Remix online editor and start a debugging session with the runtime bytecode. How do we do that? Last time, we deployed the smart contract and debugged the deployment transaction. This time, we'll call one of the functions of the deployed smart contract and debug that transaction.

First, let's recall our smart contract:

We used the JavaScript VM with the optimizer enabled, compiler version v0.4.24, and 10000 as the initial supply. After deploying the smart contract, you should see it listed in the "Deployed Contracts" section of the "run" panel in Remix. Click it to expand the interface of the smart contract.

What is this interface? It is a list of all the public or external methods of the smart contract, that is, the methods any Ethereum account or smart contract can interact with. Private and internal methods are not shown here. How a call is routed to a specific part of the smart contract's runtime code is the focus of this article's deconstruction.

We can try it out by clicking the totalSupply button in Remix's "run" panel. You should immediately see the response 10000 below the button, which is what we expect since we deployed the smart contract with 10000 as the initial token supply. Now, in the Console panel, click the Debug button to start a debug session for this particular transaction. Note that there will be multiple Debug buttons in the Console panel; make sure you use the most recent one.

In this case, we are not debugging a transaction to the 0x0 address, which, as we saw in the previous post, creates a smart contract. Instead, we are debugging a transaction to the smart contract itself, i.e. to its runtime code.

If you open the Instructions panel, you should be able to verify that Remix lists the same instructions as the BasicToken.evm (runtime) section of the deconstruction diagram. If they don't match, something went wrong. Try starting over and make sure you're using the settings described above.

The first thing you'll probably notice is that the debugger puts you at instruction 246 and the transaction slider is at about 60% of the bytecode. Why? Because Remix is a very considerate program: it takes you directly to the point where the EVM is about to execute the body of the totalSupply function. However, a lot happened before that, and that is what we're here to look at. In fact, we won't even look at the execution of the function body in this article. Our only concern is how the Solidity-generated EVM code routes incoming transactions, a job performed by what we will come to know as the "function selector" of the contract.

So grab the slider and drag it all the way to the left so that we start at instruction zero. As we saw before, the EVM always executes code from instruction 0, no exceptions, and then flows through the rest of the code. Let's step through the execution opcode by opcode.

The first structure that comes up is one we've seen before (and one we'll see a lot):

Figure 1. Free memory pointer

This is something Solidity-generated EVM code always does before anything else: it reserves a spot in memory for later use.

Let's see what happens next:

Figure 2. Calldata length check.

If you open Remix's Stack panel in the Debug tab and step through instructions 5 through 7, you'll see that the stack now contains the number 4 twice. If you have trouble reading these extra-long numbers, adjust the width of the Remix debug panel so that each number fits on a single line. The first 4 comes from a regular push, but the second is the result of executing CALLDATASIZE, an opcode that, as the yellow paper says, takes no arguments and returns the size of "the input data in the current environment", or what we usually call calldata.

What is calldata? As explained in the ABI specification in Solidity's documentation, calldata is an encoded chunk of hexadecimal data that contains information about the smart contract function we want to call and its arguments. Simply put, it consists of a "function id", generated by hashing the signature of the function and truncating the hash to its first four bytes, followed by the packed argument data. You can study the documentation in detail if you want, but don't worry about understanding this packing down to the finest detail. It is explained in the docs, but it can be hard to grasp at first. It's much easier to understand with practical examples.

Let's see what this calldata is. Open the Call Data panel in Remix's debugger and look at it: 0x18160ddd. These are exactly the four bytes produced by applying the keccak256 algorithm to the function signature as a string, "totalSupply()", and performing said truncation. Since this particular function takes no parameters, that's all there is: a four-byte function id. When CALLDATASIZE is called, it simply pushes the second 4 onto the stack.
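To make the function id concrete, here is a minimal Python sketch that derives it from the signature string. Python's standard library has no Keccak-256 (hashlib.sha3_256 is the later NIST variant, which pads differently), so the sketch carries its own compact pure-Python Keccak. The helper names rol64, keccak_f1600, keccak256 and function_id are illustrative assumptions, not part of Remix or Solidity.

```python
MASK64 = (1 << 64) - 1


def rol64(value, shift):
    """Rotate a 64-bit lane left by `shift` bits."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def keccak_f1600(lanes):
    """Apply the 24-round Keccak-f[1600] permutation to a 5x5 array of lanes."""
    lfsr = 1
    for _ in range(24):
        # theta: mix each column's parity into its neighbours
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane and move it to a new position
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
        # chi: non-linear step along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ ((~row[(x + 1) % 5]) & row[(x + 2) % 5])
        # iota: xor in the round constant, generated by a small LFSR
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes


def keccak256(data: bytes) -> bytes:
    """Keccak-256 as used by Ethereum (original 0x01 padding, 136-byte rate)."""
    rate = 136
    padded = bytearray(data) + bytes(rate - (len(data) % rate))
    padded[len(data)] ^= 0x01   # first padding bit
    padded[-1] ^= 0x80          # last padding bit
    state = [[0] * 5 for _ in range(5)]
    for offset in range(0, len(padded), rate):
        block = padded[offset:offset + rate]
        for i in range(rate // 8):
            state[i % 5][i // 5] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        state = keccak_f1600(state)
    # The first four lanes of the state are the 32-byte digest.
    return b"".join(state[i][0].to_bytes(8, "little") for i in range(4))


def function_id(signature: str) -> str:
    """Hash the signature string and keep the first four bytes, as hex."""
    return keccak256(signature.encode("ascii"))[:4].hex()


print(function_id("totalSupply()"))
```

The printed value should be the same four bytes that Remix shows in the Call Data panel for the totalSupply call.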

Next, instruction 8 uses LT to check whether the calldata size is less than 4. If it is, the following two instructions perform a JUMPI to instruction 86 (0x0056). Here the calldata size is exactly four bytes, which is not less than 4, so there is no jump in this case and execution continues to instruction 13. But before we go there, suppose we call our smart contract with empty calldata, that is, 0x0 instead of 0x18160ddd. You can't do that from Remix, by the way, but you can build the transaction manually.

In that case, we end up at instruction 86, which basically pushes a couple of zeros onto the stack and feeds them to the REVERT opcode. Why? Because this smart contract has no fallback function. If the bytecode doesn't recognize the incoming data, it diverts the flow to the fallback function, and if that structure doesn't "catch" the call, this revert structure terminates execution with no return data at all. If there is nothing to fall back to, there is nothing to be done, and the call is completely reverted.
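The length check can be sketched in a few lines of Python. This only illustrates the control flow described above; the constant and function names are hypothetical, and the instruction numbers 86 and 13 are the ones from this particular debugging session.

```python
REVERT_BLOCK = 86        # instruction the JUMPI targets when calldata is too short
NEXT_INSTRUCTION = 13    # where execution continues otherwise


def after_length_check(calldata: bytes) -> int:
    """Return the instruction reached after the CALLDATASIZE / LT / JUMPI check."""
    calldata_size = len(calldata)       # CALLDATASIZE
    too_short = calldata_size < 4       # LT against the pushed 4
    return REVERT_BLOCK if too_short else NEXT_INSTRUCTION  # JUMPI


print(after_length_check(bytes.fromhex("18160ddd")))  # the totalSupply calldata
print(after_length_check(b""))                        # empty calldata
```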

Now, let's do something more interesting. Go back to Remix's Run tab, copy the Account address, call balanceOf with it as the argument instead of calling totalSupply, and debug that transaction. This is a brand new debugging session; let's forget about totalSupply for a moment. If you navigate to instruction 8, you'll see that CALLDATASIZE has now pushed 36 (0x24) onto the stack. If you look at the calldata, it is now 0x70a08231000000000000000000000000ca35b7d915458ef540ade6068dfe2f44e8fa733c.

This new calldata is actually very easy to take apart: the first four bytes, 70a08231, are the hash of the signature "balanceOf(address)", and the following 32 bytes contain the address we passed as an argument. Curious readers might ask: why 32 bytes, if Ethereum addresses are only 20 bytes long? The ABI always uses 32-byte "words" or "slots" to hold the arguments of a function call.

Continuing in our balanceOf call context, let's pick up where we left off at instruction 13, with nothing on the stack at this point. Instruction 13 pushes 0xffffffff onto the stack, and the next instruction pushes the 29-byte-long number 0x000000001000...000 onto the stack. We'll see why in a moment. For now, just notice that one contains four bytes of f's and the other four bytes of 0's.
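As a rough illustration of that layout, the following Python sketch assembles the balanceOf calldata by hand: the four-byte function id followed by the address left-padded with zeros to a 32-byte slot. The function name encode_balance_of is made up for this example; the function id and the address are the ones from the debugging session above.

```python
def encode_balance_of(address_hex: str) -> bytes:
    """Build calldata for balanceOf(address): 4-byte id + one 32-byte slot."""
    function_id = bytes.fromhex("70a08231")
    if address_hex.startswith("0x"):
        address_hex = address_hex[2:]
    address = bytes.fromhex(address_hex)
    if len(address) != 20:
        raise ValueError("an Ethereum address is 20 bytes long")
    # Left-pad the 20-byte address with zeros to fill a full 32-byte ABI slot.
    return function_id + address.rjust(32, b"\x00")


calldata = encode_balance_of("0xca35b7d915458ef540ade6068dfe2f44e8fa733c")
print("0x" + calldata.hex())
print(len(calldata))  # the value CALLDATASIZE would push
```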

Next, CALLDATALOAD takes one argument (the one pushed onto the stack at instruction 48) and reads a 32-byte chunk of the calldata starting at that position, which in this case, in Yul, would be:

calldataload(0)

This pushes the first 32 bytes of our calldata onto the stack. Now comes the fun part. DIV consumes two arguments from the stack: it takes the calldata word and divides it by the weird 0x000000001000...000 number, effectively filtering out everything but the function id and leaving it on the stack: 0x000...000070a08231. The next instruction is AND, which also consumes two elements from the stack: our function id and the number made of four bytes of f's. This makes sure the function id is exactly four bytes (eight hex digits) long, masking out anything else that might be there. A safety measure used by Solidity, I suppose.

In short, we have just checked whether the calldata is too short, reverted if so, and then rearranged things a bit so that we have our function id on the stack. And we're almost done. The next part will be easy to understand:
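The DIV and AND steps are plain integer arithmetic, so they can be reproduced in Python as a sketch. The divisor is written as 1 << 224, i.e. a 1 followed by 28 zero bytes, which is an assumption about the elided 29-byte constant that matches the result described above. The names calldataload and extract_function_id are illustrative.

```python
def calldataload(calldata: bytes, offset: int) -> int:
    """Read a 32-byte word from calldata; reads past the end are zero-padded."""
    chunk = calldata[offset:offset + 32].ljust(32, b"\x00")
    return int.from_bytes(chunk, "big")


def extract_function_id(calldata: bytes) -> int:
    word = calldataload(calldata, 0)   # CALLDATALOAD at position 0
    shifted = word // (1 << 224)       # DIV: drop the lower 28 bytes
    return shifted & 0xFFFFFFFF        # AND: keep exactly four bytes


balance_of_calldata = bytes.fromhex(
    "70a08231" + "00" * 12 + "ca35b7d915458ef540ade6068dfe2f44e8fa733c"
)
print(hex(extract_function_id(balance_of_calldata)))
```

Dividing by 2^224 is the same as shifting right by 28 bytes, which is why only the leading four bytes of the 32-byte word survive.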

Figure 3. Function selector

At instruction 53, the code pushes 18160ddd (the function id of totalSupply) onto the stack, then uses DUP2 to duplicate the incoming function id 70a08231, which currently sits in the second position on the stack. Why duplicate it? Because the EQ opcode at instruction 59 will consume two values from the stack, and we want to keep the 70a08231 value around after going through the trouble of extracting it from the calldata.

The code now tries to match the function id from the calldata against each of the known function ids. Since 70a08231 came in, it won't match 18160ddd, so the JUMPI at instruction 63 does not jump. But it does match on the next check, so the JUMPI at instruction 74 jumps.

Let's take a moment to observe how there is one of these equality checks for each public or external function of the smart contract. This is the core of the function selector: it acts as a sort of switch statement that simply routes execution to the correct part of the code. This is our "hub".

So, since the last check was a match, the execution flow takes us to the JUMPDEST at position 130, which, as we'll see in the next part of this series, is the ABI "wrapper" for the balanceOf function. As we will see, this wrapper is responsible for unpacking the transaction's data for consumption by the function body.

Go ahead and try debugging the transfer function too. There's really no mystery to function selectors. The selector is a simple but effective structure that sits at the door of every contract (at least all those compiled from Solidity) and redirects execution to the appropriate place in the code. This is how Solidity gives the smart contract's bytecode the ability to simulate multiple entry points, and thus an interface.
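The chain of equality checks can be pictured as a small lookup loop. The Python sketch below is only an analogy for the DUP2 / PUSH4 / EQ / JUMPI pattern; it lists just the two function ids named in this article (the real contract checks every public or external function), and the returned labels are hypothetical stand-ins for jump destinations.

```python
# Partial table: only the ids discussed in this article.
KNOWN_FUNCTION_IDS = [
    (0x18160DDD, "totalSupply wrapper"),
    (0x70A08231, "balanceOf wrapper"),
]


def select_function(function_id: int) -> str:
    """Route a function id the way the selector's EQ / JUMPI chain does."""
    for known_id, destination in KNOWN_FUNCTION_IDS:
        if function_id == known_id:    # DUP2, PUSH4 known_id, EQ
            return destination         # JUMPI taken
    # No check matched and the contract has no fallback function.
    return "revert"


print(select_function(0x70A08231))
print(select_function(0xDEADBEEF))
```

As in the bytecode, the checks run in order and the first match wins; an id that matches nothing falls through to the revert structure.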

Looking at the deconstruction diagram, this is what we just deconstructed:

Figure 4. The function selector and the main entry point for the runtime code of the smart contract.

All in all, before you know it, you are more familiar with the code underlying Solidity than most people. Stick with it, and you will be able to open it up completely.

*This article was first published on Medium by Alejandro Santander, translated and organized by Cheetah Blockchain*

Cheetah Blockchain Security builds on the technology of Kingsoft Internet Security, combined with artificial intelligence, NLP and other technologies, to provide blockchain users with ecosystem security services such as contract audits and sentiment analysis.

RatingToken official website: https://www.ratingtoken.net/?from=z
