IPFS/Filecoin: Stars and Seas on the Left, Financial Enlightenment on the Right
BlockVC
2020-06-12 07:51
Why Distributed File Storage Is Critical to the Next Generation Internet.

Editor's Note: This article comes from BlockVC (ID: blockvcfund) and is reprinted by Odaily with authorization.

More than five years after the Bitcoin system went into operation, a new wave of blockchain-driven projects, represented by Ethereum, began to emerge. IPFS, the InterPlanetary File System, was created during this period. In 2014 its founder, Juan Benet, released a draft white paper describing a "peer-to-peer distributed file system that seeks to connect all computing devices with the same system of files." In the same year, Benet founded Protocol Labs, a research laboratory for network protocols and the parent company of IPFS, and applied to the prestigious startup incubator Y Combinator, which accepted the Protocol Labs team into its Summer 2014 batch. In February 2015, the alpha version of IPFS was opened to the public for testing; a year later, Protocol Labs raised $3 million from DCG. In 2017, amid the ICO wave, the soaring price of ETH, and the huge fundraising of projects such as EOS, Protocol Labs launched Filecoin, a blockchain project that serves as the incentive layer for the IPFS protocol, and completed a public offering of US$257 million on CoinList. According to information disclosed by Filecoin at the time, as of June 2017 providers on the IPFS network had stored more than 5 billion unique content files, and the figure was still growing.

Figure 1: Display of Filecoin Primer results, Protocol Labs

After seven years of development, the IPFS storage network has grown into a global distributed storage option. After several delays, the Filecoin mainnet is tentatively scheduled to launch in July or August 2020, and the FIL token will go live with it. Since 2018, China's blockchain and digital asset industry has followed the progress of IPFS technology and its ecosystem closely, and a thriving ecosystem has grown up around IPFS and Filecoin. This research report explains the technology and applications of IPFS from the perspective of the history of distributed storage and the underlying technology, extends the discussion to the incentive logic of Filecoin, which is built on IPFS, and maps the ecosystem around IPFS.

One. Distributed Storage Yesterday and Today

Distributed storage means that a large number of ordinary servers, connected over a network, provide storage services as a single whole. This approach is usually scalable, reliable, high-performing, and easy to maintain. In the early days of the Internet, people used e-mail and other applications built on centralized servers and a set of transmission protocols. The records and information generated by each conversation were stored centrally on a single server. If that single point failed, the server's data could not be accessed and the applications stopped working. As computer systems and application software developed, centralized storage gradually gave way and P2P transmission began to emerge.

1. Distributed storage in the "pre-blockchain era"

In 1999, Shawn Fanning, a college student in the United States, developed a program called Napster, with which users could download the MP3 files they wanted and share their own MP3 files for others to download. This software was the prototype of P2P file transfer technology. It inspired eDonkey/eMule and, later, the well-known BitTorrent (BT) protocol. From then on, file storage and transfer changed dramatically: peer-to-peer applications that did not rely on centralized servers became hugely popular. BT, for example, has many advantages as a content distribution protocol. Unlike downloads from a centralized server, the more people download the same content over BT, the faster the download becomes. In this way, the original source can serve many download requests for large files at once without consuming a lot of its own bandwidth.

Magnet links were developed on this basis. A magnet link does not identify a document by IP address or location. Instead, it identifies, searches for, and downloads the document by the value of a hash function, so that the content of every file has a unique hash. The hash value serves as the ID, and distributed hash table technology handles addressing and storage across the whole network. Because it does not depend on any particular host being online, this approach is especially suited to peer-to-peer networks without a central server.

A blockchain is itself a technology that combines distributed storage, asymmetric encryption, and consensus algorithms. Distributed storage solutions based on blockchain technology are technically similar to the BT protocol: in both, the files to be stored are split into fragments and the fragments are stored on individual nodes, and the blockchain-based solutions also encrypt the files. Because of the data encryption and privacy protection that blockchain technology provides, and the tamper resistance that decentralization brings, such solutions have clear advantages over centralized servers.

Solutions of this type usually include token rewards at an incentive layer: when nodes contribute storage resources to the network, they earn the network's native tokens. Projects that currently use blockchain technology for storage include IPFS, Storj, and Sia.

Two. Application and principles of IPFS technology: an elegant combination of the HTTP transfer protocol, the Git version control protocol, and the BitTorrent transfer protocol

1. The essence of IPFS from first principles

IPFS is essentially a protocol for sharing and transferring files, like HTTP (the Hypertext Transfer Protocol), one of the most successful and dominant transfer protocols on the modern Internet. Unlike HTTP, which follows a client-server model, IPFS is a distributed network of nodes, each of which can both serve and request data.

In the client-server model, the "client" (usually a web browser) sends requests for web elements (images, web pages, documents, and so on) to the "server" (usually a file hosting server), which returns the requested element in its response for display on screen. Many "proxies" (computers and other machines) sit between the client and the server and help relay HTTP messages between the two.

The disadvantages of this model, and of HTTP with it, are as follows:

  • Inefficient structure: The structure of HTTP only allows clients and servers to request and serve a single web element at a time. This is why web pages with many elements take longer to load than others. HTTP moves small files effectively, but it is becoming increasingly inefficient and expensive in a world where users generate large volumes of data every day.

  • Centralized network: Because web elements and data are requested from a single host server, these central servers are vulnerable to DDoS attacks and attacks on their data. Centralized networks are also exposed to censorship by governments or institutions. In 2017, in response to the Catalan independence referendum, the Spanish government censored several websites and apps related to the referendum.

IPFS differs on both points:

  • Efficient structure: On IPFS, each node can provide content to its peers or request content from them. The more nodes on the network, the more efficient the data transfer. Instead of requesting one web element at a time to complete a web page, clients on IPFS request multiple web elements from multiple nodes. As a highly simplified example, a web page with four web elements would require four client requests and server responses in sequence under the client-server model. With IPFS, the four requests go to four nodes at the same time.

  • Distributed network: Nodes on IPFS are distributed and have equal standing; there is no single server farm that stores and delivers the data. If a node is attacked or shut down, data can still be retrieved from other nodes on the network. A distributed network reduces the attack surface for censorship. In response to the Spanish government's censorship, the Catalan government used IPFS to upload the previously banned websites and make them available to the public.

Figure 2: Schematic of Centralized vs. Distributed Networks, The Block Research

These two differences show how IPFS compares with HTTP: it offers resource sharing and persistent storage of information that cannot be tampered with or deleted. IPFS uses content addressing, so it does not store duplicate files and saves storage space. Its distributed design and encrypted storage also let the network avoid single points of failure and provide better privacy and security than storage services such as Baidu Cloud. In the long run, the new features of IPFS give it significant advantages over HTTP, and it is inevitable that the two protocols will learn from each other and gradually converge.
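The content addressing and deduplication described above can be illustrated with a minimal sketch. It assumes a plain SHA-256 hex digest as the address and an in-memory dictionary as the store; the `ContentStore` class is hypothetical, and real IPFS uses its own identifier format and block storage rather than anything shown here.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key of a blob is the hash of its bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content, not from a location.
        address = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same address, so it is stored once.
        self._blobs.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Anyone can re-hash the bytes to check they were not tampered with.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data

    def __len__(self) -> int:
        return len(self._blobs)


store = ContentStore()
a = store.put(b"hello ipfs")
b = store.put(b"hello ipfs")      # duplicate upload, same address
c = store.put(b"hello filecoin")  # different content, different address
```

Because the address is a function of the bytes, uploading the same file twice costs no extra space, and any change to the bytes produces a different address.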

2. Design inspiration of IPFS

To achieve these characteristics, the technical design of IPFS combines a number of existing distributed systems and storage technologies and innovates on top of them. It can be seen as a synthesis of the best technology of the blockchain era in the field of distributed storage.

Distributed hash table: A distributed hash table (DHT) is a distributed system for storing and retrieving "values", or data, associated with keys on a network of peer nodes. Each value corresponds to a distinct key, which is what a node needs in order to look that value up. On a DHT, information is spread evenly among the nodes, and each node has an equal chance of being chosen to store a given key-value pair.

IPFS draws on the DHT designs of Kademlia, Coral DSHT, and S/Kademlia to improve lookup efficiency and to reduce exposure to attacks such as Sybil attacks.
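To make the idea of "which node stores which key" concrete, here is a small sketch of the XOR distance metric that Kademlia-style DHTs use to decide which nodes are closest to a key. The helper names, the peer labels, and the use of SHA-256 to derive IDs are assumptions made for illustration, not the IPFS implementation.

```python
import hashlib


def node_id(name: str) -> int:
    """Derive a 256-bit ID by hashing a label (a stand-in for a real key hash)."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric: bitwise XOR, interpreted as an integer."""
    return a ^ b


def closest_nodes(key: int, nodes: list[int], k: int = 2) -> list[int]:
    """Return the k node IDs closest to the key under the XOR metric."""
    return sorted(nodes, key=lambda n: xor_distance(n, key))[:k]


# Hypothetical peers; in a real network the IDs come from the peers' keys.
peers = [node_id(f"peer-{i}") for i in range(8)]
key = node_id("some-file-contents")
responsible = closest_nodes(key, peers, k=2)
```

Since IDs and keys live in the same hash space, every node can compute locally which peers should hold a key, which is what keeps lookups efficient without a central index.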

Figure 3: Schematic diagram of a distributed hash table, Wikipedia.org

BitTorrent: BitTorrent is a successful peer-to-peer file sharing protocol. In 2018, the protocol accounted for 4.16% of global Internet traffic. IPFS borrows from BitTorrent's data exchange protocol to reward nodes that contribute resources and to penalize nodes that leech them.
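One way to reward contributors and penalize leeches is to track, per peer, how many bytes have been sent and received, and to become less willing to send as the imbalance grows. The sketch below is modeled on the example strategy described in the IPFS white paper; the `Ledger` class is hypothetical and the constants 6 and 3 should be treated as illustrative rather than as what any deployed node uses.

```python
import math


class Ledger:
    """Tracks bytes exchanged with one peer, from this node's point of view."""

    def __init__(self):
        self.bytes_sent = 0
        self.bytes_recv = 0

    def debt_ratio(self) -> float:
        # How much we have given this peer relative to what it gave us.
        return self.bytes_sent / (self.bytes_recv + 1)

    def send_probability(self) -> float:
        # Sigmoid: generous while the ratio is low, drops quickly as debt grows.
        r = self.debt_ratio()
        return 1 - 1 / (1 + math.exp(6 - 3 * r))


# A peer that gives as much as it takes.
fair = Ledger()
fair.bytes_sent, fair.bytes_recv = 1_000, 1_000

# A peer that only takes.
leech = Ledger()
leech.bytes_sent, leech.bytes_recv = 10_000, 0
```

With these numbers the balanced peer is still served almost every time, while the peer that has only downloaded is effectively cut off until it contributes.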

Git (version control system): A version control system records changes to files over time so that users can always go back to a specific version. Git is a distributed version control system that lets a distributed group of contributors track changes made to files by many parties. IPFS applies the same model in its protocol. On IPFS, data stored on the network is divided into pieces, which are held by nodes across the network. Version control lets IPFS keep these pieces properly organized and correctly updated when files change.

SFS (Self-Certifying File System): SFS is a distributed file naming system. IPFS uses it in its InterPlanetary Name System (IPNS). With IPNS, users can generate verifiable addresses that link to IPFS content.

3. Protocol stack of IPFS

The IPFS protocol includes the following seven sub-protocols:

  • Identity: Each IPFS node has a NodeId, the cryptographic hash of its public key, which identifies the node on the IPFS network. When nodes connect, they exchange public keys and check that the hash of each public key matches the peer's NodeId. If it matches, the connection is established; if not, the connection is dropped.

  • Network: IPFS manages connections between peer nodes on its network and provides transport, reliability, connectivity, integrity, and authenticity. IPFS is also designed to run on top of other protocols, and uses its multiaddr format to express addresses together with the protocols they rely on.

  • Routing: IPFS uses a DSHT based on S/Kademlia and Coral to route information between peers. Small data values (1 KB or less) are stored directly on the IPFS DHT. For larger values, the DHT stores references to nodes that may be able to serve the data.

  • Exchange: IPFS uses BitSwap, an exchange protocol inspired by BitTorrent, to exchange data between nodes. BitSwap is essentially a data market in which nodes provide data blocks to other nodes in exchange for the blocks they need. If a node does not have the data its peers need, it will try to find that data.

  • Objects: IPFS objects contain data and links to other objects. Because data is split up and distributed to many nodes across the network, IPFS organizes the pieces in a Merkle directed acyclic graph (Merkle DAG) so that whole files can be assembled efficiently.

  • Files: IPFS models a versioned file system hierarchy on top of the Merkle DAG.

  • Naming: IPNS, the self-certifying naming system described above, gives users verifiable addresses that link to IPFS content.
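The Objects and Files layers can be sketched as a tiny Merkle DAG: a file is cut into chunks, each chunk becomes a leaf object addressed by its hash, and a root object links to the leaves in order. The object layout (a `data` field and a `links` list), the JSON encoding, and the 4-byte chunk size are all assumptions chosen to keep the example short; IPFS uses its own object format and much larger chunks.

```python
import hashlib
import json


def put(store: dict, obj: dict) -> str:
    """Store an object under the hash of its canonical JSON encoding."""
    encoded = json.dumps(obj, sort_keys=True).encode()
    address = hashlib.sha256(encoded).hexdigest()
    store[address] = obj
    return address


def add_file(store: dict, data: bytes, chunk_size: int = 4) -> str:
    """Split data into chunks, store each as a leaf, and return the root address."""
    links = []
    for i in range(0, len(data), chunk_size):
        leaf = {"data": data[i:i + chunk_size].hex(), "links": []}
        links.append(put(store, leaf))
    # The root holds no data itself, only ordered links to the leaves.
    return put(store, {"data": "", "links": links})


def read_file(store: dict, root: str) -> bytes:
    """Reassemble a file by following links depth-first from the root."""
    obj = store[root]
    own = bytes.fromhex(obj["data"])
    return own + b"".join(read_file(store, link) for link in obj["links"])


store = {}
root_v1 = add_file(store, b"abcdefgh")
root_v2 = add_file(store, b"abcdWXYZ")  # second version shares the first chunk
```

The two versions have different root addresses, but the unchanged first chunk is stored only once, which is the Git-like behavior the text describes.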

Built on a solid technology stack and standing on the shoulders of many earlier technologies, IPFS has run smoothly for more than four years since its release in January 2015. Hundreds of applications now run on the IPFS protocol, and there are millions of nodes around the world. Unlike many decentralized projects launched in the blockchain era over the past few years, IPFS has proven its usefulness. According to statistics, the IPFS public network grew 30-fold in 2019, and Google Chrome, Firefox, D.Tube, IPWB, Ethereum, Origin, and others joined the IPFS network. Today, IPFS is growing at a rate of 2.5 EB per day, storing useful data for the world.

Figure 4: Map of projects using IPFS, The Block Research

Three. Filecoin: an incentive mechanism and public chain system built on IPFS

As explained above, the IPFS protocol defines how files are stored, retrieved, and transmitted in a distributed system, and it can store and share files persistently and in a decentralized way. It is a content-addressed, peer-to-peer distributed protocol. Although the protocol itself has made steady progress, the lack of an incentive mechanism has held back the further development of IPFS.

Filecoin (token: FIL), launched by Protocol Labs in 2017, is an incentive layer and public blockchain built on the IPFS network. Filecoin relies mainly on the distributed storage technology provided by IPFS, an implementation of Ethereum smart contracts, on-chain and off-chain trading markets, and the efficient node discovery and data transmission provided by libp2p.

Filecoin uses storage, retrieval, and block rewards to encourage nodes to operate over the long term and so better serve the whole network. The total supply of FIL is 2 billion tokens: miner rewards account for 70%, Protocol Labs receives 15%, public and private investors hold 10%, and the foundation keeps 5% for ecosystem development. FIL is released on three schedules: 50% of the miner rewards will be issued within six years; the foundation and Protocol Labs allocations are released linearly over six years; and investor tokens are distributed within one year.
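The allocation above is easy to check with a few lines of arithmetic. The sketch below uses only the percentages and the six-year linear release stated in the text; the monthly granularity of the vesting helper is an assumption, since the text does not say how often tokens unlock.

```python
TOTAL_FIL = 2_000_000_000  # total supply stated in the text

# Allocation percentages stated in the text.
SHARES_PCT = {
    "miners": 70,
    "protocol_labs": 15,
    "investors": 10,
    "foundation": 5,
}

allocation = {name: TOTAL_FIL * pct // 100 for name, pct in SHARES_PCT.items()}


def linear_vested(total: int, months_elapsed: int, vesting_months: int) -> float:
    """Amount released under straight-line vesting, capped at the total."""
    return total * min(months_elapsed, vesting_months) / vesting_months


# Protocol Labs' share after two of the six years (assumed monthly unlocks).
labs_after_2y = linear_vested(allocation["protocol_labs"], 24, 72)
```

Under these figures miners are allotted 1.4 billion FIL, and one third of Protocol Labs' 300 million FIL would be released after two years.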

The token distribution shows that most of the FIL supply depends on miner rewards. Investors who took part in the FIL public offering can consider trading their tokens once FIL is listed on exchanges; other industry participants mainly take part in the project by mining on the Filecoin network.

1. Filecoin consensus and mining mechanism

Filecoin adopts a hybrid consensus mechanism: it is based mainly on Expected Consensus (EC), supplemented by Proof of Replication (PoRep) and Proof of Spacetime (PoSt). Expected Consensus is the main block consensus of the Filecoin blockchain and is a probabilistic Byzantine fault-tolerant consensus mechanism. It differs from ordinary Byzantine fault-tolerant consensus in that each round of leader election lets, on average, one participant submit a block.

Expected Consensus is fair, unpredictable, and verifiable. Filecoin assigns a weight to each block produced, and the chain with the greatest weight becomes the main chain. Block weight depends on two factors: the number of parent blocks and the storage capacity on the chain. Because Filecoin has a collateral mechanism, miners tend to mine on the chain with the most parent blocks and the highest storage capacity, which makes the chain converge effectively.

In EC, a miner's probability of winning the election is proportional to the miner's current storage capacity, which is measured by Proof of Replication and Proof of Spacetime. Proof of Spacetime uses a chain of proofs and timestamps to show that the miner stored the data throughout a given period. Even a verifier who was offline during that period can later check that the miner generated the proof chain within it, which effectively prevents attacks in which data is generated only temporarily.
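The election rule, in which each miner wins with probability proportional to its share of storage and one leader is expected per round on average, can be simulated in a few lines. This is a simplified sketch, not Filecoin's algorithm: it replaces the real verifiable randomness with a SHA-256 hash of a round counter, and the miner names and power shares are hypothetical.

```python
import hashlib


def wins_election(miner_id: str, round_seed: bytes, power_share: float) -> bool:
    """Return True if the miner's pseudo-random draw falls below its power share."""
    digest = hashlib.sha256(round_seed + miner_id.encode()).digest()
    draw = int.from_bytes(digest, "big") / 2**256  # uniform-looking value in [0, 1)
    return draw < power_share


# Hypothetical miners and their fractions of total proven storage.
power = {"miner-a": 0.5, "miner-b": 0.3, "miner-c": 0.2}

rounds = 2000
wins = {m: 0 for m in power}
for r in range(rounds):
    seed = r.to_bytes(8, "big")  # stand-in for the chain's per-round randomness
    for miner, share in power.items():
        if wins_election(miner, seed, share):
            wins[miner] += 1

leaders_per_round = sum(wins.values()) / rounds
```

Each miner draws independently, so a given round may have no leader or several; only the average comes out close to one, and miners with more storage win proportionally more often.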

Proof of Replication proves that data has been successfully replicated into a specific sector, and it protects against Sybil attacks, outsourcing attacks, and generation attacks. It prevents miners from using various tricks to store less data than they have committed to in order to collect extra rewards.

Figure 5: Proof-of-replication defense against attacks, Protocol Labs

Note also that mining on Filecoin requires not only mining hardware but also pledged tokens. To prevent the loss of customer data, Filecoin has a pledge mechanism, which can also be understood as a penalty mechanism for miners. When a storage deal is made, the miner stores the customer's data in the network, in a unit called a sector. Each sector contains the stored file content and the committed storage period, so that customers can freely use their stored data for the agreed period.

To add a sector to the Filecoin network, a miner must pledge locked funds, which consist of two parts: FIL tokens the miner already holds and a portion of block rewards. If the miner goes offline during the committed storage period, part of the locked funds is forfeited. If the miner stops storing the data altogether, all of the locked tokens may be lost.
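The pledge and penalty logic can be pictured as simple bookkeeping on a locked balance. The sketch below is illustrative only: the `SectorPledge` class, the 10% fault penalty, and the FIL amounts are invented for the example, and termination is modeled as the worst case in which the whole pledge is lost.

```python
class SectorPledge:
    """Collateral locked by a miner for one sector (all rates are made up)."""

    OFFLINE_PENALTY_RATE = 0.1  # assumed fraction lost per detected fault

    def __init__(self, own_fil: float, locked_rewards: float):
        # Locked funds come from the miner's own FIL plus withheld block rewards.
        self.locked = own_fil + locked_rewards

    def report_offline(self) -> float:
        """Slash part of the locked funds for a temporary fault."""
        penalty = self.locked * self.OFFLINE_PENALTY_RATE
        self.locked -= penalty
        return penalty

    def terminate(self) -> float:
        """Slash everything if the miner stops storing the sector."""
        penalty, self.locked = self.locked, 0.0
        return penalty


pledge = SectorPledge(own_fil=80.0, locked_rewards=20.0)
first_penalty = pledge.report_offline()
remaining_after_fault = pledge.locked
final_penalty = pledge.terminate()
```

The point of the design is that a miner always has more to lose by abandoning a sector than by keeping it online for the committed period.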

2. The storage market and retrieval market in the Filecoin ecosystem

Filecoin has two built-in markets, storage and retrieval, in which miners take part in mining and bookkeeping. A system of quotes and order taking ensures that miners receive the tokens users pay when they provide services. Storage miners offer disk capacity on the market to earn users' storage fees. They also earn the right to create new blocks (mining) through Proof of Spacetime and so receive mining rewards. Retrieval miners help users retrieve content already stored at an address on the IPFS network and earn retrieval rewards for doing so.

Retrieval market: The Filecoin retrieval market is an off-chain market that supplies the data customers ask to retrieve; the service is provided by retrieval miners. Retrieval miners do not take part in block production and are paid service fees directly by the client. A retrieval miner's data can come from other clients, storage miners, or the IPFS network.

There are four differences between the storage market and the retrieval market. First, retrieval miners have no required storage duration and do not need to generate storage proofs. Second, any user on the network can provide data retrieval services to customers without pledging FIL. Third, retrieval miners need high network bandwidth and fast payment and data transfer, while storage miners need large storage capacity. Fourth, storage miners can earn block rewards, transaction fees, and network acceleration fees, while retrieval miners can earn only retrieval transaction fees.

Figure 6: Filecoin economic model, ipfs.cn

3. Filecoin economic model deduction

The analysis above shows that the main difference between Filecoin and the Bitcoin network lies in the nature of the service provided. Storage is a long-term service, so the Filecoin network needs miners to work stably over long periods to keep the network stable, whereas miners on the Bitcoin network can come and go freely. Filecoin therefore adopts a network baseline reward model rather than a time-based exponential decay model. This makes it hard for home storage hardware to meet the operational, security, and stability requirements of Filecoin mining. The notion of mining with household devices will be disproved, and the threshold for miners will most likely rise to the level of professional, large-scale operations.

Looking at the markets separately, demand in the Filecoin storage market comes mainly from demand for decentralized storage, while supply comes from storage miners. Traditional centralized storage and cloud service operators such as Amazon and Google have significant advantages of scale and technical cost over Filecoin. By the theory of supply and demand, miners must quote lower prices than centralized storage to survive in the storage market. It can therefore be inferred that in the early days of the Filecoin network, storage demand will come mainly from small and medium-sized customers who are highly sensitive to privacy and to cost. In the long run, storage quotes and storage demand will be negatively correlated to some degree. In other words, storage miners may lower their quotes to win more storage orders and raise their chances of producing blocks, and their income will ultimately come from the appreciation of FIL.

In the real world, Filecoin will very probably see little genuine storage demand in the early period after launch, so early miners' income will consist mainly of block rewards, and most of the "useful" data miners store at this stage will come from orders they placed themselves.

Early retrieval demand will come mainly from blockchain users looking for valuable information. The content stored on the early IPFS network has not yet reached scale, and its retrieval service has no advantage over centralized networks. As the IPFS storage market grows, users will be able to retrieve more and more files, and the importance and income of retrieval miners will gradually increase. The price of FIL has little effect on retrieval miners, whose income consists only of transaction fees and does not include block rewards. The best strategy for a retrieval miner is to be a storage miner as well and to lower quotes appropriately to win more retrieval orders and make better use of network bandwidth. In this case, the retrieval miner can fetch data directly from the client.

Once the Filecoin network holds a certain amount of real data, retrieval demand will appear, and miners will earn retrieval income in addition to storage fees and mining rewards. These three sources of income reinforce one another. Only miner nodes with a certain advantage in computing power early on will win more storage orders. The more real storage orders a miner has, the more useful data sits on that storage miner's servers, and the more retrieval rewards it can earn later. This development path resembles the sharing economy model of the Internet.

Figure 7: Filecoin economic model, ipfs.cn

4. FIL miners' currency holding strategy and price analysis

According to statistics on the FIL released to Filecoin investors, Protocol Labs, the foundation, and miners, total output is estimated at 34.11 million FIL per month in the first six months and 365.13 million FIL in the first year.

From a macro perspective, all miners need to pledge FIL to earn block rewards, and FIL tokens are released gradually on a linear schedule, so supply is expected to grow slowly, with a growth rate that declines exponentially. In the long run, if demand for services gradually increases, market participants will have to keep buying FIL.

From a micro perspective, stable prices for storage and retrieval services matter to users. Storage and retrieval fees therefore need to be anchored to fiat currency or stablecoins, whether FIL rises or falls. When the price of FIL rises, customers pay fewer FIL in transaction fees and network acceleration fees, and block rewards make up a larger share of miners' revenue. When the price of FIL falls, customers pay more FIL in fees, and block rewards make up a smaller share of miners' revenue. Considering service demand alone, customers buy FIL on the secondary market more slowly when the price rises and faster when it falls, which has a regulating effect on the secondary market.
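The effect of anchoring fees to fiat can be worked through with a small example. The two helper functions and all the numbers (a 10 USD fee, a 5 FIL block reward, FIL at 2 USD and at 5 USD) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def fee_in_fil(fee_usd: float, fil_price_usd: float) -> float:
    """Convert a fee fixed in fiat terms into the FIL amount a customer pays."""
    if fil_price_usd <= 0:
        raise ValueError("price must be positive")
    return fee_usd / fil_price_usd


def reward_share(block_reward_fil: float, fee_usd: float, fil_price_usd: float) -> float:
    """Fraction of a miner's fiat-denominated income that comes from block rewards."""
    reward_usd = block_reward_fil * fil_price_usd
    return reward_usd / (reward_usd + fee_usd)


# Hypothetical numbers: a 10 USD storage fee and a 5 FIL block reward.
fee_low_price = fee_in_fil(10.0, 2.0)    # FIL is cheap
fee_high_price = fee_in_fil(10.0, 5.0)   # FIL is expensive
share_low = reward_share(5.0, 10.0, 2.0)
share_high = reward_share(5.0, 10.0, 5.0)
```

At 2 USD per FIL the customer pays 5 FIL and block rewards are half of the miner's income; at 5 USD per FIL the same fee costs 2 FIL and block rewards rise to about 71% of income, matching the direction described in the text.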

As a distributed storage network, Filecoin has two core uses: storage for commercial customers and mining collateral for miners. The former is bound to fall short of the industry's high expectations in the early period after launch. While the Filecoin storage network is still immature, storing enterprise data at commercial scale will consume a great deal of time and hardware resources, so large-scale commercial use soon after the mainnet launch is not yet realistic.

Therefore, until the Filecoin network is truly mature and stable and large-scale commercial storage demand arrives, the price of FIL will be supported mainly by miners buying tokens as collateral and by secondary market investors. The growth of miners and storage space, together with speculation on the secondary market, will largely determine the price trend of FIL.

Four. IPFS ecological landscape

As a top-tier project that went live in 2015 and completed a huge fundraising in 2017, IPFS and Filecoin will finally open a new chapter this year with the mainnet launch. Over the past few years, the blockchain and digital asset industry has built a thriving ecosystem around IPFS and Filecoin.

Figure 8: IPFS and Filecoin ecological links, BlockVC industry research

Upstream in the ecosystem, Protocol Labs and community developers continue to contribute to the development of the network. Many blockchain projects also use IPFS as their storage infrastructure, laying the foundation for large-scale use of IPFS. Filecoin mining machines differ from the special-purpose machines currently used for Bitcoin mining and do not require advanced chip research and development, so the threshold for designing and manufacturing them is relatively low. China has a well-developed low-end manufacturing sector, and the industry's broad participation dates back to Filecoin's fundraising in 2017.

Over the past three years, many miners who bought mining machines in advance have turned to the increasingly popular cloud computing power model to recover their costs early: they split their computing power into shares ahead of time and sell it as cloud computing power to retail investors for whom the threshold of direct participation is too high. Individual investors often choose the top mining pool clusters, because the Matthew effect in Filecoin mining is pronounced. Testnet data shows that the top 5 mining pools produce coins about 4 times as efficiently as the pools ranked 6 to 10.

Figure 9: Filecoin Testnet Dashboard, testnet.filecoi

Epilogue

The development of IPFS and Filecoin over the past few years recalls the old line of verse: "she appeared only after a thousand calls, still half hiding her face behind the pipa." On the one hand, the storage characteristics of the IPFS network have been widely recognized by the community and by users, and IPFS has become the first choice for distributed storage infrastructure in the blockchain world and continues to gain momentum. On the other hand, against the background of Filecoin's huge token fundraising, the market value of the whole project has been estimated at as much as 30 billion US dollars, while the hardware standards and the mining economic model have been changed several times and remain uncertain and ambiguous. The production and sale of mining machines in the industry is a mixed bag, with some operators using the Filecoin name, dressed in the cloak of "blockchain" and "distributed storage mining" and shouting the slogan of "financial enlightenment", to commit fraud. As the saying goes, some people keep their feet on the ground and some look up at the starry sky. I believe that in the near future IPFS and Filecoin, true to the wonderful name of the InterPlanetary File System, can genuinely change the storage ecosystem and contribute to the underlying protocols of the human Internet.

It should also be noted that the Filecoin mainnet launch has been postponed many times, and Protocol Labs has not yet convinced the industry that the mainnet can go live before the end of August. Many aspects of Filecoin's mining economic model remain unclear and need further clarification. Whether FIL's price performance and liquidity can support a healthy, functioning storage market is still highly uncertain. Against this background, investors interested in IPFS and Filecoin should remain cautiously optimistic, keep a close eye on the project's technical and ecosystem progress, and avoid being drawn into speculative frenzy.

References

2.http://ipfs.cn/

3.References

4.Filecoin Primer

5.http://ipfs.cn/news/info-100886.html
