NDN Symposium: Questions and Answers about the Soul of IPFS


IPFSBase

2020-07-15 12:05


For the latest first-hand questions and answers, everything about IPFS is on IPFSBase.

The Named Data Networking project (hereinafter NDN) is a flagship project in the field of information-centric networking. The project is building a network-layer, information-centric (i.e. content-addressable) protocol stack. It started about ten years ago with funding from the NSF, as a collaboration among 10 universities in the United States.

IPFS and NDN share the same vision of a content-addressable network, but operate in very different ways. NDN is a native network layer model, while IPFS is an application layer model.

Given the shared vision of the two projects, we were very excited about the presentation, and especially about the discussion and feedback. More than 20 people attended the talk.

Below is a summary of the questions asked during the presentation.

Q: What is the block size? Can users use blocks of a different size?

A: The default block size is 256 KB. Users and applications can choose a different one: the -s (--chunker) option of the ipfs add command sets both the chunk size and the chunking algorithm.
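
As a rough illustration of fixed-size chunking, the sketch below splits a byte string into blocks of at most 256 KiB. The name `chunk_fixed` is hypothetical; it mimics the idea behind a size-based chunker and is not the actual IPFS implementation.

```python
from typing import List

DEFAULT_CHUNK_SIZE = 256 * 1024  # 262144 bytes, the default mentioned above


def chunk_fixed(data: bytes, size: int = DEFAULT_CHUNK_SIZE) -> List[bytes]:
    """Split data into consecutive blocks of at most `size` bytes."""
    if size <= 0:
        raise ValueError("chunk size must be positive")
    # The last block may be shorter than `size`.
    return [data[i:i + size] for i in range(0, len(data), size)]
```

On the command line, a different fixed size would be requested with something like `ipfs add --chunker=size-1048576 <file>`.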

Q: How is the Merkle tree created?

A: When a user adds a file to their local IPFS node, a Merkle-DAG structure (known as InterPlanetary Linked Data, or IPLD) is created locally. When a file is published to the IPFS network, it is not replicated to other nodes. This is intentional: it avoids adding content to a client's local storage without that client's consent. Instead, the file is initially served only by the publisher's node, which provides it to the network on request. Any node that retrieves the file from the original publisher can then also act as a provider, which creates a caching network for the content. When a file is published, a "provider record" is placed in the DHT, pointing to the publishing node so that the file can be retrieved from it. Other clients in the network can also "pin" the file if they wish to become permanent providers of it. If they do not pin it, their copy will eventually be garbage collected.
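
The local construction step can be sketched as follows. This is a deliberately simplified model: a hex SHA-256 digest stands in for a CID, and the root block is just a newline-separated list of child hashes. Real IPFS uses multihash-based CIDs and IPLD node formats; the names `block_id` and `build_dag` are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


def block_id(data: bytes) -> str:
    """Stand-in for a CID: the hex SHA-256 digest of the block's bytes."""
    return hashlib.sha256(data).hexdigest()


def build_dag(chunks: List[bytes]) -> Tuple[str, Dict[str, bytes]]:
    """Store each chunk under its hash, then add a root block linking to them."""
    store: Dict[str, bytes] = {}
    child_ids = []
    for chunk in chunks:
        cid = block_id(chunk)
        store[cid] = chunk
        child_ids.append(cid)
    # The root's content is the ordered list of child hashes, so any change
    # to any chunk changes the root identifier as well.
    root = "\n".join(child_ids).encode()
    root_id = block_id(root)
    store[root_id] = root
    return root_id, store
```

Everything here happens on the local node; nothing in the sketch pushes blocks to other peers, which matches the behaviour described above.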

Q: From an architectural standpoint, how do files get added to the system? In particular, how do you let the world know what you added and where? Likewise, how do I find out what others have added to the system?

A: There is no mechanism in the IPFS architecture to track files published in the network. If the world is to be made aware of newly added content/CIDs, this has to happen out of band. The topic is closely related to ongoing discussions in the IPFS community about a "decentralized search engine", but so far nothing substantial has come of them. That said, it has received a lot of attention from Protocol Labs and the community at large. IPNS, together with the PubSub protocol it supports, is another way of disseminating information about newly published content. Applications can use this option to propagate (i.e. push) information about new content within the domain of the application itself. When IPNS runs on top of the DHT, it can also work in a pull-based fashion.

Q: Do I need to have the entire structure of the Merkle tree in order to retrieve a part of it? If I only want to retrieve part of the file, the root CID seems almost meaningless.

A: Users do not need to have the entire structure of the Merkle-DAG to retrieve a part of it. To retrieve only a part of the Merkle-DAG (consisting of one or more blocks), the user needs to hold the CIDs of those specific blocks. Alternatively, you can use the root CID with path notation to access files in the Merkle-DAG, e.g. Qmcri6S86LuivUY4FDcM1phu5REXcFYootxn1GsRoqnFN5/path/to/some/file.png.
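
To make the path notation concrete, here is a toy resolver. It assumes directory nodes are JSON objects mapping entry names to child identifiers and uses hex SHA-256 digests as stand-ins for CIDs; `put`, `put_dir` and `resolve` are hypothetical helpers, not IPFS APIs. The point it illustrates is that resolution only touches the blocks along the requested path.

```python
import hashlib
import json
from typing import Dict

Store = Dict[str, bytes]


def put(store: Store, data: bytes) -> str:
    """Store a block under the hex SHA-256 of its bytes (a stand-in for a CID)."""
    key = hashlib.sha256(data).hexdigest()
    store[key] = data
    return key


def put_dir(store: Store, links: Dict[str, str]) -> str:
    """Store a directory node: a JSON object mapping entry names to child keys."""
    return put(store, json.dumps(links, sort_keys=True).encode())


def resolve(store: Store, root: str, path: str) -> bytes:
    """Walk `path` from `root`, following one named link per path segment."""
    key = root
    for segment in filter(None, path.split("/")):
        links = json.loads(store[key])  # only blocks on the path are read
        key = links[segment]
    return store[key]
```

A caller who already holds the identifier of an inner node can pass that as `root` instead and skip the upper levels entirely.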

Q: Once a block is assigned a CID, is the block immutable?

A: Yes, once a block's CID is calculated, it remains the same forever. This is the basic concept behind the "permanent web", familiar from version control systems such as SVN and git. We believe it is an important property of storage and delivery systems. Of course, the content itself is not static and can change. However, the CID of the new version will not match that of the old one, so the new version must be added separately (unless the content is published under a public key via IPNS).

Q: How can a CID be revoked from the IPFS network?

A: A CID is permanent and cannot be "revoked", as it is a hash of something specific (see the comment above on the "permanent web"). Users who no longer wish to provide access to certain content can simply stop "providing" that content, in other words, stop publishing the corresponding provider records. However, this does not mean that the content disappears from the network, as other clients that have retrieved the content may still have it in their cache and serve it. In addition, the IPFS gateway provided by Protocol Labs has a CID deny list. The CIDs in this list are double-hashed to protect their content, and the gateway checks whether content has been denied/blocked before serving it.
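
A deny list of this kind might look roughly like the sketch below. The exact hashing scheme used by the Protocol Labs gateway is not given here, so the code assumes, purely for illustration, that each entry is the SHA-256 of the CID string (the CID already being a hash of the content, hence "double-hashed"). `deny_key` and `Gateway` are hypothetical names.

```python
import hashlib
from typing import Set


def deny_key(cid: str) -> str:
    """Hash the CID string again so the list never contains the CID itself."""
    # Assumption: SHA-256 over the CID's text form; the real scheme may differ.
    return hashlib.sha256(cid.encode("utf-8")).hexdigest()


class Gateway:
    """Toy gateway that consults its own deny list before serving a CID."""

    def __init__(self, denied: Set[str]) -> None:
        self._denied = denied  # set of deny_key() values, not raw CIDs

    def may_serve(self, cid: str) -> bool:
        return deny_key(cid) not in self._denied
```

Because only hashes of CIDs are stored, someone reading the list cannot directly turn it into a catalogue of the blocked content, yet the gateway can still test any requested CID against it.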

Q: Is it possible to check whether a specific CID has been blocked (i.e. added to the deny list)?

A: To check whether a CID is on the deny list of a given gateway, you can try to resolve that CID on the gateway; the HTTP response code will tell you if it has been denied. Each deny list is maintained individually by the organization operating the gateway. There is no global deny list for the entire IPFS network.

Q: A deny list doesn't seem to belong in a decentralized infrastructure.

A: Any individual or organization can run a public IPFS gateway and operate their own deny list. In this sense, the content of a deny list is not determined by a centralized entity.

Q: Where are IPNS records kept?

A: IPNS uses the same infrastructure as content routing, namely the DHT. The multihashes of clients' public keys are registered on the DHT and point to mutable content. There are other ways of distributing IPNS records as well: the pubsub protocol called gossipsub (which also has a specification) is used to distribute IPNS records quickly. As mentioned earlier, the difference between IPNS over PubSub and IPNS over the DHT is the difference between push (PubSub) and pull (DHT) modes.

Q: How do other nodes know they have the correct key for the name?

A: When a node looks up an IPNS name on the DHT, it retrieves records from all the peers that the DHT designates to store that record. Since records carry sequence numbers, clients can easily determine the most recent value corresponding to an IPNS key. There is also a shortcut for the DHT lookup: instead of waiting for the lookup to complete, the client can wait only until it has received a quorum of Q records (currently Q=16), at which point it considers that it has enough information to determine the newest record.
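
The quorum shortcut can be sketched like this. The `Record` type and `resolve_with_quorum` function are hypothetical, and the sketch leaves out everything a real resolver would also do, such as verifying each record's signature and validity period; it only shows the "newest sequence number among the first Q responses" logic.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Record:
    sequence: int  # higher means newer
    value: str     # what the name currently points to


def resolve_with_quorum(responses: Iterable[Record],
                        quorum: int = 16) -> Optional[Record]:
    """Consume responses until `quorum` have arrived, then return the newest seen."""
    best: Optional[Record] = None
    for count, record in enumerate(responses, start=1):
        if best is None or record.sequence > best.sequence:
            best = record
        if count >= quorum:
            break  # stop early instead of waiting for the full lookup
    return best
```

A smaller quorum returns sooner but raises the chance of missing a newer record held by a peer that has not answered yet, which is the trade-off behind choosing a value such as 16.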

Q: If the node storing the IPNS record goes offline, is the record lost, so that it can no longer be served unless someone has updated it within 24 hours?

A: This is correct, and the same applies to the publisher of an IPFS record (i.e. an immutable CID). One of two things can happen. If the content has been requested, and has already been retrieved and cached by some other node, it can be served by the caching node.

If one (or some) of the clients that have retrieved (and cached) the content decide to keep storing and serving it, they can "pin" it, which means they become permanent providers of the content.

Q: When caching content, how does the system know what is cached and how to use/parse it?

A: Clients that cache content also publish provider records to the DHT to declare that they, too, are providers of all content items in their cache.

Q: Is cached content as permanent as the original copy of the content?

A: Cached content is kept until the next "garbage collection" event. Until then it can be resolved and served; after that it expires unless it has been "pinned", in which case it is kept permanently until the user decides otherwise. Note that at the time of writing, garbage collection is turned off by default.
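
The relationship between caching, keeping content permanently and garbage collection can be modelled with a toy block store. `BlockStore` and its methods are hypothetical and use `pin`, the usual IPFS term for marking content to be kept; the sketch says nothing about when a real node would actually trigger collection.

```python
import hashlib
from typing import Dict, Set


class BlockStore:
    """Toy local store: cached blocks survive only until GC unless pinned."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}
        self.pinned: Set[str] = set()

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()  # stand-in for a CID
        self.blocks[key] = data
        return key

    def pin(self, key: str) -> None:
        if key not in self.blocks:
            raise KeyError(key)
        self.pinned.add(key)

    def collect_garbage(self) -> int:
        """Drop every unpinned block and return how many were removed."""
        doomed = [k for k in self.blocks if k not in self.pinned]
        for k in doomed:
            del self.blocks[k]
        return len(doomed)
```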

Q: You have to know exactly what to look for. The DHT is good, but it's hard to know what's in it. Where does the binding between a CID and real identity information take place?

A: IPFS is a distributed file system rather than an end-user-facing content discovery service (much as HTTP is used to address and host sites, while search is left to services such as Google). IPFS manages serving, storing, and fetching content for a specific CID; the rest (connecting users to the CIDs associated with an application, or finding the application in the first place) has to happen a layer above IPFS itself.

Q: At the beginning of the talk, you mentioned that the purpose is to remove the need to trust the network (i.e. external entities). Can you elaborate? How can you trust that certain content was published with a certain key?

A: Naming content by its own hash and publishing it in a distributed P2P network essentially overcomes some of the problems associated with putting trust in external entities such as content hosting and content resolution entities. The content is self-validating, so it can be verified locally. As long as the content is signed with the publisher's private key, content consumers can verify its authenticity without relying on external entities.
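
The "self-validating" part can be shown in a few lines. In this sketch a hex SHA-256 digest stands in for a CID and the function names are hypothetical; checking a publisher's signature, as IPNS requires, would need a public-key library and is left out.

```python
import hashlib


def content_name(data: bytes) -> str:
    """Name content by its own hash (a stand-in for a CID)."""
    return hashlib.sha256(data).hexdigest()


def verify(name: str, data: bytes) -> bool:
    """Return True only if `data` hashes to the name that was requested."""
    return content_name(data) == name
```

A consumer that asked for a given name can run this check on whatever bytes any peer returns, so it does not matter whether that peer is the original publisher or an untrusted cache.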

Thanks to everyone who attended this talk, and to NDN for organizing this event and inviting us!