
What is data availability (DA)?
Data availability addresses the question: has this data been published? Specifically, when a node receives a new block that is about to be added to the chain, it verifies that the block's data is available. The node tries to download all transaction data for the new block. If it can download everything, it has verified data availability, which proves that the block data has indeed been published to the network.
Modular blockchains such as Celestia (learn more: https://docs.celestia.org/learn/how-celestia-works/data-availability-faq) use additional primitives that let nodes verify data availability more efficiently. Data availability is critical to the security of any blockchain because it ensures that anyone can inspect the transaction ledger and verify it. It becomes harder to guarantee as a blockchain scales: as blocks get larger, it becomes impractical for the average user to download all the data, so users can no longer verify the chain themselves.
What is the data availability problem?
The data availability problem occurs when the transaction data of a new block cannot be downloaded and verified by nodes on the network. One possible scenario is that the block producer deliberately does not publish the transaction data, which is called a data withholding attack. If transaction data is not published, nodes on the network cannot validate and accept new blocks, so the chain cannot be updated to its latest state.
Since nodes cannot verify the data for new blocks, the blockchain could stop running, or worse, attackers could exploit the situation to steal funds. The severity of the consequences depends on the type of blockchain (L1 or L2) and on whether data availability is kept on-chain or off-chain. The data availability problem is particularly relevant to Layer 2 scaling solutions, such as rollups and validiums. These technologies improve throughput by moving transaction execution off the base chain, which also introduces new data availability challenges.
How do nodes verify data availability in Celestia?
In most blockchains, nodes verify the availability of data by downloading all transaction data for a block. If nodes are able to download all data, then they have verified the availability of the data. In Celestia, light nodes can use a new mechanism to verify data availability without downloading all the data for a block. This new method of verifying data availability is called data availability sampling.
What is data availability sampling?
Data availability sampling is a mechanism that allows light nodes to verify data availability without downloading the entire data for a block. Data Availability Sampling (DAS) works by having light nodes conduct multiple rounds of random sampling to obtain small portions of block data. As a light node completes more rounds of data sampling, its confidence in data availability increases. Once a light node successfully reaches a predetermined confidence level (e.g. 99%), it will consider the block data to be available.
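How confidence grows with each round can be worked out with a little arithmetic. The sketch below is a simplified model, not Celestia's actual implementation. It assumes every sample is drawn independently and uniformly at random, and that a block producer would have to withhold at least a fraction `withheld_fraction` of the block's shares to make the block unrecoverable. The function names and the example fractions (0.5 and 0.25) are illustrative assumptions, not protocol parameters taken from this article.

```python
def confidence_after(samples: int, withheld_fraction: float) -> float:
    """Chance that at least one of `samples` random draws would have hit a
    withheld share, if the producer had withheld `withheld_fraction` of them."""
    if not 0.0 < withheld_fraction <= 1.0:
        raise ValueError("withheld_fraction must be in (0, 1]")
    return 1.0 - (1.0 - withheld_fraction) ** samples


def samples_needed(target_confidence: float, withheld_fraction: float) -> int:
    """Smallest number of successful samples that reaches the target confidence."""
    if not 0.0 <= target_confidence < 1.0:
        raise ValueError("target_confidence must be in [0, 1)")
    samples = 0
    while confidence_after(samples, withheld_fraction) < target_confidence:
        samples += 1
    return samples


if __name__ == "__main__":
    for fraction in (0.5, 0.25):
        print(fraction, samples_needed(0.99, fraction))
```

Under these assumptions, reaching 99% confidence takes 7 successful samples when half of the shares would have to be withheld, and 17 when a quarter would. Either way, the light node downloads a handful of small samples rather than the whole block.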
Want a simpler explanation? Check out this discussion thread to learn how data availability sampling is like flipping a coin: https://twitter.com/nickwh8te/status/1559977957195751424
What are some of Celestia's security assumptions regarding data availability sampling?
Celestia makes two security assumptions for data availability sampling (DAS):
A sufficient number of light nodes: Celestia assumes that enough light nodes in the network are sampling newly produced blocks. Each light node downloads only a small random portion of a block rather than the whole block, so it is the nodes' collective samples that matter. If the block data has indeed been published, full nodes can reconstruct the complete block by aggregating the portions sampled by the light nodes. Under this assumption, larger blocks require more light nodes.
A connection to an honest full node: The second assumption is that each light node can connect to at least one honest full node. This ensures that light nodes receive fraud proofs for incorrectly erasure-coded blocks. A fraud proof is a security mechanism that lets a node show that block data was processed incorrectly. If a light node cannot reach any honest full node, for example during an eclipse attack (a network attack in which the attacker isolates a target node so that it can only connect to malicious nodes), it cannot learn that a block was improperly constructed, and the security and reliability of the network can be compromised.
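A rough model shows why larger blocks call for more light nodes under the first assumption. The sketch below is a back-of-the-envelope estimate, not Celestia's real parameters or sampling protocol. It assumes a block is split into `total_shares` equally sized shares, that each light node samples `samples_per_node` distinct shares uniformly at random, and that nodes sample independently. The `target_coverage` threshold is a hypothetical stand-in for "enough shares to reconstruct the block".

```python
def expected_coverage(total_shares: int, samples_per_node: int, light_nodes: int) -> float:
    """Expected fraction of shares sampled by at least one light node."""
    if not 0 < samples_per_node <= total_shares:
        raise ValueError("samples_per_node must be in 1..total_shares")
    # Probability that one particular share is missed by one particular node.
    miss = 1.0 - samples_per_node / total_shares
    return 1.0 - miss ** light_nodes


def light_nodes_needed(total_shares: int, samples_per_node: int, target_coverage: float) -> int:
    """Smallest node count whose expected coverage reaches the target."""
    if not 0.0 <= target_coverage < 1.0:
        raise ValueError("target_coverage must be in [0, 1)")
    nodes = 0
    while expected_coverage(total_shares, samples_per_node, nodes) < target_coverage:
        nodes += 1
    return nodes


if __name__ == "__main__":
    for shares in (4096, 16384):
        print(shares, light_nodes_needed(shares, samples_per_node=16, target_coverage=0.75))
```

In this model, when each node samples only a small part of the block, the number of light nodes needed grows roughly in proportion to the number of shares. Expected coverage is an average, not a guarantee, so a real deployment would want a safety margin on top of it.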
Why is block reconstruction necessary for security?
In a blockchain, "block reconstruction" means that even if we never received the data of the entire block at once, we can restore the complete block contents from the data fragments we already have. It's like holding the pieces of a torn sheet of paper: if the pieces overlap a little, the whole sheet can be put back together.
In a system like Celestia, through erasure coding, even if we do not get the complete block data, as long as there are enough data fragments, the entire block data can be restored. Data erasure coding creates some additional data redundancy, so that even if part of the data is lost, there is enough information left to reconstruct a complete block.
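The idea of rebuilding a block from incomplete fragments can be shown with the simplest possible erasure code. The sketch below is a toy stand-in, not the scheme Celestia uses: it adds a single XOR parity chunk, which can repair exactly one missing chunk. The function names are hypothetical, and all chunks are assumed to have the same length.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized chunks."""
    if len(a) != len(b):
        raise ValueError("chunks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def encode(chunks: List[bytes]) -> List[bytes]:
    """Return the data chunks followed by one parity chunk (XOR of all data)."""
    return list(chunks) + [reduce(xor_bytes, chunks)]


def reconstruct(shares: List[Optional[bytes]]) -> List[bytes]:
    """Recover the original data chunks when at most one share is missing (None)."""
    missing = [i for i, share in enumerate(shares) if share is None]
    if len(missing) > 1:
        raise ValueError("a single parity chunk can repair only one missing share")
    repaired = list(shares)
    if missing:
        present = [share for share in shares if share is not None]
        # The XOR of all shares is zero, so the missing share equals the XOR of the rest.
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired[:-1]  # drop the parity chunk


if __name__ == "__main__":
    data = [b"abcd", b"efgh", b"ijkl"]
    shares = encode(data)
    shares[1] = None  # simulate a withheld or lost chunk
    print(reconstruct(shares) == data)
```

Erasure codes used in production tolerate many missing fragments rather than one, but the principle is the same: the redundancy added at encoding time lets the original data be recomputed from whatever subset arrives.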
Why is this important for security? Because it ensures that even in imperfect situations, such as malicious nodes trying to hide data or network instability preventing complete transmission, we can still verify the integrity and correctness of transactions. If someone tries to tamper with or hide transaction data, then as long as we can reconstruct the block, we can discover and prove the tampering, which preserves the transparency and trustworthiness of the entire blockchain.
What is data storage? What's the problem with data storage?
Data storage involves the ability to store and access past transaction data.
Data storage and retrieval are required for multiple purposes, such as:
Read information about previous transactions
Sync nodes
Index and serve transaction data
Retrieve NFT information
The issue with data storage is whether past transaction data can be stored and successfully retrieved later. Failure to retrieve historical transaction data could mean that users cannot access information about their past transactions, or that nodes cannot sync from the genesis block. Fortunately, the assumptions needed for storing and accessing past data are undemanding. Users only need to be able to reach a single copy of the blockchain history to obtain historical transaction data. In other words, data storage security rests on a 1-of-N honesty assumption: it holds as long as at least one of the N parties storing the data serves it honestly.
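The single-honest-copy idea can be sketched as a retrieval loop. This is an illustration under stated assumptions, not a real client: the providers are hypothetical callables standing in for archival nodes, and a plain SHA-256 digest stands in for whatever commitment the user already trusts for the block.

```python
import hashlib
from typing import Callable, Iterable, Optional

Provider = Callable[[], Optional[bytes]]  # hypothetical: returns block data or None


def fetch_verified(expected_hash: str, providers: Iterable[Provider]) -> Optional[bytes]:
    """Return the first response whose SHA-256 digest matches `expected_hash`."""
    for provider in providers:
        data = provider()
        if data is not None and hashlib.sha256(data).hexdigest() == expected_hash:
            return data  # one honest copy is enough
    return None  # no provider served the correct data


if __name__ == "__main__":
    block = b"historical block data"
    expected = hashlib.sha256(block).hexdigest()
    providers = [lambda: None, lambda: b"tampered data", lambda: block]
    print(fetch_verified(expected, providers) == block)
```

Because every response is checked against a known commitment, unresponsive or dishonest providers can delay retrieval but cannot make the user accept wrong data.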
What is the difference between data availability and data storage? How does blockchain state fit into this problem?
Data availability is about verifying whether transaction data for new blocks is publicly available. In contrast, data storage involves storing and accessing past transaction data from older blocks.
So far we have discussed transaction data, but blockchain state is a related topic. State is different from transaction data. Specifically, the state is a snapshot of the network's current status, including account balances, smart contract balances, and validator set information. Problems caused by state size are qualitatively different from data availability and retrievability issues.
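The distinction can be made concrete with a toy ledger. The sketch below assumes a hypothetical chain whose only transaction type is a simple transfer: the list of transfers is the transaction data, and the balance table produced by replaying them is the state.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Transfer = Tuple[str, str, int]  # (sender, recipient, amount)


def apply_transfers(genesis: Dict[str, int], transfers: List[Transfer]) -> Dict[str, int]:
    """Replay transaction data on top of the genesis balances to derive the state."""
    balances = defaultdict(int, genesis)
    for sender, recipient, amount in transfers:
        if balances[sender] < amount:
            raise ValueError(f"{sender} cannot send {amount}")
        balances[sender] -= amount
        balances[recipient] += amount
    return dict(balances)


if __name__ == "__main__":
    history = [("alice", "bob", 4), ("bob", "carol", 1)]  # transaction data
    print(apply_transfers({"alice": 10}, history))        # current state
```

The history grows with every block, while the state only reflects the latest balances, which is one reason the two raise different scaling problems.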
Why doesn't Celestia incentivize storing historical data? If there is no reward, who will store the historical data?
Most blockchains do not incentivize data storage, because it should not be the blockchain's responsibility to ensure that historical data is permanently retrievable. Furthermore, data storage requires only one party to store the data and provide it to users, which is a weak assumption. Celestia's goal is therefore to provide a secure and scalable way to verify the availability of data. Once the data is verified as available, the task of storing and retrieving historical data is left to the entities that need it. Fortunately, even though Celestia itself does not directly pay incentives (for example, tokens or other rewards) for storing and serving data, other factors encourage organizations and individuals to store historical data for their own benefit and make it available to users who need it.
Many types of actors may store historical data. Some of these include:
Block explorers that provide access to past transaction data.
Indexers that provide API queries for past data.
Applications or rollups that require historical data for some processing.
Users who want guaranteed access to their transaction history.
What can a blockchain do to provide stronger data retrieval guarantees?
Reward nodes based on the amount of transaction data they store and the data requests they serve (this is the case with some data storage blockchains, such as Filecoin).
Publish transaction data to a data storage blockchain that incentivizes storage and serves historical data requests.
Reference link: https://docs.celestia.org/learn/how-celestia-works/data-availability-faq