Storage proofs: Achieving state awareness across time and chains

LongHash Ventures
Oct 11, 2023

We would like to thank Kacper Koziol and Tiago Neto from Herodotus, Mo Dong from Brevis, Ismael Hishon-Rezaizadeh from Lagrange Labs, Yi Sun from Axiom, and Roy Lu from LongHash Ventures for reviewing this article and providing their valuable feedback.

Introduction

What if you lost your memory every hour and had to constantly ask someone to tell you what you had done? That is the current state of smart contracts. On blockchains like Ethereum, smart contracts cannot directly access anything older than the 256 most recent blocks, and even within that window only the block hashes are exposed. This problem is further exacerbated in the multi-chain ecosystem, where retrieving and verifying data across different execution layers is even more difficult.

In 2020, Vitalik Buterin and Tomasz Stanczak proposed a way to access data across time. While the EIP has become stagnant, the need for it has resurfaced in the rollup-centric multi-chain world. Today, storage proofs have emerged as a frontier for giving smart contracts awareness and memory.

Accessing on-chain data

There are several ways in which dapps can access data and state. Each approach requires the application to place trust in humans/entities, in cryptoeconomic security, or in code, and each has its own tradeoffs:

Trust in humans/entities:

  • Archive nodes: Operators could run an archive node themselves or rely on archive node service providers like Alchemy or Infura to access all the data since the genesis block. Archive nodes provide all the same data as a full node, plus all the historical state data of the entire blockchain. Off-chain services like Etherscan and Dune Analytics use archive nodes to access on-chain data. Off-chain actors can attest to the validity of this data, and on-chain smart contracts can verify that the data was signed by a trusted actor/committee. However, the integrity of the underlying data is not verified: this approach requires the dapp to trust that the archive node service provider is running the infrastructure correctly and without any malicious intent.

Trust in cryptoeconomic security:

  • Indexers: An indexing protocol organizes all the data on the blockchain, allowing developers to build and publish open APIs that applications can query. Individual indexers are node operators that stake tokens to provide indexing and query processing services. However, disputes can occur when the data served is incorrect, and the arbitration process can take time. Moreover, data from indexers like The Graph cannot be used directly by the business logic of smart contracts; it is used mainly in web2-style data analytics.
  • Oracles: Oracle service providers use data aggregated from many independent node operators. The challenge here is that the data available from oracles might not be updated frequently and is limited in scope. Oracles like Chainlink usually maintain only specific states, such as price feeds, and are not feasible for application-specific states and history. Moreover, this approach introduces a certain level of deviation in the data and requires trust in the node operators.

Trust in code:

  • Special variables and functions: Blockchains like Ethereum have special variables and functions that mainly provide information about the blockchain or serve as general-purpose utilities. A smart contract can only access the block hashes of the 256 most recent blocks; for scalability reasons, block hashes are not available for all blocks. Access to historical block hashes would be useful because proofs could then be verified against them. There is no opcode in EVM execution that allows access to old block contents, previous transaction contents, or receipt outputs, so a node can safely forget those things and still be able to process new blocks. This method is also limited to a single blockchain.
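The 256-block window can be illustrated with a minimal Python sketch. It only mimics the visibility rule described above and is not EVM code: the function name `blockhash`, the `hashes` dictionary, and the toy hash values are assumptions made for illustration.

```python
from typing import Dict, Optional

BLOCKHASH_WINDOW = 256  # only the 256 most recent block hashes are visible


def blockhash(current_block: int, requested_block: int,
              hashes: Dict[int, bytes]) -> Optional[bytes]:
    """Return the hash of requested_block, or None if it is outside the window.

    The current block itself and anything older than 256 blocks is not visible.
    """
    age = current_block - requested_block
    if age < 1 or age > BLOCKHASH_WINDOW:
        return None
    return hashes.get(requested_block)


# Toy data: the "hash" of block n is simply n encoded as 32 bytes (hypothetical).
hashes = {n: n.to_bytes(32, "big") for n in range(1000)}

recent = blockhash(1000, 744, hashes)    # exactly 256 blocks old: still visible
too_old = blockhash(1000, 743, hashes)   # 257 blocks old: no longer visible
```

Everything older than the window has to be proven to the contract some other way, which is the gap storage proofs aim to fill.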

Given the challenges and limitations of these solutions, there is a clear need to store and provide block hashes on-chain. This is where storage proofs come in. In order to better understand storage proofs, let’s take a quick look at the data storage in blockchains.

Data storage in a blockchain

A blockchain is a public database that is updated and shared across many computers in a network. Data and state are stored in consecutive groups called blocks and each block cryptographically references its parent by storing the hash of the previous block header.
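As a rough illustration of this parent-hash linkage, here is a minimal Python sketch. The `Header` class, its fields, and the use of SHA-256 over a simple byte concatenation are simplifying assumptions; Ethereum itself hashes RLP-encoded headers with Keccak-256.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Header:
    number: int
    parent_hash: bytes
    payload: bytes  # stand-in for the rest of the header contents

    def hash(self) -> bytes:
        data = self.number.to_bytes(8, "big") + self.parent_hash + self.payload
        return hashlib.sha256(data).digest()


def build_chain(payloads: List[bytes]) -> List[Header]:
    """Create headers where each one stores the hash of its parent."""
    chain, parent = [], b"\x00" * 32
    for number, payload in enumerate(payloads):
        header = Header(number, parent, payload)
        chain.append(header)
        parent = header.hash()
    return chain


def is_linked(chain: List[Header]) -> bool:
    """Check that every header references the hash of the header before it."""
    return all(child.parent_hash == parent.hash()
               for parent, child in zip(chain, chain[1:]))


chain = build_chain([b"block-0", b"block-1", b"block-2"])
# Changing the contents of block 1 breaks the link stored in block 2.
tampered = [chain[0], Header(1, chain[1].parent_hash, b"forged"), chain[2]]
```

Because each hash covers the parent hash, altering any historical block changes every hash after it, which is why a single recent block hash can anchor the whole history.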

Let’s take an Ethereum block as an example. Ethereum uses a particular type of Merkle tree known as the “Merkle Patricia trie” (MPT). Ethereum data is organized into four kinds of Merkle-Patricia tries: the state trie, the storage tries, the receipts trie, and the transactions trie. Each block header contains the roots of the state, transactions, and receipts tries, while the root of each contract’s storage trie is stored in that contract’s account record inside the state trie. Together, these tries encode the mappings that comprise all Ethereum data. Merkle trees are used because they commit to data compactly: thanks to recursive hashing, only the root hash ultimately needs to be stored, which saves a lot of space. They allow anyone to prove the existence of an element in the tree by showing that recursively hashing the nodes leads to the same root hash. Merkle proofs allow light clients on Ethereum to get answers to questions like:

  • Does this transaction exist in a particular block?
  • What is the current balance of my account?
  • Does this account exist?

Instead of downloading every transaction and every block, a “light client” can download only the chain of block headers and verify the information using Merkle proofs. This makes the overall process highly efficient. Refer to this blog by Vitalik and the Maven11 research article to better understand the implementation, advantages, and challenges associated with Merkle trees.
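A minimal sketch of a Merkle inclusion proof is shown below. It uses a plain binary Merkle tree with SHA-256 rather than Ethereum’s Merkle Patricia trie with Keccak-256, and the function names and the rule of duplicating the last node on odd-sized levels are illustrative assumptions.

```python
import hashlib
from typing import List, Tuple


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root_and_proof(leaves: List[bytes], index: int) -> Tuple[bytes, List[Tuple[bytes, bool]]]:
    """Return (root, proof); proof is a list of (sibling_hash, sibling_is_left)."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        sibling = index ^ 1          # the neighbour that shares our parent
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify(leaf: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the path from the leaf to the root using only sibling hashes."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


transactions = [b"tx0", b"tx1", b"tx2", b"tx3", b"tx4"]
root, proof = merkle_root_and_proof(transactions, 3)
```

The verifier needs only the root, the leaf, and one sibling hash per level, so the proof grows logarithmically with the number of leaves.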

Storage Proofs

Storage proofs allow us to prove, using cryptographic commitments, that something is committed in the database and is valid. Such a proof is a verifiable claim that something happened on the blockchain.

What can storage proofs enable?

Storage proofs permit two main functionalities:

  1. Access historical on-chain data beyond the last 256 blocks, all the way back to the genesis block
  2. Access on-chain data (historical as well as current) of one blockchain on another blockchain, with the assistance of consensus verification or, in the case of L2s, the L1-L2 bridge

How do storage proofs function?

At a very high level, storage proofs check whether a specific block is part of the blockchain’s canonical history and then verify that the requested data is part of that block. This could be achieved via:

  • On-chain processing: Dapps could take an initial trusted block, pass each block as calldata to access the previous one, and traverse all the way back to the genesis block. This requires a huge amount of on-chain computation and calldata, which makes the approach practically infeasible. Aragon tried the on-chain approach in 2018, but the on-chain cost was too high.
  • Using ZK proofs: The approach is similar to on-chain processing, except that a ZK prover is used to move the complex computation off-chain.
  1. Accessing data on the same chain: A ZK proof can be used to assert that an arbitrary historical block header is an ancestor of one of the 256 most recent block headers that are accessible within the execution environment. The other approach is to index the entire history of the source chain and generate a ZK proof that the indexing happened correctly. This proof is updated regularly as new blocks are added to the source chain.
     Accessing data across chains: The provider collects the block headers of the source chain on the destination chain and attests to the validity of these block headers using a ZK consensus proof. It is also possible to use an existing arbitrary message passing (AMP) solution like Axelar, Celer, or LayerZero to query the block headers.
  2. A cache of hashes of block headers of the source chain, or the root hash of an off-chain block hash accumulator, is maintained on the destination chain. This cache is updated on a regular basis and is used to efficiently prove on-chain that a given block exists and has cryptographic linkage to a recent block hash accessible from the state. This process is known as proving the continuity of the chain. It is also possible to use a dedicated blockchain to store the block headers of all the source chains.
  3. The historical data/block is accessed from off-chain indexed data or from the on-chain cache (depending on the complexity of the request), as requested by the dapp on the destination chain. While the cache of block header hashes is maintained on-chain, the actual data might be stored off-chain.
  4. The existence of the data in the specified block is checked via Merkle inclusion proofs, and a ZK proof of this check is generated. This proof is combined with the ZK proof of correct indexing or the ZK consensus proof, and the combined proof is made available on-chain for trustless verification.
  5. The dapps can then verify this proof on-chain and use the data to execute the desired action. Along with the verification of the ZK proof, public parameters like the block number and block hash are checked against the cache of block headers maintained on-chain.
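The continuity check and the final comparison against the on-chain cache can be sketched in Python as follows. This is a plain, non-ZK stand-in: in the systems described above the hash-chain check would run inside a ZK circuit and only a succinct proof would be verified on-chain. The header encoding, the SHA-256 hash, and the names `proves_ancestor` and `accept` are assumptions for illustration.

```python
import hashlib
from typing import Dict, List, Tuple

HeaderT = Tuple[int, bytes, bytes]  # (number, parent_hash, state_root)


def header_hash(number: int, parent_hash: bytes, state_root: bytes) -> bytes:
    # Toy encoding; Ethereum hashes RLP-encoded headers with Keccak-256.
    return hashlib.sha256(number.to_bytes(8, "big") + parent_hash + state_root).digest()


def proves_ancestor(trusted_hash: bytes, headers: List[HeaderT]) -> bool:
    """headers run oldest-first and must end at the block with trusted_hash."""
    if not headers:
        return False
    for older, newer in zip(headers, headers[1:]):
        number, parent_hash, _ = newer
        if number != older[0] + 1 or parent_hash != header_hash(*older):
            return False
    return header_hash(*headers[-1]) == trusted_hash


def accept(cache: Dict[int, bytes], block_number: int, claimed_hash: bytes,
           headers: List[HeaderT]) -> bool:
    """Dapp-side check: public inputs must match the cache, then the chain must link."""
    return cache.get(block_number) == claimed_hash and proves_ancestor(claimed_hash, headers)


# Toy chain of four headers, numbered 100 to 103.
headers, parent = [], b"\x00" * 32
for n in range(100, 104):
    header = (n, parent, hashlib.sha256(str(n).encode()).digest())
    headers.append(header)
    parent = header_hash(*header)

cache = {103: parent}  # the destination chain only stores the newest block hash
forged = [(100, b"\x00" * 32, b"forged-root")] + headers[1:]
```

Here `accept(cache, 103, parent, headers)` succeeds, while the `forged` history fails because block 101 no longer points at the hash of the altered block 100.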

Some of the projects adopting this approach are Herodotus, Lagrange, Axiom, Hyper Oracle, Brevis Network, and Nil Foundation. While significant effort is being made to make applications state-aware across multiple blockchains, IBC (Inter-Blockchain Communication) stands out as an interoperability standard that enables applications like ICQ (Interchain Queries) and ICA (Interchain Accounts). ICQ enables applications on Chain A to query the state of Chain B by including the query in a simple IBC packet, and ICA allows one blockchain to securely control an account on another blockchain. Combining them can enable interesting cross-chain use cases. RaaS providers like Saga offer these functionalities to all their app chains by default by using IBC.

There are many ways in which storage proofs can be optimized to find the right balance of memory consumption, proving time, verification time, compute efficiency, and developer experience. The overall process can be broadly divided into 3 main sub-processes.

  • Data access
  • Data processing
  • ZK Proof generation for data access and processing

Data access: In this subprocess, the service provider accesses the block headers of the source chain either natively on the execution layer or by maintaining an on-chain cache. For data access across chains, verification of the source chain consensus on the destination chain is required. Some of the approaches and optimizations being adopted include:

  • The existing Ethereum blockchain: The Ethereum blockchain’s existing structure can be used to prove the value of any historical storage slot with respect to the current block header using a ZKP. This can be thought of as one large inclusion proof. It is a proof that, given a recent block header X at height b, there exists a block header Y at height b-k which is an ancestor of X. It is based on the security of Ethereum’s consensus and requires a fast proving system for efficiency. This is the approach used by Lagrange.
  • On-chain Merkle Mountain Ranges (MMR) cache: A Merkle Mountain Range can be viewed as a list of Merkle trees where the individual Merkle trees are combined when two trees reach the same size. The individual Merkle trees in the MMR are combined by adding parent nodes to the trees’ previous roots. MMR is a data structure similar to Merkle trees with some additional benefits, such as efficient appending of elements and efficient data queries, particularly when reading sequential data from large datasets. Appending new headers via Merkle tree would require passing all the sister nodes at each level. In order to append data efficiently, Axiom uses MMR to maintain a cache of the hash of block headers on-chain. Herodotus stores the root hash of the MMR block hash accumulator on-chain. This allows them to check the fetched data against these block header hashes via inclusion proofs. This approach requires the cache to be updated on a regular basis and brings in liveness concerns if not decentralized.
  • Herodotus maintains two different MMRs. Depending on the specific blockchain or layer, accumulators can be tailored to use different hash functions, optimizing efficiency and computational costs. For proving on Starknet, the Poseidon hash might be used, while the Keccak hash might be used for EVM chains.
  • Off-chain MMR cache: Herodotus maintains an off-chain cache of previously fetched queries and results to allow for faster fetching in case the data is requested again. This requires infrastructure beyond simply running an archive node. The optimizations done on off-chain infrastructure can potentially decrease costs for the end user.
  • Dedicated blockchain for storage: Brevis relies on a dedicated ZK rollup (aggregation layer) to store all the block headers of all the chains they attest to. Without this aggregation layer, each chain would need to store the block headers of every other chain, resulting in O(N²) “connections” for N blockchains. By introducing an aggregation layer, each blockchain only needs to store the state root of the rollup, reducing the overall connections to O(N). This layer is also used to aggregate multiple proofs for block headers/query results, so that a single proof can be submitted for verification on each connected blockchain.
  • L1-L2 message passing: Verification of source chain consensus can be avoided in the case of L2s, because L2s natively support message passing between L1 and L2 contracts. The cache could be updated on Ethereum, and L1-L2 message passing can be used to send the block hash or the root of the tree compiled off-chain to other L2s. Herodotus is adopting this approach, but it is not feasible for alt L1s.
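To make the Merkle Mountain Range idea from the list above more concrete, here is a minimal append-only sketch in Python. It is not the implementation used by Axiom or Herodotus: the class name `MMR`, the SHA-256 hash, and the way peaks are "bagged" into a single root are assumptions chosen for brevity.

```python
import hashlib
from typing import List, Tuple


def h(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()


class MMR:
    """Append-only accumulator that keeps only the peaks of its Merkle trees."""

    def __init__(self) -> None:
        self.peaks: List[Tuple[int, bytes]] = []  # (height, root), tallest first
        self.size = 0

    def append(self, leaf: bytes) -> None:
        node = (0, hashlib.sha256(leaf).digest())
        # Two trees of equal height are merged under a new parent node.
        while self.peaks and self.peaks[-1][0] == node[0]:
            height, left = self.peaks.pop()
            node = (height + 1, h(left, node[1]))
        self.peaks.append(node)
        self.size += 1

    def root(self) -> bytes:
        """Combine the peaks right to left into one commitment (one possible convention)."""
        if not self.peaks:
            raise ValueError("empty MMR has no root")
        acc = self.peaks[-1][1]
        for _, peak in reversed(self.peaks[:-1]):
            acc = h(peak, acc)
        return acc


mmr = MMR()
for number in range(5):
    mmr.append(b"block-hash-%d" % number)  # hypothetical block header hashes
heights_after_5 = [height for height, _ in mmr.peaks]
```

Appending touches only the existing peaks, so the cost per new block header is logarithmic in the number of stored headers, and only the single root needs to live on-chain.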

Data processing:

Along with access to data, smart contracts should also be able to do arbitrary computations on top of the data. While some use cases may not require computation, it is an important value-added service for many others. Many of the service providers enable computations on the data, since a ZK proof of the computation can be generated and provided on-chain to establish validity. Because existing AMP solutions like Axelar, LayerZero, and Polyhedra Network could potentially be used for data access, data processing could become a differentiator for storage proof service providers.

Hyper Oracle, for instance, allows developers to define custom off-chain computations with JavaScript. Brevis has designed an open marketplace of ZK Query Engines that accepts data queries from dapps and processes them using the attested block headers. The smart contract sends a data query, which is picked up by a prover from the marketplace. The prover generates a proof based on the query input, the relevant block headers (from the Brevis aggregation layer), and the results. Lagrange has introduced the ZK Big Data Stack to prove distributed programming models like SQL, MapReduce, and Spark/RDD. The proofs are modular and can be generated from any block header originating from existing cross-chain bridges and AMP protocols. ZK MapReduce (ZKMR), the first product in the Lagrange ZK Big Data Stack, is a distributed computation engine (based on the well-known MapReduce programming model) for proving the results of computations involving sizable sets of multi-chain data. For example, a single ZKMR proof can be used to prove changes in the liquidity of a DEX deployed on 4–5 chains over a specified time window. For relatively simple queries, the computation can also be done directly on-chain, as Herodotus currently does.
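The MapReduce programming model mentioned above can be sketched in a few lines of Python. This shows only the plain computation pattern, with no proving involved, and it is not Lagrange’s API: the event tuples, chain names, and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical liquidity events: (chain, block_number, liquidity_delta)
events: List[Tuple[str, int, int]] = [
    ("ethereum", 100, 500),
    ("arbitrum", 220, -200),
    ("ethereum", 101, -100),
    ("optimism", 310, 50),
    ("arbitrum", 221, 300),
]


def map_phase(event: Tuple[str, int, int]) -> Tuple[str, int]:
    """Emit a (key, value) pair for each event; here the key is the chain."""
    chain, _block, delta = event
    return chain, delta


def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum the values that share a key."""
    totals: Dict[str, int] = defaultdict(int)
    for chain, delta in pairs:
        totals[chain] += delta
    return dict(totals)


per_chain = reduce_phase(map(map_phase, events))
net_change = sum(per_chain.values())
```

Because the map step treats every event independently, the work can be split across many machines (or provers) and the partial results merged in the reduce step.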

Proof generation:

  • Updatable proofs: Updatable proofs can be used when a proof needs to be computed and efficiently maintained over a moving stream of blocks. For example, when a dapp wishes to maintain a proof of a moving average for a contract variable (such as a token price) as new blocks are created, the existing proof can be updated efficiently instead of being recomputed from scratch. To prove dynamic data-parallel computation on on-chain state, Lagrange builds a batch vector commitment, called a Recproof, on top of a portion of the MPT, updates it on the fly, and dynamically computes over it. By recursively creating a Verkle tree on top of the MPT, Lagrange is able to compute over large amounts of dynamic on-chain state data efficiently.
  • Verkle trees: Unlike Merkle trees, where we need to provide all the nodes that share a parent, Verkle trees require only the path to the root. This path is much smaller than the set of all sister nodes in the case of a Merkle tree. Ethereum is also exploring the use of Verkle trees in future releases to minimize the amount of state that Ethereum full nodes are required to hold. Brevis leverages a Verkle tree to store attested block headers and query results in the aggregation layer. It significantly reduces the data inclusion proof size, especially when the tree contains a large number of elements, and also supports efficient inclusion proofs for a batch of data.
  • Mempool monitoring for faster proof generation: Herodotus recently announced turbo, which allows developers to add a few lines of code to their smart contract code to specify the data query. Herodotus monitors the mempool for smart contract transactions that interact with the turbo contract. The proof generation process begins when the transaction is in the mempool itself. Once the proof is generated and verified on-chain, the results are written into the on-chain turbo swap contract. Results can only be written to the turbo swap contract once they are authenticated by storage proofs. Once this happens, a portion of the transaction fees is shared with the sequencer or block builder, incentivizing them to wait a little longer to collect the fees. For simple data queries, it is possible that the requested data is made available on-chain before the transaction from the user is included in the block.
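The intuition behind updatable proofs in the list above can be illustrated with an incrementally maintained moving average. This Python sketch is only an analogy: it updates a running result as new blocks arrive and produces no cryptographic proof. The class name and the sample prices are hypothetical.

```python
from collections import deque
from typing import Deque


class MovingAverage:
    """Keep a windowed average up to date without rescanning the whole window."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.values: Deque[int] = deque()
        self.total = 0

    def update(self, value: int) -> float:
        # Only the newest value and the one leaving the window are touched.
        self.values.append(value)
        self.total += value
        if len(self.values) > self.window:
            self.total -= self.values.popleft()
        return self.total / len(self.values)


average = MovingAverage(window=3)
results = [average.update(price) for price in [10, 20, 30, 40]]
```

Each update costs constant work regardless of the window size, which is the property an updatable proof system tries to preserve for the proof itself.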

Application of state/storage proofs

State and storage proofs can unlock many new use cases for smart contracts at the application, middleware, and infrastructure layers. Some of these are:

Application Layer:

Governance:

  • Cross-chain Voting: An on-chain voting protocol could allow users on Chain B to prove ownership of assets on Chain A. Users will not have to bridge their assets to gain voting power on a new chain. Example: SnapshotX on Herodotus
  • Governance token distribution: Applications could distribute more governance tokens to active users or early adopters. Example: RetroPGF on Lagrange

Identity and Reputation:

  • Proof of ownership: A user can provide proof of ownership of a certain NFT, SBT, or assets on chain A, enabling them to perform certain actions on Chain B. For example, a gaming app-chain may decide to launch its NFT collection on another chain with existing liquidity like Ethereum or any L2. This will allow the game to tap into liquidity that exists elsewhere and bridge the NFT utility without actually requiring NFTs to be bridged.
  • Proof of usage: Users could be awarded discounts or premium features based on their historical usage of the platform (for example, by proving that the user traded X volume on Uniswap)
  • Proof of OG: A user can prove that they own an active account that is more than X days old
  • On-chain credit score: A multichain credit score platform can aggregate data from multiple accounts of a single user to generate a credit score

All the above proofs can be used to provide a customized experience to users. Dapps could offer discounts or privileges to retain experienced traders or users and offer a simplified user experience for novice users.

DeFi:

  • Cross-chain lending: Users could lock in assets on Chain A and take out a loan on Chain B instead of bridging the tokens
  • On-chain insurance: Failures can be determined by accessing historical on-chain data and insurance can be settled fully on-chain.
  • TWAP of the asset price in a pool: An application could calculate and fetch the average price of an asset in an AMM pool over a specified period of time. Example: Uniswap TWAP Oracle with Axiom
  • Option pricing: An on-chain options protocol may price an option using the volatility of an asset over the past n blocks on a decentralized exchange.

The last two use cases will require the proof to be updated every time a new block is added to the source chain.
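As a rough sketch of the TWAP use case, the Python below derives a time-weighted average price from two cumulative-price readings, in the spirit of the cumulative price accumulators that some AMMs keep in contract storage. With storage proofs, the two readings could be proven historical values of such an accumulator. The observation data and function names are hypothetical.

```python
from typing import List, Tuple


def cumulative_prices(observations: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """observations: (timestamp, price) sorted by time.

    Returns (timestamp, cumulative) pairs, where cumulative is the sum of
    price * seconds elapsed, as an accumulator in contract storage would hold.
    """
    prev_time, prev_price = observations[0]
    cumulative = 0
    out = [(prev_time, cumulative)]
    for timestamp, price in observations[1:]:
        cumulative += prev_price * (timestamp - prev_time)
        out.append((timestamp, cumulative))
        prev_time, prev_price = timestamp, price
    return out


def twap(start: Tuple[int, int], end: Tuple[int, int]) -> float:
    """Time-weighted average price between two accumulator readings."""
    (t0, c0), (t1, c1) = start, end
    return (c1 - c0) / (t1 - t0)


# Hypothetical prices sampled every 12 seconds.
readings = cumulative_prices([(0, 100), (12, 110), (24, 130), (36, 120)])
average_price = twap(readings[1], readings[3])
```

Only two storage values are needed per query, however long the window is, which keeps the amount of data to be proven small.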

Middleware:

  • Intents: Storage proofs will allow users to be more articulate and clear with their intents. While it is the solvers' job to execute the necessary steps to fulfill the intent of the user, a user could more clearly specify the conditions based on on-chain data and parameters. Solvers can also prove the validity of on-chain data leveraged to find the optimal solution.
  • Account Abstraction: Users could rely on data coming from other chains using storage proofs to set rules via Account Abstraction. Example: Every wallet has a nonce. We can prove that one year ago, the nonce was a particular number, and currently the nonce is the same. This can be used to prove that this wallet has not been used at all and the access to the wallet can then be delegated to another wallet.
  • On-chain Automation: Smart contracts could automate certain actions based on predefined conditions that depend on on-chain data. Automated programs are required to call smart contracts at certain intervals to maintain the AMM’s optimal price flow or to keep lending protocols healthy by avoiding bad debt. Hyper Oracle enables automation along with access to on-chain data.
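The dormant-wallet example from the account abstraction item above reduces to comparing two proven nonce values. The Python sketch below assumes the two readings have already been authenticated by storage proofs; the `ProvenNonce` type, the function name, and the blocks-per-year figure (which assumes 12-second blocks) are illustrative assumptions.

```python
from dataclasses import dataclass

BLOCKS_PER_YEAR = 365 * 24 * 3600 // 12  # assumes 12-second block times


@dataclass(frozen=True)
class ProvenNonce:
    """A nonce value assumed to be already verified by a storage proof."""
    block_number: int
    nonce: int


def is_dormant(old: ProvenNonce, new: ProvenNonce, min_blocks: int) -> bool:
    """True if the nonce did not change across at least min_blocks blocks."""
    far_enough_apart = new.block_number - old.block_number >= min_blocks
    return far_enough_apart and old.nonce == new.nonce


a_year_ago = ProvenNonce(block_number=1_000_000, nonce=7)
today = ProvenNonce(block_number=3_700_000, nonce=7)
dormant = is_dormant(a_year_ago, today, BLOCKS_PER_YEAR)
```

A nonce only ever increases, so equal values at both ends of the interval imply that the account sent no transactions in between.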

Infrastructure:

  • Trustless on-chain oracle: Decentralized oracle networks aggregate responses from numerous individual oracle nodes within an oracle network. For on-chain data, oracle networks can eliminate this redundancy and rely on cryptographic security instead. An oracle network could ingest data from multiple chains (L1s, L2s, and alt L1s) onto a single chain and simply prove its existence elsewhere using storage proofs. DeFi solutions with significant traction could work on a custom solution as well. For example, Lido Finance, the largest liquid staking provider, has teamed up with Nil Foundation to fund the development of zkOracle. The solution will enable trustless access to in-EVM historical data and secure $15B in Lido Finance staked Ethereum liquidity.
  • AMP Protocols: Existing AMP solutions could increase the expressivity of their messages by partnering with storage proof service providers. This is an approach suggested by Lagrange in their Modular Thesis article.

Conclusion

Awareness empowers technology companies to better serve their customers. From user identity to purchasing behavior to social graphs, technology companies leverage awareness to unlock capabilities such as precision targeting, customer segmentation, and viral marketing. Traditional tech companies need explicit permission from their users and have to be careful while managing user data. However, all the user data on permissionless blockchains is publicly available without necessarily revealing user identity. Smart contracts should be able to leverage the publicly available data to better serve users. Development and adoption of more specialized ecosystems will make state awareness across time and blockchains an increasingly important problem to be solved. Storage proofs can enable Ethereum to emerge as an identity and asset ownership layer along with being a settlement layer. Users could maintain their identity and key assets on Ethereum which could be used across multiple blockchains without bridging assets all the time. We continue to remain excited about the new possibilities and use cases that will be unlocked in the future.
