Why Blockchain Data Infrastructure Won’t Converge on a Single Stack

Blockchain data infrastructure will not converge on a single stack. Different workloads need different slices of the same chain data, from subgraphs and APIs to streams, warehouses, and custom indexers.


The hard problem in blockchain data was never volume. There's more of it every year, and faster chains keep adding to the pile, but raw growth is the boring half of the story. The half that actually shapes how you build gets talked about less: everyone reads from the same chains, and almost no one wants the same slice.

A trading firm, a stablecoin issuer, a DeFi protocol, a wallet, a game studio, and a risk desk all pull from the same blocks and ask completely different questions of them. None of them wants the same data model, latency, or degree of control.

So blockchain data won't converge on one architecture. It fragments the way traditional infra did: most companies rent from the cloud, those with the technical chops run on-prem, and a handful pay for colocation because a few milliseconds of latency, or a regulator, justifies the cost. No setup is universally right; the right one depends on the workload.

On-chain data is heading the same way. A team might build a custom indexer, consume raw streams, lean on a generic API, or warehouse everything for analytics, and most serious products do a few of these at once. Subgraphs survive all of it because they answer a problem nearly every on-chain app runs into: contracts emit activity, and the app needs that activity as queryable state.

More data makes indexing harder, not easier

Indexing everything is already expensive. A full archival Ethereum node running geth currently takes 18+ terabytes, and an explorer-grade analytics database can need several times that once you add decoded events, token transfers, traces, labels, and the derived tables on top. Faster chains and heavier usage push that number up. The bill scales with the chain's speed, the number of contracts running on it, and the apps that each want their own cut of what those contracts produce.

Most applications don't need the whole chain, just the slice that matters to their product.

That slice doesn't exist as data out of the box. It's an application-specific data model, and someone has to build it.

Raw chain data isn't the application state

Chains are built for execution, consensus, and verification, not for the way an app wants to read. A block holds transactions; transactions throw off logs, receipts, calls, and state changes. That data is complete and nearly useless on its own.
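To make "complete and nearly useless" concrete, here is a minimal sketch of what it takes just to read one ERC-20 Transfer out of a raw log. It assumes logs arrive as dicts shaped like `eth_getLogs` results (`address`, `topics`, `data`); the `DecodedTransfer` type and the helper names are hypothetical, chosen for illustration. The topic constant is the standard keccak256 hash of `Transfer(address,address,uint256)`.

```python
from dataclasses import dataclass

# keccak256("Transfer(address,address,uint256)"), the standard ERC-20 Transfer topic.
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


@dataclass(frozen=True)
class DecodedTransfer:
    # Hypothetical app-level record built from one raw log.
    token: str
    sender: str
    recipient: str
    value: int


def topic_to_address(topic: str) -> str:
    # Indexed address params are left-padded to 32 bytes; keep the last 20 (40 hex chars).
    return "0x" + topic[-40:]


def decode_transfer(log: dict):
    """Decode an eth_getLogs-style dict into a DecodedTransfer, or return None."""
    topics = log["topics"]
    # ERC-721 Transfer shares the same topic0 but indexes tokenId, giving 4 topics.
    if len(topics) != 3 or topics[0].lower() != TRANSFER_TOPIC:
        return None
    return DecodedTransfer(
        token=log["address"],
        sender=topic_to_address(topics[1]),
        recipient=topic_to_address(topics[2]),
        value=int(log["data"], 16),  # non-indexed uint256 amount lives in `data`
    )
```

Even after decoding, this is one transfer, not a balance, a holder list, or a position. Every answer an app wants is built on top of records like this.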

What an app wants to know is more practical:

  • this user's current position,
  • the depth of a pool,
  • which accounts are about to get liquidated, and
  • what changed after the last contract upgrade.

Every one of those answers takes interpretation. Someone has to decide which contracts and events matter, how those events mutate state, and how the result gets queried later. That work is indexing, and for most smart-contract apps a subgraph is the most practical way to do it.
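As an illustration of that interpretation step, here is a minimal sketch of answering "which accounts are about to get liquidated" from events. It assumes a hypothetical lending market that emits `Deposit`, `Withdraw`, `Borrow`, and `Repay` events with amounts already in one common unit, plus a made-up 80% liquidation threshold and 95% warning buffer. Real markets price mixed collateral through oracles, so treat this as the shape of the work, not a risk engine.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical decoded event: name, account, amount in a single common unit.
    name: str
    account: str
    amount: int


LIQUIDATION_THRESHOLD = 0.8  # hypothetical: debt may reach 80% of collateral
WARNING_BUFFER = 0.95        # hypothetical: flag accounts within 5% of that limit


def build_positions(events):
    """Fold raw events into per-account collateral and debt (the 'model')."""
    collateral = defaultdict(int)
    debt = defaultdict(int)
    for ev in events:
        if ev.name == "Deposit":
            collateral[ev.account] += ev.amount
        elif ev.name == "Withdraw":
            collateral[ev.account] -= ev.amount
        elif ev.name == "Borrow":
            debt[ev.account] += ev.amount
        elif ev.name == "Repay":
            debt[ev.account] -= ev.amount
        # Any other event is irrelevant to this model and ignored.
    return collateral, debt


def at_risk(events):
    """Return accounts whose debt is close to their liquidation limit."""
    collateral, debt = build_positions(events)
    flagged = []
    for account, owed in debt.items():
        limit = collateral[account] * LIQUIDATION_THRESHOLD
        if owed > 0 and owed >= limit * WARNING_BUFFER:
            flagged.append(account)
    return sorted(flagged)
```

The decisions that matter (which events count, how each one moves state, what "about to get liquidated" means) live in the handler logic, not in the chain.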

What subgraphs actually provide

A subgraph contains what the app cares about: the contracts to watch, the events to handle, how each event updates the model, and how the result gets served over GraphQL. For a DEX that's pools, swaps, liquidity positions, ticks, and hourly volume. A lending market needs collateral, borrows, liquidations, and positions; a stablecoin, mints, burns, holders, and supply.
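For the stablecoin case, a sketch of "how each event updates the model" might look like the following. It assumes the token follows the common ERC-20 convention of emitting mints as a `Transfer` from the zero address and burns as a `Transfer` to it; the `StablecoinModel` class and its fields are hypothetical stand-ins for subgraph entities, and GraphQL serving is left out.

```python
from collections import defaultdict
from dataclasses import dataclass, field

ZERO_ADDRESS = "0x" + "0" * 40


@dataclass(frozen=True)
class Transfer:
    # Decoded ERC-20 Transfer(from, to, value).
    sender: str
    recipient: str
    value: int


@dataclass
class StablecoinModel:
    # Hypothetical in-memory stand-in for mint/burn/holder/supply entities.
    balances: dict = field(default_factory=lambda: defaultdict(int))
    total_supply: int = 0
    minted: int = 0
    burned: int = 0

    def handle_transfer(self, ev: Transfer) -> None:
        if ev.sender == ZERO_ADDRESS:
            # Convention: a mint is a Transfer from the zero address.
            self.minted += ev.value
            self.total_supply += ev.value
        else:
            self.balances[ev.sender] -= ev.value
        if ev.recipient == ZERO_ADDRESS:
            # Convention: a burn is a Transfer to the zero address.
            self.burned += ev.value
            self.total_supply -= ev.value
        else:
            self.balances[ev.recipient] += ev.value

    def holders(self):
        # Accounts with a positive balance, sorted for stable output.
        return sorted(a for a, bal in self.balances.items() if bal > 0)
```

One event type is enough to derive all four views, which is the point: the model is defined by the handler, not by what the chain stores.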

The point was never to index everything. It's to define exactly what to index, then keep redefining it as the app moves: a new contract ships, a market goes live, an old one gets upgraded, you add another chain, a dashboard needs one more view. You edit the manifest, add the contracts, adjust the handlers, extend the schema, and reindex.
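The edit-then-reindex loop can be sketched as a manifest of data sources plus a replay. The structures below are loosely modeled on the idea of a subgraph manifest (watched address, start block, handled events) but are not its actual YAML format; `DataSource`, `Log`, `reindex`, and `on_swap` are all hypothetical names. Reindexing here simply means rebuilding state from empty by replaying matching logs in chain order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Log:
    # Hypothetical decoded log with its position in the chain.
    block: int
    log_index: int
    address: str
    event: str
    value: int


@dataclass(frozen=True)
class DataSource:
    # Loosely modeled on a manifest entry; field names are hypothetical.
    address: str
    start_block: int
    events: frozenset


def reindex(manifest, logs, handlers):
    """Rebuild state from scratch by replaying matching logs in chain order."""
    state = {}
    for log in sorted(logs, key=lambda l: (l.block, l.log_index)):
        for ds in manifest:
            if (log.address == ds.address
                    and log.block >= ds.start_block
                    and log.event in ds.events):
                handlers[log.event](state, log)
    return state


def on_swap(state, log):
    # Hypothetical handler: accumulate swap volume per pool address.
    state[log.address] = state.get(log.address, 0) + log.value
```

Adding a market is then a one-line manifest change followed by a replay, which is why a new contract's history shows up after reindexing without any new pipeline code.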

It isn't free. A schema change can force a full reindex, plenty of contracts don't emit enough to reconstruct what you need, and bad mapping logic produces garbage out the other end. But set that against building ingestion, decoding, reorg handling, backfills, storage, monitoring, and query serving yourself, and the subgraph is the lighter road by a wide margin.

Why streams and APIs aren't enough

Streams are great when you want raw data piped into your own pipeline: blocks, logs, traces, and decoded events, delivered as they land. Moving bytes isn't the same as maintaining state, though. You still have to transform, store, reconcile, handle reorgs, backfill, and expose queries, then keep everything downstream consistent.
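Reorg handling is a good example of the work a raw stream leaves with you. Below is a minimal sketch, assuming the stream delivers replacement blocks starting from the fork point and that each block carries its own hash and its parent's hash. The `Block` payload of `(account, delta)` pairs and the `ReorgAwareIndexer` class are hypothetical, and keeping a full state snapshot per block is a deliberately naive choice made for clarity.

```python
import copy
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    number: int
    block_hash: str
    parent_hash: str
    transfers: tuple  # hypothetical decoded payload: (account, delta) pairs


class ReorgAwareIndexer:
    """Keeps a state snapshot per applied block so a reorg can be unwound."""

    def __init__(self):
        self.chain = []      # applied blocks, oldest first
        self.snapshots = []  # state after each applied block
        self.state = {}

    def _apply(self, block):
        for account, delta in block.transfers:
            self.state[account] = self.state.get(account, 0) + delta
        self.chain.append(block)
        self.snapshots.append(copy.deepcopy(self.state))

    def _rollback_one(self):
        # Drop the head block and restore the state as of its parent.
        self.chain.pop()
        self.snapshots.pop()
        self.state = copy.deepcopy(self.snapshots[-1]) if self.snapshots else {}

    def ingest(self, block):
        # Unwind until the incoming block extends our head, then apply it.
        while self.chain and self.chain[-1].block_hash != block.parent_hash:
            self._rollback_one()
        self._apply(block)
```

If the fork point falls outside retained history, this sketch unwinds everything, so a production indexer would typically bound snapshot depth, treat finalized blocks as irreversible, and store undo deltas instead of full copies. That is exactly the downstream bookkeeping the paragraph above describes.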

Generic APIs solve a narrower problem: the common reads, like balances, transfers, token metadata, prices, NFTs, and wallet history. They're fast, and they work right up until you need protocol-specific state or business logic that moves with your contracts. At that point the fixed schema behind the endpoint can't answer the question. Custom indexing hands you total control and the entire pipeline to own. Subgraphs sit in between: more flexible than an API and far lighter than rolling your own indexer.

Custom indexing will grow, but it won’t replace subgraphs

The sophisticated shops will keep building more in-house. Trading firms, data vendors, and large institutions want tight control over latency, models, execution paths, and reconciliation, and for them custom indexing earns its keep. But apps that need a custom data view will always outnumber the teams that can afford to build and run full indexing infrastructure. Most on-chain teams don't want to burn three months on pipelines before shipping anything. They want to name the contracts and events that matter, define the data shape they want, then query it and trust the answer. That's the job subgraphs keep.

The blockchain data stack will stay layered

No single tool wins, because the workloads are too different. RPC stays around for direct chain access, transaction submission, and debugging. Streams move real-time data into private pipelines, and generic APIs serve the common read patterns. Warehouses carry analytics and historical research. Custom indexers go to the teams with the budget to own everything, while subgraphs handle application-specific indexing and queryable contract state. A serious product reaches for several at once, the same way a serious backend runs Postgres next to a cache, a queue, and a warehouse without anyone calling it indecision.

Adoption won't make any of this simpler. It will make the data more varied and more specialized, and it will push that data deeper into production, where being wrong costs real money. Teams will pick different infrastructure based on what they're building and what they can afford to run. Underneath all of it sits the same problem: turning contract activity into structured, queryable state. Subgraphs answer that in a way most teams can actually live with, which is why they'll still be here when the rest of the stack has shuffled around them.

About Ormi

Ormi is the next-generation data layer for Web3, purpose-built for real-time, high-throughput applications like DeFi, gaming, wallets, and on-chain infrastructure. Its hybrid architecture ensures sub-30ms latency and up to 4,000 RPS for live subgraph indexing.

With 99.9% uptime and deployments across ecosystems representing $50B+ in TVL and $100B+ in annual transaction volume, Ormi is trusted to power the most demanding production environments without throttling or delay.