4 Subgraph best practices to speed up indexing and queries

Slow subgraphs are usually caused by database growth, inefficient entity relationships, or excessive RPC calls. These four best practices help improve indexing speed and query performance on subgraphs.

A subgraph that indexes slowly or returns queries with high latency usually points to one of a few design decisions in the schema, mapping logic, or indexing pipeline. Over time, these issues increase the amount of data Graph Node needs to read and write, which slows down indexing and degrades query performance.

The good news is that most performance issues can be fixed without rewriting your subgraph. In most cases, the improvements require only small schema or configuration changes, and the impact can be significant.

TL;DR: How to speed up a subgraph

If your subgraph is slow to index or queries are lagging, start with these four optimizations:

  1. Enable database pruning to remove older historical entity versions while keeping the latest state most applications need
  2. Replace large arrays with @derivedFrom relationships
  3. Use Bytes IDs and immutable entities for faster indexing
  4. Avoid eth_call RPC requests whenever possible

Together, these changes reduce database size, improve indexing throughput, and significantly decrease query latency.

What is a subgraph?

A subgraph is an indexing standard created by The Graph protocol that transforms blockchain events into structured data and exposes it through a GraphQL API.

Subgraphs allow applications to query blockchain data efficiently without scanning raw chain history. Instead of repeatedly calling RPC endpoints, developers define how contract events should be indexed and stored in a database.

For example, subgraphs are commonly used to index and query on-chain data for analytics, DeFi monitoring, or tracking real-time stablecoin activity.

Under the hood, the process works like this:

  1. Graph Node ingests blockchain data from RPC nodes
  2. Mapping logic processes events and transforms them into entities
  3. Entities are stored in a PostgreSQL database
  4. Applications query that data through GraphQL

This architecture works well, but it introduces a tradeoff: performance is tied to both the indexing workload and the underlying database.
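The last step of the pipeline above is an ordinary GraphQL request. For instance, an application might fetch the most recent indexed transfers with a query like this (the entity and field names are illustrative, not from any specific subgraph):

```graphql
query LatestTransfers {
  transfers(first: 5, orderBy: blockNumber, orderDirection: desc) {
    id
    from
    to
    value
  }
}
```

Every filter, sort, and pagination argument in a query like this ultimately translates into work against the PostgreSQL tables Graph Node maintains, which is why the optimizations below matter.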

Learn more: How to choose the best subgraph providers

Why do subgraphs become slow?

Most subgraph performance issues come down to four patterns:

  • A bloated database caused by storing unnecessary historical data
  • Duplicate entity storage from poorly structured relationships
  • Expensive ID formats that slow indexing and querying
  • RPC calls (eth_call) that slow down indexing

When these issues compound, developers start to see:

  • indexing lagging behind the chain head
  • slow GraphQL queries
  • large Graph Node databases
  • increased indexing costs

What subgraph optimization improves

Every subgraph ultimately runs on a PostgreSQL database.

Graph Node stores entities in PostgreSQL and exposes them through a GraphQL query layer. As the number of stored entities grows, query performance becomes increasingly dependent on the database structure.

So when optimizing a subgraph, you're really optimizing two things:

  1. Database size and structure
  2. Indexing pipeline latency

The four practices improve one or both of these layers.

Tip #1. Prune your subgraph database with indexer hints

Graph Node stores versioned entity history so queries can retrieve data at past block heights.

Pruning lets you automatically trim historical data from your subgraph's database so it doesn't accumulate entity records that are not being queried.

Many older subgraphs were created before pruning existed. As a result, they often accumulate far more historical data than most applications actually need.

Every time an entity is updated:

  • the new version is written to the database
  • the previous version is retained
  • each version is associated with a block range indicating when it was valid

Over time, this creates a large number of historical rows in the database. Even if your application never queries those historical versions, Graph Node still has to manage and store them.

Pruning allows you to limit how much of that entity history is retained.

The result is a smaller database, which improves both:

  • query performance
  • indexing efficiency
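On recent manifest specVersions, pruning is configured with indexerHints in subgraph.yaml. A minimal sketch (the prune value can be auto, never, or a specific number of blocks of history to keep):

```yaml
specVersion: 1.0.0
indexerHints:
  prune: auto   # retain only the minimum history required for indexing
```

Setting prune: auto is the sensible default for subgraphs whose applications only query the latest state; use a block count instead if you need a bounded window of time-travel queries.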

When pruning is essential

For most production subgraphs that do not rely on historical queries, pruning can significantly reduce database size while maintaining the latest state needed for applications.

Tip #2. Use @derivedFrom to manage one-to-many relationships

Arrays in subgraphs can drive up your database size if you're not managing them correctly. The classic problem looks like this: you have a Post entity with a comments field that stores an array of comment IDs. Every time a comment is created, it gets stored twice: once in the Comment entity itself, and again in the array on the Post. At scale, with millions of entities, that double storage adds serious bloat.

The fix is @derivedFrom. Instead of storing the array directly on the parent entity, you define the relationship virtually. The Comment entity holds a reference back to its Post, and the comments field on Post is derived from that reference rather than stored independently.

The practical result is that each piece of data only lives in one place. You can still query a post and get all its comments. You can still do reverse lookups from a comment back to its post. And with derived field loaders, you can access and manipulate that virtual relationship directly inside your mappings. You get all the relational power with none of the duplication.
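In schema terms, the Post/Comment example looks like this (a minimal sketch; field names are illustrative):

```graphql
type Post @entity {
  id: Bytes!
  # Virtual field: derived from Comment.post, never stored on Post itself
  comments: [Comment!]! @derivedFrom(field: "post")
}

type Comment @entity {
  id: Bytes!
  post: Post!   # the only stored side of the relationship
  text: String!
}
```

Only Comment.post is written to the database; the comments field on Post is resolved at query time from that reference.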

For a deeper look at what runaway arrays can do to a subgraph, there is a solid blog post by Kevin Jones on the Graph blog that covers this in detail.

Tip #3. Use Bytes as IDs and mark entities as immutable

This combination is probably the biggest performance unlock of the four. In many benchmarks, Bytes IDs significantly improve indexing and query performance compared to string IDs.

Bytes IDs are faster because they avoid expensive string comparisons and produce smaller, more efficient database indexes. There is less computation involved in processing a Bytes value than in parsing and comparing a string. To create a bytes-based ID, you can use the transaction hash directly, or use concat and concatI32 from graph-ts to combine pieces of data into a unique bytes value. The latter gives you a tighter, more collision-resistant ID.
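Conceptually, a bytes ID built from the transaction hash plus the log index looks like the following. This is a plain TypeScript sketch that mimics graph-ts's concatI32 behavior so the idea is runnable outside a subgraph; in real mapping code you would call event.transaction.hash.concatI32(event.logIndex.toI32()) instead, and the 32-byte hash here is a placeholder:

```typescript
// Append a big-endian 32-bit integer to a byte array, analogous to
// Bytes.concatI32 in graph-ts.
function concatI32(base: Uint8Array, n: number): Uint8Array {
  const suffix = new Uint8Array(4);
  new DataView(suffix.buffer).setInt32(0, n, false); // big-endian i32
  const out = new Uint8Array(base.length + 4);
  out.set(base);
  out.set(suffix, base.length);
  return out;
}

// Placeholder for a 32-byte transaction hash
const txHash = new Uint8Array(32).fill(0xab);

// Unique per (transaction, log) pair: 32 hash bytes + 4 index bytes = 36 bytes
const id = concatI32(txHash, 7);
console.log(id.length); // 36
```

Because the hash is unique per transaction and the log index is unique within it, the concatenated value is collision-resistant without any string formatting.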

One tradeoff worth knowing: if you sort query results by a bytes ID, the ordering will look non-sequential. It is not alphabetical or numerical in any intuitive sense. If you need ordered results, add a separate BigInt counter field to your entity and sort by that instead.

Immutable entities tell Graph Node that an entity will never be updated after it is created. That guarantee lets the node store and retrieve rows much more efficiently, because each one is written once and never rewritten. A good candidate for an immutable entity is one that simply logs raw event data from the chain without any transformation.

Note: The main consideration is that once you commit to immutability, you cannot change that entity in later versions.
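A typical immutable entity is a raw event log, declared in the schema like this (hypothetical names, sketching the pattern):

```graphql
type TransferEvent @entity(immutable: true) {
  id: Bytes!          # e.g. transaction hash concatenated with log index
  from: Bytes!
  to: Bytes!
  value: BigInt!
  blockNumber: BigInt!
}
```

Since an on-chain event never changes after it is emitted, marking its entity immutable costs nothing and combines well with the Bytes ID pattern above.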

Tip #4. Avoid eth_calls wherever possible

Each eth_call requires an RPC request during indexing. While infrastructure providers optimize these calls, excessive contract calls can still slow indexing significantly when repeated across many blocks.

RPC calls occur when a mapping invokes contract functions, for example: contract.totalSupply().

The cleanest solution is to avoid them entirely. If you control the smart contract being indexed, emit the data you need as events rather than requiring a call to fetch it. Event data is already part of the block and costs nothing extra to process.

If you do not control the contract and the eth_calls are unavoidable, there is a newer approach: declaring eth_calls in the subgraph manifest.

When calls are declared in the manifest, Graph Node can batch and cache these calls, allowing handlers to access results without issuing separate RPC requests. Handlers retrieve from that cache instead of making live RPC calls each time. It does not eliminate the calls, but it makes them dramatically less expensive in practice.
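As of manifest specVersion 1.2.0, a declared call is added under the event handler in subgraph.yaml. A rough sketch, assuming an ERC20 data source and a Transfer event (the contract, event, and label names are illustrative):

```yaml
eventHandlers:
  - event: Transfer(indexed address,indexed address,uint256)
    handler: handleTransfer
    calls:
      ERC20.totalSupply: ERC20[event.address].totalSupply()
```

Graph Node executes the declared call ahead of the handler and caches the result, so the mapping reads it from the cache rather than issuing its own eth_call.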

Putting this together

These optimizations may look small, but together they compound quickly. A smaller database, fewer RPC calls, and better entity modeling can dramatically improve indexing speed and query performance.

The importance of reliable indexing becomes clear in real-world applications. Systems that depend on fresh blockchain data, from analytics dashboards to trading infrastructure, can break when indexing falls behind. In one case study, a prediction market eliminated daily missing transactions after upgrading its indexing infrastructure.

About Ormi

Ormi is the next-generation data layer for Web3, purpose-built for real-time, high-throughput applications like DeFi, gaming, wallets, and on-chain infrastructure. Its hybrid architecture ensures sub-30ms latency and up to 4,000 RPS for live subgraph indexing.

With 99.9% uptime and deployments across ecosystems representing $50B+ in TVL and $100B+ in annual transaction volume, Ormi is trusted to power the most demanding production environments without throttling or delay.