
What Is a Blockchain Indexer?
Written by Usman Asim

Blockchains have a fundamental problem: their data is not searchable. In other words, onchain data cannot be queried by default.
In a traditional database, data is organized in tables with indexes and relationships, letting developers run precise queries instantly without scanning every record. In contrast, blockchains store data as a linear chain of blocks, optimized for immutability and security, not fast searches.
This design means there's no SQL, no built-in indexes, and no convenient "SELECT * FROM transactions WHERE..." queries to make data easy to find. What blockchains provide instead are low-level RPC methods like eth_getBlockByNumber that return raw blocks, forcing you to fetch and scan them one by one to find what you need.
For example, if someone wanted to find all transactions from a specific wallet, they’d need to start from block zero, loop through millions of blocks, check every transaction in each block, and hope the node doesn't rate limit halfway through. On Ethereum mainnet with over 20 million blocks, this could take hours or even days, and you'd still need to organize and store that data yourself to make it useful.
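To make this concrete, here's a toy Python sketch of that naive approach. The chain is simulated as a small in-memory list; a real scan would call an RPC method such as eth_getBlockByNumber for each of the millions of blocks, which is exactly what makes it so slow.

```python
# A toy illustration of the naive approach: scanning every block to find one
# wallet's transactions. The "chain" here is simulated in memory; a real scan
# would fetch each block over RPC, one request at a time.

def find_transactions(blocks, wallet):
    """Linear scan over every transaction in every block: O(total txs)."""
    matches = []
    for block in blocks:                      # millions of blocks on mainnet
        for tx in block["transactions"]:
            if tx["from"] == wallet or tx["to"] == wallet:
                matches.append(tx)
    return matches

# Tiny simulated chain standing in for real RPC responses.
chain = [
    {"number": 0, "transactions": [{"from": "0xabc", "to": "0xdef", "value": 1}]},
    {"number": 1, "transactions": [{"from": "0xdef", "to": "0xabc", "value": 2}]},
    {"number": 2, "transactions": [{"from": "0x111", "to": "0x222", "value": 3}]},
]

print(len(find_transactions(chain, "0xabc")))  # 2
```

With three blocks this is instant; with 20+ million blocks and a rate-limited node, the same loop takes hours or days.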
That's where indexers come in, acting as a bridge between the blockchain's raw, sequential data and the fast queries that your application needs. Think of them as the search engine for onchain data: they continuously monitor the blockchain, extract relevant information, organize it into queryable databases, and serve it up through APIs that respond in milliseconds instead of hours.
Without indexers, building responsive apps would be nearly impossible. Imagine a DeFi dashboard that takes 30 seconds to load your portfolio, or a neobank that doesn’t allow you to filter your transactions based on transaction types. Indexers solve this by pre-processing blockchain data so you don't have to scan every block manually.
In this guide, we'll break down what a blockchain indexer is, how they work, and share some real world examples of indexers in practice.
We'll keep it practical with code examples and links to resources so you can dive deeper. Whether you're a junior dev building your first app or just refreshing your knowledge, this guide will get you up to speed on one of the most critical pieces of blockchain infrastructure.
What is a blockchain indexer?
A blockchain indexer is a specialized service that continuously watches the blockchain, extracts transaction data and smart contract events, transforms it into a structured format, and stores it in a database optimized for fast queries.
Think of it as a three-step process:
Extract: The indexer monitors blockchain nodes in real-time, capturing every new block, transaction, and event as they're added to the chain.
Transform: It decodes the raw blockchain data, parsing transaction inputs, decoding smart contract events, tracking token transfers, and organizing state changes into meaningful records.
Load: Finally, it can store this processed data in a queryable database (like PostgreSQL, MongoDB, or specialized graph databases), allowing that data to be exposed through APIs that applications can use.
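The three steps above can be sketched as a minimal, simplified ETL loop in Python, using an in-memory SQLite database and hard-coded blocks standing in for real RPC responses:

```python
# A minimal sketch of the extract/transform/load loop. The "extract" step is
# simulated with hard-coded blocks; a production indexer would pull these
# from a node or an infrastructure provider's API.
import sqlite3

def extract():
    # Simulated raw blocks (stand-ins for RPC responses).
    yield {"number": 100, "logs": [{"event": "Transfer", "from": "0xa", "to": "0xb", "value": 5}]}
    yield {"number": 101, "logs": [{"event": "Transfer", "from": "0xb", "to": "0xc", "value": 3}]}

def transform(block):
    # Decode each raw log into a flat, structured record.
    for log in block["logs"]:
        yield (block["number"], log["from"], log["to"], log["value"])

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE transfers (block INTEGER, sender TEXT, recipient TEXT, value INTEGER)")

for block in extract():                                                # Extract
    rows = list(transform(block))                                      # Transform
    db.executemany("INSERT INTO transfers VALUES (?, ?, ?, ?)", rows)  # Load

# Once loaded, the data is instantly queryable with ordinary SQL.
total, = db.execute("SELECT SUM(value) FROM transfers WHERE sender='0xb'").fetchone()
print(total)  # 3
```

Real indexers add decoding via contract ABIs, batching, and reorg handling on top, but the shape of the pipeline is the same.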
For a solid deep dive into how this process works, check out the Ethereum Foundation’s intro on indexers.
What are the components of an indexer?
While indexers vary in implementation, most share a common architecture built around a few core components that work together to process and serve blockchain data. Here's how they fit together:
1. The Data Source (Blockchain Connection)
This is the indexer's connection to the blockchain itself, typically a node (like Geth for Ethereum, or a Solana RPC node) or an infrastructure provider's API (such as Alchemy).
The indexer continuously pulls raw data from this source: new blocks as they're added, transactions within those blocks, event logs emitted by smart contracts, and sometimes state changes. Some indexers process data in real-time (listening for new blocks immediately), while others work in batches to handle historical data or catch up after downtime.
2. Indexing Engine (The Processing Layer)
This is the brain of the operation. The indexing engine takes raw blockchain data and transforms it into something meaningful and searchable.
At its core, the engine's job is decoding transactions and events. Raw blockchain data is heavily encoded: transaction inputs are hex strings, and event logs are identified by cryptographic topic hashes. The indexing engine uses contract ABIs (Application Binary Interfaces) to interpret what each transaction actually did: was it a token swap? An NFT mint? A governance vote? It decodes the parameters, extracts the meaningful values, and translates everything into human-readable records.
Beyond just decoding individual transactions, the engine must also track state changes over time. Blockchains don't store current state in an easily accessible way; instead they store a history of state transitions. So the indexer reconstructs current state by following the chain of events: tracking how token balances change with each transfer, monitoring NFT ownership as tokens move between wallets, and watching smart contract storage variables evolve with each interaction. This state tracking is crucial for queries like "what NFTs does this wallet currently own?" The answer isn't stored anywhere on-chain, it has to be computed from the complete transfer history.
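As a simplified illustration of that state reconstruction, here's how current NFT ownership can be computed by replaying Transfer-style events in block order (the events below are invented for the example):

```python
# Sketch: reconstructing current NFT ownership by replaying transfer events
# in order. A real indexer would decode these from ERC-721 Transfer logs;
# here they are hard-coded for illustration.
ZERO = "0x0"  # conventional "from" address for mints

events = [
    {"token_id": 1, "from": ZERO,      "to": "0xalice"},  # mint
    {"token_id": 2, "from": ZERO,      "to": "0xalice"},  # mint
    {"token_id": 1, "from": "0xalice", "to": "0xbob"},    # transfer
]

owners = {}
for e in events:                 # replay history in block order
    owners[e["token_id"]] = e["to"]

# "What NFTs does this wallet currently own?" is now a simple lookup,
# even though that answer is never stored anywhere on-chain.
alice_tokens = [tid for tid, owner in owners.items() if owner == "0xalice"]
print(alice_tokens)  # [2]
```

The key point: current state is derived by folding over the full event history, then cached in the database so it never has to be recomputed per query.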
The engine also builds specialized indexes, which are efficient data structures that enable fast lookups. Think of it like a book's index: instead of reading every page to find mentions of "Ethereum," you check the index and jump straight to the relevant pages. The indexing engine creates lookup tables for addresses (find all activity for wallet 0x123), token IDs (find the owner and history of NFT #5000), transaction types (find all Uniswap swaps), timestamps (find all activity in the past 24 hours), and more. These indexes are what turn a sequential scan of millions of blocks into a sub-second query.
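The book-index analogy can be sketched in a few lines of Python: one pass over the transactions builds a lookup table keyed by address, after which lookups no longer scan anything (the data is made up for illustration):

```python
# Sketch of a book-index-style lookup table: a single O(n) pass builds a
# mapping from address to transaction positions; lookups afterward are O(1)
# on the index instead of a full scan.
from collections import defaultdict

txs = [
    {"hash": "0x1", "from": "0xabc", "to": "0xdef"},
    {"hash": "0x2", "from": "0xdef", "to": "0x123"},
    {"hash": "0x3", "from": "0xabc", "to": "0x123"},
]

by_address = defaultdict(list)
for i, tx in enumerate(txs):          # the one-time indexing pass
    by_address[tx["from"]].append(i)
    by_address[tx["to"]].append(i)

# "Find all activity for wallet 0xabc" is now a direct index lookup.
print([txs[i]["hash"] for i in by_address["0xabc"]])  # ['0x1', '0x3']
```

Database indexes (B-trees, hash indexes) are more sophisticated versions of the same trade: pay once at write time to make every read cheap.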
Another critical responsibility for this engine is handling chain reorganizations. Occasionally, blockchain consensus results in a short section of recent blocks being replaced with an alternative set of blocks. This is often called a "blockchain reorg." When this happens, the indexer must detect the reorg, roll back any data it indexed from the orphaned blocks, and re-index the new canonical blocks. Without proper reorg handling, the indexed data would contain transactions that never actually happened on the canonical chain.
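Here's a simplified sketch of reorg detection based on parent-hash checks. Real indexers persist block hashes durably and roll back database writes, but the logic follows the same shape (all hashes here are invented):

```python
# Sketch of reorg handling: each indexed block records its hash, and a new
# block's parent hash must match the hash of the last indexed block. On a
# mismatch, orphaned blocks are rolled back before the new block is applied.
indexed = [
    {"number": 1, "hash": "a1", "parent": "a0"},
    {"number": 2, "hash": "b2", "parent": "a1"},
    {"number": 3, "hash": "c3", "parent": "b2"},  # about to be orphaned
]

def apply_block(indexed, block):
    # Pop blocks until the chain links up with the incoming block's parent.
    while indexed and indexed[-1]["hash"] != block["parent"]:
        orphan = indexed.pop()  # also: delete data indexed from this block
        print(f"rolled back block {orphan['number']}")
    indexed.append(block)

# A competing block #3 arrives whose parent is still b2 -> c3 is orphaned.
apply_block(indexed, {"number": 3, "hash": "d3", "parent": "b2"})
print([b["hash"] for b in indexed])  # ['a1', 'b2', 'd3']
```

In production, the "roll back" step means deleting or reversing every database row derived from the orphaned blocks, which is why indexers keep per-block provenance on their records.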
Finally, the indexing engine manages syncing and backfills. When an indexer first starts, it needs to process the entire blockchain history, potentially millions of blocks dating back years. This "backfill" process must be efficient, often parallelizing block processing and checkpointing progress to handle restarts. Once caught up, the indexer maintains continuous sync with new blocks as they're added, typically staying just a few seconds behind the chain tip. If the indexer goes offline or falls behind, it must catch up without missing any blocks.
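A checkpointed backfill loop might look roughly like this sketch, where progress is persisted (here, just a dict standing in for durable storage) so a restart resumes from the last completed block instead of block zero:

```python
# Sketch of a checkpointed backfill: process historical blocks in batches
# and record the last completed block after each batch. The checkpoint is a
# plain dict here; a real indexer would persist it to disk or a database.
checkpoint = {"last_block": -1}

def process_block(n):
    pass  # decode, transform, store (omitted in this sketch)

def backfill(tip, batch_size=1000):
    start = checkpoint["last_block"] + 1   # resume from last checkpoint
    for batch_start in range(start, tip + 1, batch_size):
        batch_end = min(batch_start + batch_size - 1, tip)
        for n in range(batch_start, batch_end + 1):
            process_block(n)
        checkpoint["last_block"] = batch_end   # durable progress marker

backfill(tip=2500)
print(checkpoint["last_block"])  # 2500

# A crash after this point loses at most one batch: rerunning with a higher
# tip resumes at block 2501 rather than rescanning from zero.
backfill(tip=2600)
print(checkpoint["last_block"])  # 2600
```

Production backfills also parallelize batches across workers, which the checkpoint scheme must account for (e.g. per-range checkpoints).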
This is where the heavy computational work happens: parsing millions of transactions, filtering relevant events based on contract addresses and topics, decoding complex nested data structures, maintaining consistent state across reorgs, and structuring everything for fast storage and retrieval.
3. Database (Storage Layer)
Once the data has been processed and structured by the engine, it needs to live somewhere queryable, typically an external database. The database choice depends on the indexer's use case and query patterns:
Relational databases (PostgreSQL, MySQL): Relational databases are the most common choice for most blockchain indexers. They are ideal for structured data with complex relationships, like tracking wallet balances that change with each transaction, maintaining transaction histories with foreign keys linking to blocks and addresses, or querying token transfers with JOIN operations across multiple tables. SQL's powerful query language makes it easy to ask questions like "show me all addresses that received more than 10 ETH from this contract in the past week." The rigid schema ensures data consistency, which is critical when tracking financial information.
NoSQL databases (MongoDB, Cassandra): NoSQL databases offer flexibility for semi-structured data where the schema might evolve over time: useful when indexing diverse smart contracts with varying event structures or storing raw transaction metadata that doesn't fit neatly into tables. These databases excel at horizontal scaling, distributing data across multiple servers to handle massive write volumes (important when processing thousands of blocks per second). They're often used when raw indexing speed is more critical than complex querying capabilities.
Graph databases (Neo4j): Graph databases are purpose built for relationship heavy queries. Perfect for use cases like tracking token flows through multiple wallets (follow the money), analyzing DeFi protocol interactions (which protocols are connected through liquidity pools), or building social graphs (which wallets interact with each other). Instead of relying on relational JOINs, graph databases use native graph traversal, making "find all wallets within 3 hops of this address" orders of magnitude faster than in relational databases.
Data warehouses (BigQuery, Snowflake): Data warehouses are designed for analytics and aggregations across massive datasets. These aren't for real-time queries: they're for answering questions like "what's the total trading volume across all DEXs this month" or "show me daily active addresses by chain for the past year." They can crunch through billions of records efficiently using columnar storage and distributed processing, but with higher latency than operational databases.
Many production indexers use multiple database types in tandem: storing transactional data in PostgreSQL for fast, real-time queries that power application UIs, while simultaneously feeding the same data into BigQuery for analytics dashboards and historical trend analysis. This hybrid approach lets each database do what it does best.
4. API Layer (Query Interface)
The API layer is how applications are able to access the indexed data. The API exposes endpoints that let apps query the processed blockchain data without knowing how it's stored or organized underneath, abstracting away the complexity of the database schema, indexing logic, and data transformations.
Common approaches include:
GraphQL APIs: GraphQL APIs are the most flexible option, letting clients request exactly the data they need in a single query. Instead of making multiple REST calls, an application can ask for nested, related data in one request: like "get all ERC-20 transfers for this address where value > $1,000, and for each transfer include the token's name, symbol, decimals, and current price." GraphQL lets the client specify which fields to return, avoiding over-fetching (getting data you don't need) or under-fetching (requiring multiple round trips). This is particularly powerful for complex queries across multiple entities. The Graph protocol, for example, is built entirely on GraphQL.
REST APIs: REST APIs are simpler and more predictable, with predefined endpoints for common queries. Each endpoint has a specific purpose, like /api/address/{address}/transactions to get transaction history, or /api/token/{contract}/holders to get all current token holders. REST is easier to cache (since each URL represents a specific resource), simpler to document, and familiar to most developers. The trade-off is less flexibility: if you need data an endpoint doesn't provide, you'll need multiple requests or have to wait for a new endpoint to be built. REST is ideal when query patterns are well-known and consistent.
WebSocket streams: WebSocket streams are perfect for real-time updates as new blocks are indexed. Instead of polling the API every few seconds asking "anything new?", your app opens a WebSocket connection and receives push notifications the moment relevant data arrives, like when a specific address receives a transaction. This is critical for applications that need instant updates, like live trading dashboards and real-time notification systems. WebSockets maintain an open connection, so they're more resource-intensive than occasional REST calls but eliminate latency for time-sensitive data.
The API layer often includes crucial infrastructure features beyond just serving data: caching (storing frequently-requested query results in memory to avoid hitting the database repeatedly, dramatically improving response times for popular queries), rate limiting (preventing any single user from overwhelming the system with too many requests, ensuring fair access for everyone), and authentication (API keys or tokens to track usage, enforce access controls, and potentially charge for premium tiers). These features ensure the API remains fast, reliable, and economically sustainable, especially important when serving thousands of apps simultaneously.
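As an illustration of the caching idea described above, here's a minimal TTL cache sketch in Python. Timestamps are passed in explicitly so the example is deterministic; production APIs would typically use a shared cache like Redis rather than process memory:

```python
# Sketch of an API-layer result cache: memoize query results for a short
# TTL so hot queries skip the database entirely. The database call is
# simulated, and "now" is injected to make the behavior reproducible.
import time

CACHE_TTL = 60          # seconds a cached result stays fresh
_cache = {}             # query -> (result, stored_at)
db_hits = 0

def run_query(sql):
    global db_hits
    db_hits += 1        # stands in for a real database round trip
    return f"result of {sql}"

def cached_query(sql, now=None):
    now = time.time() if now is None else now
    hit = _cache.get(sql)
    if hit and now - hit[1] < CACHE_TTL:
        return hit[0]                      # served from memory
    result = run_query(sql)
    _cache[sql] = (result, now)
    return result

cached_query("SELECT ...", now=0)    # miss: hits the database
cached_query("SELECT ...", now=30)   # within TTL: served from cache
cached_query("SELECT ...", now=120)  # expired: hits the database again
print(db_hits)  # 2
```

The same pattern extends naturally to rate limiting (a counter per API key per window) and tiered access (different TTLs or limits per plan).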
How The Indexer Components Work Together
Here's an example flow in action. An indexer's data source pulls block #18,500,000 from an Ethereum node. The indexing engine then decodes 200 transactions in that block, extracts 500 events (including Uniswap swaps and NFT transfers), and identifies which wallets were affected.
After that, the database stores these records with indexes on addresses, token contracts, and timestamps. From there, your app queries the indexer’s API asking "show me all NFT purchases by address 0x123 this week." The API returns results in 50ms by querying the indexed database, not the blockchain.
This architecture is what enables indexers to turn hours of blockchain scanning into milliseconds of query time.
How Does Indexing Work in Practice?
Now that we understand the components, let's walk through how indexing actually works in practice and show how an indexer turns raw blockchain data into instantly queryable information.
A Real Example: Indexing a DeFi Lending Protocol
Let's look at a concrete example with a simplified lending protocol smart contract, similar to how platforms like Aave or Compound work. This contract lets users deposit collateral and borrow assets:
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;
contract LendingProtocol {
struct Position {
address user;
address collateralToken;
uint256 collateralAmount;
address borrowedToken;
uint256 borrowedAmount;
uint256 interestRate;
uint256 timestamp;
}
mapping(uint256 => Position) public positions;
uint256 public nextPositionId;
event PositionOpened(
uint256 indexed positionId,
address indexed user,
address collateralToken,
uint256 collateralAmount,
address borrowedToken,
uint256 borrowedAmount,
uint256 interestRate
);
event PositionClosed(
uint256 indexed positionId,
address indexed user,
uint256 amountRepaid
);
event PositionLiquidated(
uint256 indexed positionId,
address indexed liquidator,
uint256 collateralSeized
);
function openPosition(
address collateralToken,
uint256 collateralAmount,
address borrowedToken,
uint256 borrowedAmount,
uint256 interestRate
) external {
uint256 positionId = nextPositionId++;
positions[positionId] = Position({
user: msg.sender,
collateralToken: collateralToken,
collateralAmount: collateralAmount,
borrowedToken: borrowedToken,
borrowedAmount: borrowedAmount,
interestRate: interestRate,
timestamp: block.timestamp
});
emit PositionOpened(
positionId,
msg.sender,
collateralToken,
collateralAmount,
borrowedToken,
borrowedAmount,
interestRate
);
}
function closePosition(uint256 positionId, uint256 amountRepaid) external {
require(positions[positionId].user == msg.sender, "Not position owner");
emit PositionClosed(positionId, msg.sender, amountRepaid);
delete positions[positionId];
}
}

Without an indexer, answering questions about this protocol would be painful:
"What's the total value locked across all positions?" This would require scanning every block, finding every PositionOpened event, decoding each one, and calculating total collateral.
"Show me all positions for user 0x123" This would also require a full scan, filtering by user address.
"What's the average interest rate for ETH-collateralized loans?" Another full scan would be required here, filtering by collateral token in order to aggregate interest rates.
"How many positions were liquidated this week?" This would require scanning a week's worth of blocks looking for PositionLiquidated events.
Each of these queries could take minutes or hours, requiring you to process gigabytes of blockchain data.
With an indexer, here's what happens:
Event detection: The indexer monitors the LendingProtocol contract address. When block #18,500,000 includes a transaction that emits a PositionOpened event, the indexer immediately captures it.
Data extraction: Using the contract's ABI, the indexer decodes the event parameters: positionId=42, user=0xabc..., collateralToken=0xWETH, collateralAmount=5000000000000000000 (5 ETH in wei), borrowedToken=0xUSDC, borrowedAmount=8000000000 (8,000 USDC), interestRate=500 (5%).
Enrichment: The indexer can enhance this data by fetching additional context, like looking up the current USD price of ETH and USDC to calculate the position's value in dollars, or storing the block timestamp for time-based queries.
Storage: It writes this to the database with multiple indexes:
INSERT INTO lending_positions (
position_id, user_address, collateral_token, collateral_amount,
borrowed_token, borrowed_amount, interest_rate,
block_number, timestamp, status
) VALUES (42, '0xabc...', '0xWETH', 5000000000000000000, '0xUSDC', 8000000000, 500, 18500000, 1699564800, 'open');
-- Create indexes for fast lookups
CREATE INDEX idx_user ON lending_positions(user_address);
CREATE INDEX idx_collateral_token ON lending_positions(collateral_token);
CREATE INDEX idx_status ON lending_positions(status);

API serving: Now those complex queries become simple, fast database lookups:
"Total value locked?" → SELECT SUM(collateral_amount * token_price) FROM lending_positions WHERE status='open' (returns in 10ms)
"User 0x123's positions?" → SELECT * FROM lending_positions WHERE user_address='0x123' (instant)
"Average interest rate for ETH loans?" → SELECT AVG(interest_rate) FROM lending_positions WHERE collateral_token='0xWETH' (milliseconds)
The indexer continuously repeats this process for every new block, maintaining a real-time, queryable view of the protocol's complete state and history. When a PositionClosed event fires, it updates the status field. When prices change, it can recalculate position health ratios for liquidation monitoring.
This transformation, from sequential blockchain scanning to indexed database queries, is what makes modern crypto fintech dashboards, analytics platforms, and risk monitoring tools possible. Without indexers, the user experience we expect from blockchain applications simply wouldn't exist.
What Problems Do Indexers Solve?
Now that you've seen how indexing works with our lending protocol example, let's summarize the fundamental problems indexers solve for developers:
Data access and query performance: Blockchains have no built-in search functions and require scanning millions of blocks sequentially to query data. Indexers extract and organize blockchain data into queryable databases with strategic indexes, turning hour-long scans into millisecond queries.
Data analytics: Understanding activity at scale (trading volumes, user patterns, protocol health) requires aggregating massive datasets. Indexers maintain historical state and pre-compute common metrics. Daily DEX volume is already summed; inactive wallets after protocol changes can be queried via indexed timestamps instantly, eliminating the need for custom data pipelines.
Real time app development: Modern apps must react to onchain events instantly, in order to provide accurate and performant experiences to users. Constantly polling blockchain nodes is slow and inefficient. Indexers use push-based architectures (WebSockets) that notify applications the moment events occur, making blockchain apps feel as responsive as web2.
Common Indexing Use Cases
Understanding why indexers exist helps clarify what you can actually build with them. Here are real-world applications that leverage indexed blockchain data:
DeFi dashboards and portfolio management: Apps like Zapper, Debank, and Zerion aggregate user positions across dozens of protocols, lending positions on Aave, liquidity pools on Uniswap, staked assets on Lido, and more, into a single portfolio view with live USD valuations. Without indexers, each page load would require querying hundreds of smart contracts individually.
Marketplaces with advanced search: Platforms like OpenSea let users filter collections by specific traits, sort by rarity rankings, view complete ownership histories, and track floor price movements over time. Indexers make these complex queries across millions of ERC-721s possible without scanning the entire blockchain for every search.
Onchain analytics platforms: Tools like Dune, Nansen, and Flipside Crypto provide custom dashboards showing protocol metrics, tracking DEX trading volumes, lending protocol utilization rates, bridge flows between chains, and whale wallet movements. Analysts write SQL queries against indexed data rather than processing raw blockchain logs.
Trading bots and MEV strategies: Automated trading systems monitor mempool transactions for arbitrage opportunities, track liquidity pool reserves across multiple DEXs for optimal routing, and execute strategies within blocks of triggering events. These require sub-second data access that only indexers can provide at scale.
Wallets: Modern wallets like MetaMask, Rainbow, and Phantom display complete transaction histories, token balances (including tokens you didn't know you had), pending transactions, and estimated gas fees. Each of these features relies on indexed data, directly querying blockchain nodes would make wallet interfaces unusably slow.
Blockchain explorers: Etherscan, Solscan, and similar explorers let users search any address, transaction hash, block number, or token contract and immediately see complete details, related transactions, and historical activity. They're essentially UI layers on top of comprehensive blockchain indexers.
DAO governance platforms: Tools like Snapshot and Tally track proposal lifecycles, voting power calculations based on token holdings at specific blocks, delegation relationships, and voting histories. These platforms need indexed historical state to calculate who was eligible to vote on past proposals.
Risk management and monitoring: Protocols use indexers to monitor large positions that are at risk of being liquidated, track unusual wallet activity patterns for security alerts, identify potential smart contract exploits by analyzing transaction patterns, and generate alerts when specific onchain conditions are met.
Cross-chain bridges: Applications that facilitate asset transfers between chains or find optimal swap routes across multiple networks need real-time indexed data from each blockchain to calculate fees, compare rates, and track transfer status.
Popular Indexers in 2025
The indexing landscape offers solutions ranging from decentralized protocols to fully managed services. Here's a breakdown of the leading options:
The Graph
The Graph is the most widely adopted decentralized indexing protocol. Developers define "subgraphs," which are custom indexing configurations that specify which smart contracts to monitor and how to transform their data into queryable formats. Independent node operators run the indexing infrastructure and earn GRT tokens for serving queries. The Graph is best suited for projects that prioritize censorship resistance and want to rely on decentralized infrastructure rather than centralized service providers.
Goldsky
Goldsky is an infrastructure platform supporting over 90 blockchains with an emphasis on custom data pipelines. It excels at complex data transformations, streaming blockchain data to external databases, and feeding data warehouses for analytics workloads. Goldsky offers both Graph-compatible subgraph hosting and a proprietary Mirror pipeline system for real-time data streaming. It works well for teams that need multi-chain indexing with custom business logic beyond what standard GraphQL queries provide.
Chainstack
Chainstack is an enterprise-grade blockchain infrastructure provider that offers Subgraphs as a managed service. It provides reliable indexing with guaranteed uptime SLAs, global CDN distribution for low-latency queries, and dedicated support channels. The platform supports Ethereum and EVM-compatible chains, and integrates with Chainstack's broader node infrastructure offerings. Chainstack is particularly strong for organizations that require enterprise support, compliance features, and predictable scaling.
How to Choose an Indexer for Your Project
Selecting the right indexer depends on your specific requirements. Here are the key factors to consider:
1. Chain Compatibility Different indexers support different blockchain ecosystems, so verify that your chosen indexer supports the chains you want to build on. Some specialize in specific networks like Solana, while others focus on EVM-compatible chains or offer broad multi-chain coverage.
2. Query Requirements Indexers offer different query interfaces depending on your needs: GraphQL for flexible nested queries, REST for simple predefined endpoints, SQL for analytics workloads, and WebSockets for real-time streaming. Consider how your application will access data and choose an indexer that supports those patterns.
3. Performance Needs Consider your latency and throughput requirements carefully. Some indexers prioritize speed for real-time applications, while others focus on comprehensive historical data access or high-volume analytics queries.
4. Infrastructure Philosophy Decide whether you want a fully managed service, which reduces operational overhead but introduces vendor dependency, or a decentralized protocol, which offers censorship resistance but requires more setup and maintenance.
5. Cost Structure Most indexers offer tiered pricing: free tiers for development, pay-as-you-go for growing projects, and enterprise plans for production workloads. Factor in your expected query volume and any data egress fees when estimating long-term costs.
6. Developer Experience Evaluate documentation quality, SDK support for your preferred programming languages, and the availability of community resources. Strong developer support and clear examples can significantly reduce your integration time.
7. Data Specialization For specific use cases like NFT marketplaces or Solana applications, specialized indexers often provide richer out-of-the-box data than general-purpose solutions. Consider whether pre-enriched data for your domain would save you significant development effort.
Conclusion
Handling onchain data at scale is a hard problem. Blockchains aren't built for the kinds of queries modern applications need: searching, filtering, and aggregating across millions of transactions would grind apps to a halt without the right infrastructure.
Indexers solve this problem by doing the heavy lifting: continuously processing blockchain data, organizing it into queryable formats, and serving it through fast APIs. This lets you focus on building great applications instead of wrestling with data pipelines and blockchain nodes.
Ready to get started? Alchemy offers a comprehensive suite of tools and enriched APIs designed to make blockchain development straightforward. Check out the Alchemy documentation to start building.
