Observability

The ZKsync node provides several options for setting up observability. Configuring logs and Sentry is described in the configuration section, so this section focuses on the exposed metrics.

This section is written with the assumption that you're familiar with Prometheus and Grafana.

Buckets

By default, latency histograms are distributed in the following buckets (in seconds):

[0.001, 0.005, 0.025, 0.1, 0.25, 1.0, 5.0, 30.0, 120.0]
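To illustrate how a single observation maps onto these buckets, here is a minimal sketch. It relies on the standard Prometheus convention that histogram buckets are cumulative and inclusive of their upper bound, with an implicit `+Inf` bucket above the largest one. The function name `smallest_bucket` is hypothetical and not part of the node.

```python
import bisect
import math

# Default latency bucket upper bounds in seconds, as listed above.
BUCKETS = [0.001, 0.005, 0.025, 0.1, 0.25, 1.0, 5.0, 30.0, 120.0]


def smallest_bucket(latency_seconds: float) -> float:
    """Return the smallest bucket upper bound that is >= the observation.

    Observations above the largest bound fall into the implicit +Inf bucket.
    """
    # bisect_left finds the first index whose bound is >= the observation,
    # which matches the inclusive "less than or equal" bucket semantics.
    index = bisect.bisect_left(BUCKETS, latency_seconds)
    return BUCKETS[index] if index < len(BUCKETS) else math.inf
```

A 0.3 s request, for example, is first counted in the 1.0 bucket, so quantiles estimated from these histograms are only as precise as the bucket spacing allows.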

Metrics

The ZKsync node exposes many metrics, a significant number of which aren't interesting outside the development flow. This section highlights the metrics that may be worth observing in an external setup.

If you are not planning to scrape Prometheus metrics, unset the EN_PROMETHEUS_PORT environment variable to prevent a memory leak.

| Metric name | Type | Labels | Description |
| --- | --- | --- | --- |
| external_node_synced | Gauge | - | 1 if synced, 0 otherwise. Matches eth_call behavior |
| external_node_sync_lag | Gauge | - | How many blocks behind the main node the ZKsync node is |
| external_node_fetcher_requests | Histogram | stage, actor | Duration of requests performed by the different fetcher components |
| external_node_fetcher_cache_requests | Histogram | - | Duration of requests performed by the fetcher cache layer |
| external_node_fetcher_miniblock | Gauge | status | The number of the last L2 block update fetched from the main node |
| external_node_fetcher_l1_batch | Gauge | status | The number of the last batch update fetched from the main node |
| external_node_action_queue_action_queue_size | Gauge | - | Amount of fetched items waiting to be processed |
| server_miniblock_number | Gauge | stage=sealed | Last locally applied L2 block number |
| server_block_number | Gauge | stage=sealed | Last locally applied L1 batch number |
| server_block_number | Gauge | stage=tree_lightweight_mode | Last L1 batch number processed by the tree |
| server_processed_txs | Counter | stage=mempool_added, state_keeper | Can be used to show incoming and processing TPS values |
| api_web3_call | Histogram | method | Duration of Web3 API calls |
| sql_connection_acquire | Histogram | - | Time to get an SQL connection from the connection pool |
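The server_processed_txs counter only becomes a TPS value once it is turned into a rate. In a Prometheus setup the `rate()` function would normally do this; the sketch below shows the underlying arithmetic for two samples of a counter taken a known interval apart. The function name `counter_rate` and the reset handling are illustrative assumptions, not code from the node.

```python
def counter_rate(prev_value: float, curr_value: float, interval_seconds: float) -> float:
    """Approximate the per-second rate between two samples of a counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_value - prev_value
    if delta < 0:
        # Assumption: a decrease means the process restarted and the counter
        # reset to zero, so everything counted since the reset is the increase.
        delta = curr_value
    return delta / interval_seconds
```

Applied to samples with stage=mempool_added this would give an incoming TPS estimate, and with stage=state_keeper a processing TPS estimate.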

Interpretation

After applying a dump, the ZKsync node has to rebuild the Merkle tree to verify the correctness of the state in PostgreSQL. During this stage, server_block_number { stage='tree_lightweight_mode' } is increasing from 0 to server_block_number { stage='sealed' }, while the latter does not increase (the ZKsync node needs the tree to be up-to-date to progress).

After that, the ZKsync node has to sync with the main node. During this stage, server_block_number { stage='sealed' } increases and external_node_sync_lag decreases.

Once the node is synchronized, external_node_synced is set to 1.
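The stages described above can be summarized as a rough classification over three metric values. This is a heuristic sketch only: the function name `sync_phase`, the phase labels, and the exact comparison are assumptions made for illustration, and a real alerting rule would likely need some tolerance for the tree briefly trailing the sealed batch.

```python
def sync_phase(tree_batch: int, sealed_batch: int, synced: int) -> str:
    """Classify the node's phase from three metric values.

    tree_batch:   server_block_number { stage='tree_lightweight_mode' }
    sealed_batch: server_block_number { stage='sealed' }
    synced:       external_node_synced (1 if synced, 0 otherwise)
    """
    if synced == 1:
        return "synced"
    if tree_batch < sealed_batch:
        # The tree has not caught up with the state in PostgreSQL yet,
        # so the sealed batch number is not expected to advance.
        return "rebuilding tree"
    # The tree is up to date; the node is fetching from the main node.
    return "catching up with main node"
```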

Metrics can be used to detect anomalies in configuration, which is described in more detail in the next section.

