Running an Avalanche Validator Using Kubernetes

Introduction

When the Avalanche project launched in the summer of 2020, I liked the team's democratic approach to validators. With a relatively small stake of 2,000 AVAX, a simple setup, and a very permissive node availability requirement (60% uptime), one can run an inexpensive validator node and earn a decent income (~7–11% APR during the first year) considering the effort required.

The simplest way to set up an Avalanche validator is probably to pick an inexpensive cloud provider like Digital Ocean, get a 2 CPU/4 GB RAM droplet for $20 per month, clone the avalanchego repo, build it using the supplied script, and run the node directly from the bin directory. This setup works quite well, and Digital Ocean has been a very reliable operator in my experience.
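
Roughly, that bare-VM approach amounts to the following (the build output location may differ between avalanchego versions):

```bash
# Bare-VM setup sketch (not the k8s setup described below).
git clone https://github.com/ava-labs/avalanchego.git
cd avalanchego
./scripts/build.sh        # supplied build script
./build/avalanchego       # binary location may be bin/ or build/ depending on the version
```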

While launching the node is simple, adding observability to it using the proposed Prometheus/Grafana setup is not. Node management is also a very tedious process: after some time away from the project it is quite easy to forget which curl command to run to get node info, or how to mount the staking key. It would therefore be helpful to create a modular setup in which the various modules (Avalanche node, metrics, wallet, etc.) can be snapped in and out and upgraded as needed. Luckily, most of the building blocks are already provided by the Avalanche team, such as the node image on Docker Hub and the Grafana metrics dashboards, including alarm definitions.

With all of this in mind, Kubernetes (k8s) is a natural fit. Normally k8s is associated with high availability and scalability of applications; however, the tooling around it also allows for a modular application architecture.

For this project, I will be using helm to manage the k8s modules. Not only is it one of the most popular k8s deployment tools, but Prometheus/Grafana has an officially supported k8s operator, with dashboards to monitor the k8s cluster, that can be deployed from a helm repo. Avalanche validator dashboards can be added to those, providing a full set of metrics for both the cluster and the validator node.

I also considered building a highly available validator setup using multiple replicas, but Avalanche doesn't provide hooks for such a setup, and building it around the node is not easy. Given the fairly lax uptime requirement and the large number of other validators available to support the network, I don't believe this is something that has to be addressed right away (or ever).

Architecture

The setup has two main pieces, as shown in the diagram below:

  1. Prometheus/Grafana operator for metrics and observability of the k8s cluster
  2. Avalanche validator node packaged as a helm chart

The source for the project is in this repo.

Prometheus Operator with Grafana

The Prometheus operator with Grafana has three main components: Prometheus itself to scrape and store metrics, Grafana for visualization, and Alertmanager to dispatch alerts. It also has additional auxiliary pods for OS metric collection and other tasks. It can be deployed "as is" from the helm repo; however, since there is no obvious way to request a particular chart version from that repo to "pin" it for this setup, I decided to check the latest version (kube-prometheus-stack-10.1.0) of the operator into my GitHub repo.
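
Deploying the checked-in chart then looks something like this (the chart path and release name used by the repo's scripts may differ):

```bash
# Install the checked-in kube-prometheus-stack chart into its own namespace.
kubectl create namespace prometheus
helm install prometheus ./kube-prometheus-stack \
  --namespace prometheus \
  -f k8-prom-values.yaml
```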

The Prometheus operator is deployed with a PVC (persistent volume claim) enabled (the default is local storage, i.e. no PVC). In short, a PVC is storage that survives pod restarts. This is needed to support "higher availability" of the setup when one of the k8s cluster nodes has to be upgraded and all pods need to be drained from it: the PVC preserves historical metrics, which become available again when a new pod is launched on another node.
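
A minimal sketch of the values override that turns on the PVC; the field layout follows the kube-prometheus-stack chart, the file name prom-pvc-values.yaml is hypothetical (in this project the settings live in k8-prom-values.yaml), and the storage size is an assumption:

```bash
# Stand-alone values override enabling a PVC for Prometheus
# (storage size is an assumption; helm accepts multiple -f files).
cat > prom-pvc-values.yaml <<'EOF'
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 20Gi
EOF
```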

Grafana, which comes with the Prometheus operator, includes dashboards to monitor the k8s cluster, pods, and nodes. Unfortunately, adding custom dashboards (e.g. for the Avalanche node) is very difficult using the suggested ConfigMaps; it is much easier to do with the Grafana dashboard API. One drawback of this approach is that Grafana doesn't offer an option to enable a PVC (a very unfortunate design choice), so when the Grafana pod is restarted, custom dashboards and all changes made to them during operation have to be re-applied.
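
Pushing a dashboard through the API can look like this, with Grafana reachable on localhost:3000 via port forwarding; the Grafana service name depends on the helm release name, and avalanche-dashboard.json is a hypothetical file containing a dashboard export wrapped in the API's envelope:

```bash
# Port-forward Grafana, then push a dashboard via its HTTP API.
# The JSON file must be wrapped as {"dashboard": {...}, "overwrite": true}.
kubectl -n prometheus port-forward svc/prometheus-grafana 3000:80 &
curl -s -X POST http://admin:prom-operator@localhost:3000/api/dashboards/db \
  -H 'Content-Type: application/json' \
  -d @avalanche-dashboard.json
```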

The Prometheus operator with Grafana is installed in its own namespace, prometheus. Among other things, this allows simple cleanup of the operator by deleting the whole namespace.

Avalanche Validator

The Avalanche validator (avalanchego) is implemented as a StatefulSet with one pod, since it needs to act as a singleton: at any time there should be only one avalanchego pod with a given staking key. The StatefulSet is backed by a PVC containing the blockchain data, so after a pod restart the node only needs to catch up on a small delta.
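
A skeleton of the StatefulSet looks roughly like this (the names, data path, and volume size are illustrative; the actual chart templates them):

```bash
# StatefulSet skeleton for the validator (illustrative values).
kubectl apply -n avalanche -f - <<'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: avalanchego
spec:
  serviceName: avalanchego
  replicas: 1                       # singleton: one pod per staking key
  selector:
    matchLabels:
      app: avalanchego
  template:
    metadata:
      labels:
        app: avalanchego
    spec:
      containers:
      - name: avalanchego
        image: avaplatform/avalanchego
        ports:
        - containerPort: 9651       # peer/staking port
        - containerPort: 9650       # HTTP API
        volumeMounts:
        - name: avalanche-data
          mountPath: /root/.avalanchego   # chain data, survives restarts via the PVC
  volumeClaimTemplates:
  - metadata:
      name: avalanche-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 100Gi
EOF
```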

The validator exposes two services: a peer service on port 9651 to communicate with peers and do the actual validation, and a management service on port 9650 to obtain information about the Avalanche node and perform other operations. The peer service is of the NodePort type. NodePort is a kind of k8s service that is exposed to the outside world; it allows connecting to the pod behind it using the k8s node's public IP and the service port (9651). With this setup, avalanchego appears to its peers as a process running on a regular VM with a public IP, listening on port 9651.
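
A minimal sketch of the peer service (service and selector names are illustrative; unless the chart pins a nodePort, k8s assigns the external port from its own NodePort range):

```bash
# Peer-facing NodePort service sketch.
kubectl apply -n avalanche -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: avalanchego-peers
spec:
  type: NodePort
  selector:
    app: avalanchego
  ports:
  - name: staking
    port: 9651
    targetPort: 9651
EOF
```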

The management service is of the ClusterIP type, which means that only callers within the k8s cluster can connect to it. This is exactly what we need to shield the node from the rest of the internet. In practice, it means we have to use port forwarding, or a dedicated cli pod with kubectl exec, to access the Avalanche node from a laptop. For this I added a cli pod, which is a standard curl container.
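
Either access path boils down to something like the following (the service and cli pod names are illustrative):

```bash
# Option 1: forward the management port to the laptop.
kubectl -n avalanche port-forward svc/avalanchego 9650:9650

# Option 2: call the node from inside the cluster through the cli pod.
kubectl -n avalanche exec cli -- curl -s -X POST http://avalanchego:9650/ext/info \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"info.getNodeID"}'
```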

Provisioning the public IP address of the Avalanche node is the most confusing part of the whole validator setup. Judging by discussions in various forums, the purpose of the public IP is still an open question. I dug through the code to understand how it is used. Seemingly it is only used to exclude the node itself from the list of its peers: when a validator node receives the full list of other validators, that list contains its own IP address, and during a broadcast it needs to exclude itself from the list (?). I also experimented with not supplying the address to the node, and the node works just fine. What's more, I got a list of peers and found a few validators among them with a missing public IP in their peer info, which still happily participate in validation and receive rewards. A likely explanation is that peers use the sender's IP address to gossip about other validators, and this public IP value is only needed for the Avalanche node's internal use.

Eventually, I found that setting --dynamic-public-ip=opendns as an Avalanche node argument lets the node resolve the k8s node's public IP address by itself, so no special steps are required to figure out the public IP before node launch. This is the list of launch arguments of the Avalanche node. With these arguments the node's IP address is properly set in the JSON returned by the info.peers method.
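
The launch command ends up looking roughly like this (the in-container binary path and the cert/key locations, where the k8s secret is mounted, are assumptions; flag names follow the avalanchego versions of that period):

```bash
# Illustrative avalanchego launch command inside the container.
/avalanchego/build/avalanchego \
  --dynamic-public-ip=opendns \
  --http-host=0.0.0.0 \
  --staking-tls-cert-file=/staking/staker.crt \
  --staking-tls-key-file=/staking/staker.key
```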

The Avalanche validator uses a staking key pair to identify itself to other peers (sign messages, receive rewards, etc.). Its node ID is derived from this key pair, so we have to be able to provision the staking key pair for a node in case it is re-created on a different cloud provider in the middle of its validation period. This is achieved using a k8s secret, which is essentially an obfuscated version of a ConfigMap. The staking key pair can be supplied during Avalanche deployment as public and private key files, and the setup will create a k8s secret, which is mounted to look like a directory in the avalanchego container. If this is a brand-new node and there is no staking key pair yet, the Avalanche node generates the key pair itself. However, the node operator then needs to extract the key pair; otherwise it will be lost when the Avalanche deployment is removed.
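
Provisioning an existing key pair can look like this (the secret and file names are illustrative; the chart mounts the secret as a directory inside the avalanchego container):

```bash
# Create the staking key secret from existing certificate and key files.
kubectl -n avalanche create secret generic avalanche-staking-key \
  --from-file=staker.crt \
  --from-file=staker.key
```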

K8s uses readiness and liveness checks to monitor the node and restart it if needed. Avalanche has a dedicated endpoint, /ext/health, to check its liveness. In practice, however, I found this endpoint is not useful, at least not for k8s: it returns 200 only when the node is fully synchronized, which takes several hours from the first start. During this time the Avalanche pod is not accessible in any way, so it is hard to understand what's going on. It would be better if this endpoint returned 200 whenever the node is online, regardless of its synchronization status. Instead, I use the info.getNodeID method of the /ext/info endpoint for both the readiness and liveness checks.
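
The check behind both probes boils down to a call like this, run by an exec probe inside the pod (this assumes a curl binary is available to the probe; the grep simply verifies that a node ID comes back):

```bash
# Probe command sketch: succeed only if the node answers with its ID.
curl -sf -X POST http://127.0.0.1:9650/ext/info \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"info.getNodeID"}' \
  | grep -q nodeID
```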

All components of the Avalanche validator are deployed into a dedicated avalanche namespace. Again, it is very easy to clean up by deleting the namespace if things go wrong. The dedicated namespace also allows setting access permissions and resource limits in the future.

Avalanche node setup is implemented as a helm chart.

Cluster Setup

I used Google Cloud as my cloud provider; it has arguably the best support for k8s. To save on costs I decided to create a k8s cluster with only one node. This may seem strange, but high uptime is not strictly required, and the time to upgrade and restart a node is relatively small when Google Cloud does its auto-updates. I experimented with a two-node setup as well; in that case, pods are distributed evenly across the k8s nodes. A node can be drained and all of its pods will migrate to the remaining node; when the node is uncordoned, they are redistributed again.
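
Creating the single-node cluster can look roughly like this (the cluster name, zone, and machine type are assumptions; e2-medium matches the 2 CPU/4 GB shape, while the March 2021 update mentioned later moved to an 8 GB shape):

```bash
# Single-node GKE cluster sketch (name, zone, and machine type assumed).
gcloud container clusters create avalanche \
  --zone us-central1-a \
  --num-nodes 1 \
  --machine-type e2-medium

# Maintenance on a multi-node cluster: drain a node, then bring it back.
kubectl drain <node-name> --ignore-daemonsets
kubectl uncordon <node-name>
```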

I used a single 2 CPU/4 GB node to run both Prometheus and Avalanche. According to Google, each cloud account gets one k8s control plane free of charge, so this setup costs approximately the same as a single VM, which is about $30 per month.

Below are the CPU and memory consumption for the whole node. After the initial synchronization, total CPU usage is barely 0.25 of a single core. Memory, however, keeps creeping up, primarily due to the Avalanche node: it started at about 200 MB and reached 700 MB after several days. Total memory usage is still well below the node's 4 GB, at about 1.5 GB. The Avalanche memory limit is set to 1.5 GB; when it reaches this number the avalanchego pod can be killed, so the limit likely needs to be increased to perhaps 2.5–3 GB.

March 2021 update: I had to increase the node size to 2 CPU/8 GB, since the avalanchego node would run out of memory every few days and the pod would get restarted. The new memory limit for the avalanchego pod is 4 GB.

Grafana Dashboards

There are 2 sets of dashboards in this project:

  1. k8s dashboards that come with Prometheus operator.
  2. Official avalanche dashboards installed from this repo.

Access to Grafana is done via port forwarding on port 3000. The login/password is the Grafana default: admin/prom-operator. The Avalanche dashboards are installed under the Avalanche folder in Grafana. Below is an example of what one of them looks like.

Notifications

The Avalanche dashboards come with predefined alarms. The setup will create a Telegram notification channel and hook the alarms up to it during dashboard installation. This works well; however, my Telegram is flooded with "Average Block Acceptance" alerts. This is due to Average Rejection Latency being reported too rarely, as shown below. Still, this is a good test of the alerting capability of the Prometheus operator.

The alarm setup is optional; if no notification channel is created, the alarms are not configured. Creating a Telegram notification channel requires creating a dedicated Telegram channel and getting an API key from the Telegram BotFather. See this section of the README for more detail.

Tooling

While the whole setup is packaged as helm charts, there are still a number of manual steps that must be performed: cluster and namespace creation, and dashboard installation, which is done via the Grafana API. These steps and the helm chart deployments are packaged into a bash script, deploy-cli.sh. It can create a cluster and install, update, or delete the Prometheus and Avalanche modules, as well as the dashboards and the Telegram channel. For details see README.md in the project.

There is another tool, ava-cli.sh, which connects to the Avalanche node to pull its ID, get its synchronization status, and extract the staking key pair.

To connect to Grafana, Prometheus, or the Avalanche node, I added one more tool, port-forward.sh. It forwards a port from k8s onto the host, so that a user can connect to services running in the k8s cluster from a laptop. The tool prints a connection string when launched. This makes it possible to use Postman to execute commands against the Avalanche node. It would be great if someone could create a Postman project with all the Avalanche curls.

Installation and Upgrade

The installation steps are described in README.md and are pretty straightforward. A cluster can be created either manually or using deploy-cli.sh. The setup works on a single cluster node or on a cluster with multiple nodes.

To change configuration parameters such as memory requests and limits, edit the kube-avax-values.yaml values file. There is a similar file for the Prometheus operator, k8-prom-values.yaml. To re-apply these values to an already running node, use deploy-cli.sh with the update parameter.
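
Under the hood, the update presumably amounts to helm upgrades along these lines (the release names and chart paths are assumptions):

```bash
# Re-apply edited values to the running releases (names/paths assumed).
helm upgrade avalanche ./avalanche -n avalanche -f kube-avax-values.yaml
helm upgrade prometheus ./kube-prometheus-stack -n prometheus -f k8-prom-values.yaml
```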

To upgrade the Avalanche container, update the tag value here, then use deploy-cli.sh update to re-deploy.

Conclusion

This project simplifies Avalanche validator node installation, upgrade, and monitoring by relying on readily available modules and k8s tooling on a kubernetes cluster. The code and scripts for the project are available in this repo under the MIT license.