Clusters All The Way Down

Spencer Smith discusses the Cluster API project and how it can be used with Talos.

Spencer Smith

Howdy folks! I’m Spencer Smith, a software engineer here at Talos Systems. Today I wanted to spend some time writing about an upstream Kubernetes project called Cluster API (CAPI). I think it’s one of the most exciting things being worked on in the community and is going to be a big deal for Talos clusters specifically as time goes on.

What is Cluster API?

The first question is the obvious one. What in the heck is this Cluster API thing? The Cluster API docs describe it as “… a Kubernetes project to bring declarative, Kubernetes-style APIs to cluster creation, configuration, and management”. What this means, essentially, is that the goal of the Cluster API project is to leverage the existing power of the Kubernetes API to orchestrate and maintain the state of other Kubernetes clusters. Hence the “turtles all the way down” logo that the project uses. Worded a bit differently, this project allows you to treat entire Kubernetes clusters as cattle and easily orchestrate things like upgrades, scale-outs, and cross-cloud deployments from a single management cluster. As someone who has lived the ops life at both extremes of Kubernetes, lots of small clusters and few large clusters, having a single place to go to manage them is a dream! This is accomplished by adding several custom resource definitions (CRDs) to the Kubernetes API, along with a couple of controllers to manage the resulting custom resources.

Resources Provided by Cluster API

Let’s dive into the resources that are created, as it’s crucial to know these in order to understand the operation of Cluster API. It really helps to be familiar with the resources that Kubernetes ships with, as they map nicely onto the resources that Cluster API introduces. Consider the following basic resources:

  • Pods: Smallest unit of deployment in Kubernetes. Generally ephemeral and often orchestrated by higher level resources.
  • ReplicaSets: Maintain a group of pods, keeping a defined number of replicas of the pod in a ready state at any given time.
  • Deployments: Manage the ReplicaSet resources and allow for rollout/rollback functionality.

Now, let’s compare those to their Cluster API cousins:

  • Machine: Smallest cluster unit. May be ephemeral, especially if the node is a Kubernetes worker, and can be easily created in cloud environments or with PXE booting in bare metal scenarios.
  • MachineSet: A group of machines, keeping a defined number of them in ready state at a given time.
  • MachineDeployment: Manages the MachineSet and allows for rollout/rollback functionality.

As you can see, they map pretty cleanly in terms of how you may think about each resource. In our Talos CAPI provider, we create three master Machines and a MachineDeployment for workers. This allows for super easy scale-up/scale-down of workers. We don’t use a MachineDeployment for masters yet because they require some special machine configs at boot time and aren’t all the same. However, I’ve heard rumors of a MachineStatefulSet eventually becoming a part of CAPI and I’m hoping we can eventually make use of that.
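To make the ReplicaSet/MachineSet analogy a bit more concrete, here is a minimal Python sketch of the kind of reconcile loop a MachineSet-style controller runs. It is purely illustrative and is not how Cluster API itself is implemented: the Machine and MachineSet classes, the ready flag, and the naming scheme are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Machine:
    # Hypothetical stand-in for a Machine resource.
    name: str
    ready: bool = True


@dataclass
class MachineSet:
    # Hypothetical stand-in for a MachineSet resource.
    name: str
    replicas: int  # desired number of machines
    machines: list = field(default_factory=list)
    _ids: count = field(default_factory=count, repr=False)

    def reconcile(self) -> None:
        """Drive the observed machine list toward the desired replica count."""
        # Machines are cattle: drop unhealthy ones instead of repairing them.
        self.machines = [m for m in self.machines if m.ready]
        # Scale up until the desired count is reached.
        while len(self.machines) < self.replicas:
            self.machines.append(Machine(f"{self.name}-{next(self._ids)}"))
        # Scale down by deleting any surplus machines.
        del self.machines[self.replicas:]


workers = MachineSet(name="talos-gce-workers", replicas=3)
workers.reconcile()
print([m.name for m in workers.machines])

workers.machines[1].ready = False  # simulate a failed node
workers.reconcile()
print([m.name for m in workers.machines])

workers.replicas = 2  # scale down
workers.reconcile()
print([m.name for m in workers.machines])
```

The point of the sketch is that scaling up, scaling down, and replacing a failed node are all the same operation: compare the desired count with what exists and close the gap.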

Cluster API Providers

Since I just mentioned it at the end of the previous section, it’s important to understand a bit about the Cluster API providers. As it currently stands, there are a plethora of different providers, generally one for each cloud environment. These providers function as the controllers that manage the lifecycle of the resources mentioned above. While they are similar in their overall functionality, the underlying implementations are wildly different and the options supported may differ from one provider to the next. There is, however, a concerted effort among the CAPI community to move towards a more cohesive vision of the “bootstrapping” portion of Cluster API (installing Kubernetes itself) and to separate out the “provisioning” portion (just booting infrastructure in a given environment) so that it is much more general. It should also be noted that you can have multiple providers deployed in your management cluster at once, thus providing different clouds to deploy into for your clusters.

We’re lucky with Talos in that all of our configuration happens with the machine config that gets passed to a node at boot time. This allowed us to quickly write infrastructure provisioners for various cloud providers and use the same logic to generate the configs for each. As a result, with only our Talos provider deployed in the management cluster, we can deploy nodes to AWS, Azure, Google Cloud, and Packet.
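The split described above, one shared config generator plus a thin provisioner per platform, can be sketched roughly as follows. This is an illustration of the idea only and not the provider’s actual code: every function name here is hypothetical, the “provisioners” just return strings instead of calling a cloud API, and only two of the four platforms are registered to keep it short.

```python
from typing import Callable, Dict


def generate_machine_config(cluster: str, role: str) -> str:
    """Hypothetical stand-in for config generation; same for every platform."""
    return f"cluster={cluster};role={role}"


# Hypothetical provisioners. A real one would call the cloud API to boot an
# instance and hand it the machine config at boot time.
def provision_gce(name: str, machine_config: str) -> str:
    return f"gce instance {name} booted with [{machine_config}]"


def provision_aws(name: str, machine_config: str) -> str:
    return f"aws instance {name} booted with [{machine_config}]"


# Registry keyed on the platform type named in the machine spec.
PROVISIONERS: Dict[str, Callable[[str, str], str]] = {
    "gce": provision_gce,
    "aws": provision_aws,
}


def create_machine(name: str, platform: str, cluster: str, role: str) -> str:
    """Generate the config once, then dispatch to the platform provisioner."""
    if platform not in PROVISIONERS:
        raise ValueError(f"unsupported platform: {platform}")
    config = generate_machine_config(cluster, role)
    return PROVISIONERS[platform](name, config)


print(create_machine("talos-gce-master-0", "gce", "talos-gce", "master"))
print(create_machine("talos-aws-worker-0", "aws", "talos-gce", "worker"))
```

Under this shape, supporting another environment means adding one provisioner function and one registry entry, while the config generation stays untouched.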

Using Cluster API with Talos

Since I talked up our snazzy Cluster API provider, let’s try it out! As a prerequisite, I created a local, docker-based Talos cluster to use as my “management cluster”. This can be done by following our Getting Started docs. I have also set up my Google Cloud project with the desired Talos image that I’ll deploy. Instructions for doing so can be found here.

  • With a local cluster running, deploy the provider with kubectl apply -f https://github.com/talos-systems/cluster-api-provider-talos/releases/download/v0.1.0-alpha.3/provider-components.yaml.

  • Following that, I need to create a couple of secrets with the credentials for each cloud I’m interested in using. Run kubectl apply with the following, substituting your own base64-encoded credential files:

    apiVersion: v1
    kind: Secret
    metadata:
      name: aws-credentials
      namespace: cluster-api-provider-talos-system
    data:
      credentials: "{{BASE64_ENCODED_AWS_CREDS}}"
    ---
    apiVersion: v1
    kind: Secret
    metadata:
      name: gce-credentials
      namespace: cluster-api-provider-talos-system
    data:
      service-account.json: "{{BASE64_ENCODED_GCE_CREDS}}"
    
  • Now we can create a small, single master cluster in GCE with the following configuration, substituting your own external IP and any other changes you wish to make to the instance configuration:

    apiVersion: cluster.k8s.io/v1alpha1
    kind: Cluster
    metadata:
      annotations: null
      name: talos-gce
    spec:
      clusterNetwork:
        pods:
          cidrBlocks:
          - 192.168.0.0/16
        serviceDomain: cluster.local
        services:
          cidrBlocks:
          - 10.96.0.0/12
      providerSpec:
        value:
          apiVersion: talosproviderconfig/v1alpha1
          kind: TalosClusterProviderSpec
          masters:
            ips:
            - x.x.x.x
    ---
    apiVersion: cluster.k8s.io/v1alpha1
    kind: Machine
    metadata:
      labels:
        cluster.k8s.io/cluster-name: talos-gce
        set: master
      name: talos-gce-master-0
    spec:
      providerSpec:
        value:
          apiVersion: talosproviderconfig/v1alpha1
          kind: TalosMachineProviderSpec
          platform:
            config: |-
              zone: "us-central1-c"
              project: "talos-testbed"
              instances:
                type:  "n1-standard-2"
                image: "https://www.googleapis.com/compute/v1/projects/talos-testbed/global/images/talos-e2e"
                disks:
                  size: 50
            type: gce
    ---
    apiVersion: cluster.k8s.io/v1alpha1
    kind: MachineDeployment
    metadata:
      labels:
        cluster.k8s.io/cluster-name: talos-gce
        set: worker
      name: talos-gce-workers
    spec:
      replicas: 3
      selector:
        matchLabels:
          cluster.k8s.io/cluster-name: talos-gce
          set: worker
      template:
        metadata:
          labels:
            cluster.k8s.io/cluster-name: talos-gce
            set: worker
        spec:
          providerSpec:
            value:
              apiVersion: talosproviderconfig/v1alpha1
              kind: TalosMachineProviderSpec
              platform:
                config: |-
                  zone: "us-central1-c"
                  project: "talos-testbed"
                  instances:
                    type:  "n1-standard-2"
                    image: "https://www.googleapis.com/compute/v1/projects/talos-testbed/global/images/talos-e2e"
                    disks:
                      size: 50
                type: gce
    
  • After a few minutes, we can see our cluster created in Google Cloud and can retrieve the talosconfig (and, by extension, the kubeconfig) with kubectl get cm -n cluster-api-provider-talos-system talos-gce-master-0 -o jsonpath='{.data.talosconfig}'. Running kubectl get nodes confirms that our nodes are up and running.

    $ kubectl get nodes
    NAME                                 STATUS   ROLES    AGE   VERSION
    talos-gce-master-0                   Ready    master   21m   v1.15.0
    talos-gce-workers-6fbc48d957-cnkk7   Ready    <none>   21m   v1.15.0
    talos-gce-workers-6fbc48d957-xm4kf   Ready    <none>   21m   v1.15.0
    talos-gce-workers-6fbc48d957-zxk7f   Ready    <none>   21m   v1.15.0
    

Note: Your nodes will not show as “Ready” until a PSP and CNI are applied.
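The secrets step above expects base64-encoded values in the data fields, which is easy to get wrong by hand. Here is a small Python sketch that builds the same kind of Secret manifest from raw credential bytes. The credentials_secret helper and the placeholder bytes are assumptions for illustration; in practice you would read your real credentials file instead. The output is JSON, which kubectl apply -f accepts just like YAML.

```python
import base64
import json


def credentials_secret(name: str, key: str, raw: bytes,
                       namespace: str = "cluster-api-provider-talos-system") -> dict:
    """Build a Secret manifest whose single data value is base64-encoded."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        # Secret data values must be base64-encoded strings.
        "data": {key: base64.b64encode(raw).decode("ascii")},
    }


# Placeholder bytes for illustration only. With a real file this would be
# something like: raw = open("service-account.json", "rb").read()
placeholder = b'{"type": "service_account"}'

secret = credentials_secret("gce-credentials", "service-account.json", placeholder)
print(json.dumps(secret, indent=2))
```

Saving the printed JSON to a file and passing it to kubectl apply -f should be equivalent to filling in the template by hand.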

Bring the Multi-Cloud

One of the things that gets me most hyped about CAPI and our provider specifically is the ease of creating nodes across different cloud providers. Let’s add some AWS workers to our cluster as well. We can do that by simply applying another MachineDeployment that targets AWS:

  • Apply the following:

    apiVersion: cluster.k8s.io/v1alpha1
    kind: MachineDeployment
    metadata:
      labels:
        cluster.k8s.io/cluster-name: talos-gce
        set: worker
      name: talos-aws-workers
    spec:
      replicas: 3
      selector:
        matchLabels:
          cluster.k8s.io/cluster-name: talos-gce
          set: worker
      template:
        metadata:
          labels:
            cluster.k8s.io/cluster-name: talos-gce
            set: worker
        spec:
          providerSpec:
            value:
              apiVersion: talosproviderconfig/v1alpha1
              kind: TalosMachineProviderSpec
              platform:
                config: |-
                  region: "us-west-2"
                  instances:
                    type:  "t2.micro"
                    ami: "ami-09d6bf816f2ebdfe1"
                    keypair: "{{AWS_KEY_NAME}}"
                    disks:
                      size: 10
                type: aws
    

Note: You will need to update the AWS keypair for your environment.

And in just a few minutes, we can see the new AWS instances (named ip-*) checking into our cluster!

    $ kubectl get nodes
    NAME                                 STATUS   ROLES    AGE   VERSION
    ip-172-31-36-224                     Ready    <none>   16s   v1.15.0
    ip-172-31-43-124                     Ready    <none>   14s   v1.15.0
    ip-172-31-45-36                      Ready    <none>   21s   v1.15.0
    talos-gce-master-0                   Ready    master   24m   v1.15.0
    talos-gce-workers-6fbc48d957-cnkk7   Ready    <none>   24m   v1.15.0
    talos-gce-workers-6fbc48d957-xm4kf   Ready    <none>   24m   v1.15.0
    talos-gce-workers-6fbc48d957-zxk7f   Ready    <none>   24m   v1.15.0
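If you compare the GCE and AWS MachineDeployments above, the only things that differ are the name and the platform block. The following Python sketch makes that explicit by generating both manifests from one function. The machine_deployment helper is hypothetical and exists only for this example; the field values are copied from the manifests in this post, and the output is JSON, which kubectl apply -f accepts.

```python
import json


def machine_deployment(name: str, cluster: str, platform_type: str,
                       platform_config: str, replicas: int = 3) -> dict:
    """Build a worker MachineDeployment; only name and platform vary."""
    labels = {"cluster.k8s.io/cluster-name": cluster, "set": "worker"}
    return {
        "apiVersion": "cluster.k8s.io/v1alpha1",
        "kind": "MachineDeployment",
        "metadata": {"labels": dict(labels), "name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {"providerSpec": {"value": {
                    "apiVersion": "talosproviderconfig/v1alpha1",
                    "kind": "TalosMachineProviderSpec",
                    "platform": {"config": platform_config,
                                 "type": platform_type},
                }}},
            },
        },
    }


GCE_CONFIG = '''zone: "us-central1-c"
project: "talos-testbed"
instances:
  type: "n1-standard-2"
  image: "https://www.googleapis.com/compute/v1/projects/talos-testbed/global/images/talos-e2e"
  disks:
    size: 50'''

# The keypair placeholder must be replaced with your own key name.
AWS_CONFIG = '''region: "us-west-2"
instances:
  type: "t2.micro"
  ami: "ami-09d6bf816f2ebdfe1"
  keypair: "{{AWS_KEY_NAME}}"
  disks:
    size: 10'''

gce = machine_deployment("talos-gce-workers", "talos-gce", "gce", GCE_CONFIG)
aws = machine_deployment("talos-aws-workers", "talos-gce", "aws", AWS_CONFIG)
print(json.dumps(aws, indent=2))
```

Both deployments carry the same cluster-name label, which is what ties the AWS workers to the talos-gce cluster.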

Closing

As you can see, Cluster API provides some great functionality to streamline the deployment of Kubernetes. It has been extremely satisfying to work on the Talos provider and we’re looking forward to it becoming a de facto standard for deploying our clusters!