Building Arges: A New Tool for Bare Metal Kubernetes. Part Two.

Note: Arges is under active development. For the latest and greatest deployment instructions, see the README here.

If you missed Part One, read it here!

In part two of this blog series, it's finally time to get down to business. Now that we've talked about the architecture of Arges and the bare metal hardware we're using, we can deploy the management plane and start stamping out clusters!

If you want to skip right to the code, check it out on GitHub: talos-systems/arges

Chicken, Meet Egg

So, as you may already know, Cluster API is a project that uses Kubernetes and custom resources to provision and manage the lifecycle of other Kubernetes clusters. The downside to that is that, well, it requires an initial Kubernetes cluster.

This presents a chicken-and-egg problem that can be a little daunting at first: we need a cluster just to provision our management plane. Luckily, talosctl gives us a nice way to spin up a temporary Docker-based cluster from our laptop to do the initial bootstrapping. We'll walk through the following high-level steps below:

  • Create a local, Talos-based Kubernetes cluster that we'll refer to as the "bootstrap cluster"
  • Install Arges components to the bootstrap cluster
  • Configure DHCP server with PXE info from Metal Controller Manager service
  • Craft YAML for management plane creation
  • Boot management plane nodes and provision Talos/Kubernetes

Note: The current downside of this approach is that we're essentially orphaning the management plane after we bootstrap it with our local Docker cluster. There is some discussion in upstream Cluster API about "pivoting" a cluster after initial bootstrapping so that it becomes self-managed. This is something that we plan to explore for the management plane soon.

Bootstrap Environment

Let's get our local cluster created. The talosctl CLI tool makes this easy. We will create this cluster with some ports exposed that we can later configure on our router to allow PXE booting.

Note: You can download talosctl from the assets published in the latest Talos release

Run the following commands:

  • Create the local cluster with talosctl cluster create -p 69:69/udp,8081:8081/tcp,9091:9091/tcp --endpoint <DOCKER_HOST_IP>
    Note: The extra -p flag allows us to expose ports for our components on the docker host. As of this writing, this functionality is only present in the master branch of Talos.
  • Grab the kubeconfig from your newly created cluster with talosctl kubeconfig /desired/path/to/kubeconfig
  • Set kubeconfig environment variable with export KUBECONFIG=/desired/path/to/kubeconfig
  • Check out the Arges repo with git clone git@github.com:talos-systems/arges.git

Note: As it currently stands, we need to host the kubeconfig via HTTP/HTTPS so that any bare metal nodes can pull this info and register themselves with our Metal Controller Manager by creating a "server" resource in the parent cluster. The setup of hosting this file is left as an exercise for the reader. This is a temporary requirement; the next version of our Metal Controller Manager will have an API for server registration and this will no longer be necessary.
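
For anyone who wants a starting point for that exercise, here is a minimal sketch that copies the kubeconfig into a scratch directory and serves it with Python's built-in web server. The function names, port 8000, and the kubeconfig file name are assumptions made for illustration, not anything Arges prescribes. Keep in mind that a kubeconfig carries cluster credentials, so plain HTTP like this only makes sense on a trusted lab network.

```shell
#!/bin/sh
# Sketch only: stage_kubeconfig and serve_kubeconfig are hypothetical helpers.

# Copy the kubeconfig into a directory that holds nothing else, then print
# the URL that bare metal nodes would fetch it from.
stage_kubeconfig() {
    src=$1; serve_dir=$2; host=$3; port=${4:-8000}
    mkdir -p "$serve_dir" || return 1
    cp "$src" "$serve_dir/kubeconfig" || return 1
    echo "http://$host:$port/kubeconfig"
}

# Serve the staged directory with Python's built-in web server.
# This call blocks until interrupted.
serve_kubeconfig() {
    serve_dir=$1; port=${2:-8000}
    python3 -m http.server "$port" --directory "$serve_dir"
}

# Example (not run automatically; the IP is a placeholder for your docker host):
#   DISCOVERY_KUBECONFIG_ENDPOINT=$(stage_kubeconfig "$KUBECONFIG" /tmp/discovery 192.168.1.50)
#   serve_kubeconfig /tmp/discovery
```

The printed URL is presumably what you would export as DISCOVERY_KUBECONFIG_ENDPOINT before creating the Metal Controller Manager patch.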

After setting up the kubeconfig hosting, we can continue with some kustomize editing. From the root of the arges repo:

  • Create a patch for the Metal Controller Manager containing the location of the kubeconfig file
    cat <<EOF >examples/mgmt/discovery_kubeconfig_patch.yaml
    - op: add
      path: /spec/template/spec/containers/1/args/-
      value: --discovery-kubeconfig=${DISCOVERY_KUBECONFIG_ENDPOINT}
    EOF
  • We must also create a patch for the default environment used by the Metal Controller Manager. Substitute the IP of your docker host in the patch below:
    cat <<EOF >examples/mgmt/default_environment_patch.yaml
    - op: add
      path: /spec/kernel/args/-
      value: talos.config=http://<HOST_IP>:9091/configdata?uuid=
    EOF
  • Install the Arges components with kustomize build ./examples/bootstrap | kubectl apply -f -
  • Verify we can reach our services. Test the metadata server by running curl localhost:9091. This should return a 404 message.

Network Configuration

Now that our services are up and running, we need to configure our DHCP server to properly forward PXE requests. This step will likely depend on your particular network setup. On my home EdgeRouter, I was able to configure the dhcpd service that ships with the device. The steps I followed were:

  • SSH into router
  • Issue the configure command
  • Tell DHCP service to import a custom config file with set service dhcp-server shared-network-name LAN subnet 192.168.1.0/24 subnet-parameters 'include "/etc/dhcp/ipxe.conf";'
  • Issue commit and save, then exit
  • Open the file /etc/dhcp/ipxe.conf and populate it with the following (update fields where applicable):
    allow bootp;
    allow booting;
    group "talos-mgmt" {
      next-server <DOCKER_HOST>;
      if exists user-class and option user-class = "iPXE" {
        filename "http://<DOCKER_HOST>:8081/boot.ipxe";
      } else {
        filename "ipxe.efi";
      }
      host talos-mgmt-0 {
        fixed-address <MGMT_NODE_IP>;
        hardware ethernet <MGMT_NODE_MAC>;
      }
    }
  • After saving, restart the EdgeRouter's DHCP service with sudo systemctl restart vyatta-dhcpd
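
If you would rather not hand-edit the placeholders, the include file can be rendered from three values. This is only a sketch: render_ipxe_conf is a hypothetical helper, and the config body simply mirrors the ipxe.conf contents from the steps above. The two filename branches are what make chainloading work: clients that already identify as iPXE get the HTTP boot script, and everything else is first handed ipxe.efi over TFTP.

```shell
#!/bin/sh
# Sketch only: render_ipxe_conf is a hypothetical helper that prints the
# dhcpd include file for one management node to stdout.
render_ipxe_conf() {
    docker_host=$1; node_ip=$2; node_mac=$3
    cat <<EOF
allow bootp;
allow booting;
group "talos-mgmt" {
  next-server ${docker_host};
  if exists user-class and option user-class = "iPXE" {
    filename "http://${docker_host}:8081/boot.ipxe";
  } else {
    filename "ipxe.efi";
  }
  host talos-mgmt-0 {
    fixed-address ${node_ip};
    hardware ethernet ${node_mac};
  }
}
EOF
}

# Example (values are placeholders for your own network):
#   render_ipxe_conf 192.168.1.50 192.168.1.60 aa:bb:cc:dd:ee:ff > ipxe.conf
```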

The Management Plane

Registering Machines

With the services exposed, we now need to ensure that the Arges TFTP and iPXE servers will be used when PXE booting machines. The way in which this is done varies greatly, and is left as an exercise for the reader.

Now that our bootstrap services are exposed, we will use them to boot a lightweight environment that will register the machine with Arges and make it available to provision as a Kubernetes node. To register machines, simply power them on. The machines should do a number of things:

  • boot using the deployed TFTP and iPXE services in Arges
  • download the discovery kubeconfig
  • create a custom server resource in the Kubernetes cluster hosting Arges

A flow of the registration process looks like:

[Figure: server-registration flow]

Once this process is done, the machine will shut down, and you should see something like the following:

$ kubectl get servers
NAME                                   AGE
00000000-0000-0000-0000-d05099d4c8ed   13s
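
Since registration happens on its own schedule once a machine powers on, it can be handy to wait for the expected number of servers rather than re-running the command by hand. Here is a rough sketch of that loop. The helper names, the default timeout and interval, and the overridable KUBECTL variable are all assumptions for illustration; the only real call is kubectl get servers with the standard --no-headers flag.

```shell
#!/bin/sh
# Sketch only: count_servers and wait_for_servers are hypothetical helpers.
KUBECTL=${KUBECTL:-kubectl}

# Print how many Server resources currently exist.
count_servers() {
    # --no-headers leaves one line per resource on stdout; errors and the
    # "No resources found" notice go to stderr and are discarded.
    "$KUBECTL" get servers --no-headers 2>/dev/null | grep -c . || true
}

# Poll until at least $1 servers are registered.
# $2 is the timeout in seconds (default 600), $3 the poll interval (default 5).
# Returns 0 on success and 1 if the timeout expires first.
wait_for_servers() {
    want=$1; timeout=${2:-600}; interval=${3:-5}
    waited=0
    while :; do
        have=$(count_servers)
        [ "$have" -ge "$want" ] && return 0
        [ "$waited" -ge "$timeout" ] && return 1
        sleep "$interval"
        waited=$((waited + interval))
    done
}

# Example (not run automatically):
#   wait_for_servers 1 && kubectl get servers
```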

Create the Management Cluster

Now it's time to put together the YAML to define our management plane and get it created.

A high-level flow of all the steps mentioned below looks like:

[Figure: cluster-bootstrap flow]

Generate the Talos Configuration Files

We will start by creating the configs required by Talos:

  • Generate configs with
$ talosctl gen config management-plane https://<MGMT_MASTER_IP>:6443 -o ./examples/clusters/metal/generate
generating PKI and tokens
created examples/clusters/metal/generate/init.yaml
created examples/clusters/metal/generate/controlplane.yaml
created examples/clusters/metal/generate/join.yaml
created examples/clusters/metal/generate/talosconfig
  • Configure the generated talosconfig
talosctl --talosconfig ./examples/clusters/metal/generate/talosconfig config endpoint <MGMT_MASTER_IP>
  • Make any additional edits to init.yaml or other machine configs now

Create the MetalCluster Resource

We can now craft our YAML for deployment. Using ./examples/clusters/metal as a working directory:

  • Use the following as a template and create cluster.yaml
apiVersion: cluster.x-k8s.io/v1alpha2
kind: Cluster
metadata:
  name: management-plane
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
        - 10.244.0.0/16
    services:
      cidrBlocks:
        - 10.96.0.0/16
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
    kind: MetalCluster
    name: management-plane
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
kind: MetalCluster
metadata:
  name: management-plane
spec:
  apiEndpoints:
    - host: <MGMT_MASTER_IP>
      port: 6443

Create the TalosConfig Resources

  • Use the following as a template and create talosconfigs.yaml, adding the contents of init.yaml as the data field:
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha2
kind: TalosConfig
metadata:
  name: mgmt-0
  labels:
    cluster.x-k8s.io/cluster-name: management-plane
spec:
  generateType: none
  data: |
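
Pasting init.yaml under data: | by hand is easy to get wrong, because every line of a YAML block scalar has to be indented deeper than its key. The sketch below emits the manifest and indents the machine config in one go. render_talosconfig is a hypothetical helper, not part of Arges or talosctl, and the four-space indent is simply one valid choice for the template above.

```shell
#!/bin/sh
# Sketch only: render_talosconfig is a hypothetical helper that prints a
# TalosConfig manifest with a machine config embedded under `data: |`.
render_talosconfig() {
    name=$1; cluster=$2; config_file=$3
    cat <<EOF
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha2
kind: TalosConfig
metadata:
  name: ${name}
  labels:
    cluster.x-k8s.io/cluster-name: ${cluster}
spec:
  generateType: none
  data: |
EOF
    # Prefix every non-empty line with four spaces so the whole file sits
    # inside the block scalar; empty lines are left untouched.
    sed 's/^./    &/' "$config_file"
}

# Example, run from ./examples/clusters/metal (not run automatically):
#   render_talosconfig mgmt-0 management-plane ./generate/init.yaml > talosconfigs.yaml
```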

Create the MetalMachine Resources

Use the following as a template and create machines.yaml, substituting the BMC info with your IPMI credentials, as well as UUID with the server's UUID as shown with kubectl get servers:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
kind: MetalMachine
metadata:
  name: mgmt-0
spec:
  serverRef:
    apiVersion: metal.arges.dev/v1alpha1
    kind: Server
    name: <UUID>
  bmc:
    endpoint: <ENDPOINT>
    user: <USER>
    pass: <PASSWORD>
---
apiVersion: cluster.x-k8s.io/v1alpha2
kind: Machine
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: management-plane
    cluster.x-k8s.io/control-plane: "true"
  name: mgmt-0
spec:
  bootstrap:
    configRef:
      apiVersion: bootstrap.cluster.x-k8s.io/v1alpha2
      kind: TalosConfig
      name: mgmt-0
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
    kind: MetalMachine
    name: mgmt-0
  version: 1.18.0

Create the Cluster

Now that we have a PXE boot environment, and the Talos config files, we are ready to create our cluster. To do so, run the following:

kubectl create namespace arges-examples
kustomize build ./examples/clusters/metal | kubectl apply -f -
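
The templates above are full of placeholders such as <MGMT_MASTER_IP> and <UUID>, and it is easy to leave one behind. As an optional pre-flight step, here is a small sketch that scans the manifests for leftover tokens before anything is applied. find_placeholders is a hypothetical helper, and the pattern assumes placeholders are written as uppercase names in angle brackets, as they are in this post.

```shell
#!/bin/sh
# Sketch only: find_placeholders is a hypothetical helper.
# It prints file:line:text for every leftover <PLACEHOLDER> token and
# returns 0 if any were found, 1 if the files are clean.
find_placeholders() {
    # The extra /dev/null argument makes grep print file names even when
    # only a single file is passed.
    grep -nE '<[A-Z][A-Z0-9_]*>' "$@" /dev/null
}

# Example (not run automatically):
#   if find_placeholders ./examples/clusters/metal/*.yaml; then
#       echo "fill in the placeholders listed above before applying" >&2
#   else
#       kustomize build ./examples/clusters/metal | kubectl apply -f -
#   fi
```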

We can see the cluster in the arges-examples namespace:

$ kubectl get clusters -n arges-examples
NAME               PHASE
management-plane   provisioned

I generally log in to the BMC's console at this point to check out the output of the Talos install and get a general feel for where we are in the provisioning process. Once it's complete, we can finally transition to using our management plane for future cluster creations! You just need to issue talosctl kubeconfig to fetch the kubeconfig that was generated for our management plane.

Gimme Mo' K8s

We've got our management plane provisioned, and it will now serve as a long-running set of components for any future clusters. So now we're finally at a spot where we can stamp these bad boys out! That said, the beauty of creating new clusters is that it's the same process as detailed above.

So, at a high level:

  • Deploy the Arges components onto the management plane
  • Set up the DHCP server for new nodes that will be PXE booting
  • Boot servers to register them in the management plane
  • Craft YAML for each cluster and apply with kustomize/kubectl
  • Profit!

Wrap-up

I hope you have enjoyed this introduction to Arges. Here at Talos we are pretty excited about this project and it has been a ton of fun to work on. Feel free to give it a shot and hit us up on Slack with any questions or concerns!
