You are viewing the development docs which are in progress. There is no guarantee that the development documentation will be accurate, including instructions, links, and other information. For the latest stable documentation, click here.
NodeOpUpgrade
The NodeOpUpgrade custom resource is a Kairos-specific resource for upgrading Kairos nodes. Under the hood, it creates a NodeOp with the appropriate upgrade script and configuration, so you only need to specify the target image and a few options.
Basic Example
The following is an example of a “canary upgrade”, which upgrades Kairos nodes one-by-one (master nodes first). It will stop upgrading if one of the nodes doesn’t complete the upgrade and reboot successfully.
apiVersion: operator.kairos.io/v1alpha1
kind: NodeOpUpgrade
metadata:
name: kairos-upgrade
namespace: default
spec:
# The container image containing the new Kairos version
image: quay.io/kairos/opensuse:leap-15.6-standard-amd64-generic-v3.4.2-k3sv1.30.11-k3s1
# NodeSelector to target specific nodes (optional)
nodeSelector:
matchLabels:
kairos.io/managed: "true"
# Maximum number of nodes that can run the upgrade simultaneously
# 0 means run on all nodes at once
concurrency: 1
# Whether to stop creating new jobs when a job fails
# Useful for canary deployments
stopOnFailure: true
Only 4 fields is all it takes to safely upgrade the whole cluster.
Spec Reference
| Field | Type | Default | Description |
|---|---|---|---|
image | string | (required) | Container image containing the new Kairos version |
imagePullSecrets | []LocalObjectReference | (none) | Secrets for pulling from private registries (details) |
nodeSelector | LabelSelector | (none) | Standard Kubernetes label selector to target specific nodes |
concurrency | int | 0 | Max nodes running the upgrade simultaneously (0 = all at once) |
stopOnFailure | bool | false | Stop creating new jobs when a job fails (canary mode) |
upgradeActive | bool | true | Whether to upgrade the active partition |
upgradeRecovery | bool | false | Whether to upgrade the recovery partition |
force | bool | false | Whether to force the upgrade without version checks |
Additional Options
apiVersion: operator.kairos.io/v1alpha1
kind: NodeOpUpgrade
metadata:
name: kairos-upgrade
namespace: default
spec:
image: quay.io/kairos/opensuse:leap-15.6-standard-amd64-generic-v3.4.2-k3sv1.30.11-k3s1
# ImagePullSecrets for private registries (optional)
imagePullSecrets:
- name: private-registry-secret
nodeSelector:
matchLabels:
kairos.io/managed: "true"
concurrency: 1
stopOnFailure: true
# Whether to upgrade the active partition (defaults to true)
upgradeActive: true
# Whether to upgrade the recovery partition (defaults to false)
upgradeRecovery: false
# Whether to force the upgrade without version checks
force: false
To upgrade the “recovery” partition instead of the active one, set upgradeRecovery: true and upgradeActive: false:
spec:
# ... other fields ...
upgradeActive: false
upgradeRecovery: true
How Upgrade Is Performed
Before you attempt an upgrade, it’s good to know what to expect. Here is how the process works:
- The operator is notified about the NodeOpUpgrade resource and creates a NodeOp with the appropriate script and options.
- The operator creates a list of matching Nodes using the provided label selector. If no selector is provided, all Nodes will match.
- The list is sorted with master nodes first, and based on the
concurrencyvalue, the first batch of Nodes will be upgraded (could be just 1 Node). - Before the upgrade Job is created, the operator creates a Pod that will perform the reboot when the Job completes. Then a
Jobis created that performs the upgrade. - The Job has one InitContainer, which performs the upgrade and a container which runs only if the upgrade script completes successfully. When the InitContainer exits, the container creates a sentinel file on the host’s filesystem which is what the “reboot” Pod waits for, in order to perform the reboot. This way the Job completes successfully before the Node is rebooted. This is important because it prevents the Job from re-creating its Pod after reboot (which would be the case if the Job performed the reboot before it exited gracefully).
- After the reboot of the Node, the “reboot Pod” will be restarted but it will detect that reboot has already happened (using an annotation on itself that works as a sentinel) and will exit with
0. - If everything worked successfully, the operator will create another Job to replace the one that finished, resulting in
concurrencynumber of Nodes being upgraded in parallel.
The result of the above process is that each upgrade Job finishes successfully, with no unnecessary restarts. The upgrade logs can be found in the Job’s Pod logs.
The NodeOpUpgrade stores the statuses of the various Jobs it creates so it can be used to monitor the summary of the operation.
Monitoring
You can monitor the progress of an upgrade:
$ kubectl get jobs -A
NAMESPACE NAME STATUS COMPLETIONS DURATION AGE
default kairos-upgrade-localhost-wr26f Running 0/1 24s 24s
$ kubectl get nodeopupgrades
NAME AGE
kairos-upgrade 5s
What’s next?
- Upgrading from Kubernetes — full upgrade workflow guide
- Trusted Boot upgrades — upgrades with Trusted Boot enabled
- NodeOp — for custom upgrade logic or other operations
- Bandwidth Optimized Upgrades — optimize bandwidth during upgrades