NodeOpUpgrade
The NodeOpUpgrade custom resource is a Kairos-specific resource for upgrading Kairos nodes. Under the hood, it creates a NodeOp with the appropriate upgrade script and configuration, so you only need to specify the target image and a few options.
One-off operations and reusing manifests
A NodeOpUpgrade represents a single upgrade run on the target nodes. The operator drives one upgrade flow per object, so changing spec on an existing resource is not a supported way to "start over" or to switch to a different upgrade plan. The API allows spec updates, but behavior after an update is undefined from a product perspective. Create a new NodeOpUpgrade for each new run.
Reusing the same manifest with generateName
To run the same upgrade configuration repeatedly, use metadata.generateName instead of metadata.name, and use kubectl create (not apply), so that each run creates a new NodeOpUpgrade. The "One-off operations and reusing manifests" section of the NodeOp documentation describes the same pattern and its rationale.
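Below is a minimal sketch of this pattern. It is a shell function that prints the Basic Example manifest with generateName in place of name. The function name upgrade_manifest is hypothetical. Each time its output is piped into kubectl create, the API server appends a fresh random suffix, so two runs produce two separate NodeOpUpgrade objects.

```shell
#!/bin/sh
# Hypothetical helper: print a NodeOpUpgrade manifest that uses generateName
# instead of name, so every `kubectl create` produces a brand-new object
# (for example kairos-upgrade-abcde) rather than reusing an existing one.
upgrade_manifest() {
  cat <<'EOF'
apiVersion: operator.kairos.io/v1alpha1
kind: NodeOpUpgrade
metadata:
  generateName: kairos-upgrade-
  namespace: default
spec:
  image: quay.io/kairos/opensuse:leap-15.6-standard-amd64-generic-v3.4.2-k3sv1.30.11-k3s1
  nodeSelector:
    matchLabels:
      kairos.io/managed: "true"
  concurrency: 1
  stopOnFailure: true
EOF
}

# Usage (needs a cluster with the operator installed):
#   upgrade_manifest | kubectl create -f -   # first run
#   upgrade_manifest | kubectl create -f -   # second run, a new NodeOpUpgrade
```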
Basic Example
The following is an example of a "canary upgrade", which upgrades Kairos nodes one-by-one (master nodes first). It will stop upgrading if one of the nodes doesn't complete the upgrade and reboot successfully.
The image references below show a valid tag format, but these non-Hadron flavor repositories are not actively updated by the Kairos release pipeline anymore. Build and publish your own upgrade image with BYOI and Kairos Factory.
```yaml
apiVersion: operator.kairos.io/v1alpha1
kind: NodeOpUpgrade
metadata:
  name: kairos-upgrade
  namespace: default
spec:
  # The container image containing the new Kairos version
  image: quay.io/kairos/opensuse:leap-15.6-standard-amd64-generic-v3.4.2-k3sv1.30.11-k3s1
  # NodeSelector to target specific nodes (optional)
  nodeSelector:
    matchLabels:
      kairos.io/managed: "true"
  # Maximum number of nodes that can run the upgrade simultaneously
  # 0 means run on all nodes at once
  concurrency: 1
  # Whether to stop creating new jobs when a job fails
  # Useful for canary deployments
  stopOnFailure: true
```
Only four fields are needed to safely upgrade the whole cluster.
Spec Reference
| Field | Type | Default | Description |
|---|---|---|---|
| image | string | (required) | Container image containing the new Kairos version |
| imagePullSecrets | []LocalObjectReference | (none) | Secrets for pulling from private registries |
| nodeSelector | LabelSelector | (none) | Standard Kubernetes label selector to target specific nodes |
| concurrency | int | 0 | Max nodes running the upgrade simultaneously (0 = all at once) |
| stopOnFailure | bool | false | Stop creating new jobs when a job fails (canary mode) |
| upgradeActive | bool | true | Whether to upgrade the active partition |
| upgradeRecovery | bool | false | Whether to upgrade the recovery partition |
| force | bool | false | When true, run the upgrade on every targeted node regardless of whether it is already at spec.image. Disables the preflight skip; see Skipping no-op upgrades. |
Additional Options
```yaml
apiVersion: operator.kairos.io/v1alpha1
kind: NodeOpUpgrade
metadata:
  name: kairos-upgrade
  namespace: default
spec:
  image: quay.io/kairos/opensuse:leap-15.6-standard-amd64-generic-v3.4.2-k3sv1.30.11-k3s1
  # ImagePullSecrets for private registries (optional)
  imagePullSecrets:
    - name: private-registry-secret
  nodeSelector:
    matchLabels:
      kairos.io/managed: "true"
  concurrency: 1
  stopOnFailure: true
  # Whether to upgrade the active partition (defaults to true)
  upgradeActive: true
  # Whether to upgrade the recovery partition (defaults to false)
  upgradeRecovery: false
  # Whether to force the upgrade. When true, the controller skips the
  # preflight version check and runs the upgrade on every targeted node
  # even if it is already at spec.image. See "Skipping no-op upgrades" below.
  force: false
```
To upgrade the "recovery" partition instead of the active one, set upgradeRecovery: true and upgradeActive: false:
```yaml
spec:
  # ... other fields ...
  upgradeActive: false
  upgradeRecovery: true
```
Skipping no-op upgrades
By default, NodeOpUpgrade avoids cordoning, draining, and rebooting nodes that are already running the requested spec.image. This is useful for staggered rollouts. For example, you can upgrade just the control plane first and then run a cluster-wide NodeOpUpgrade with the same image. The control-plane nodes are detected as already up to date and left alone, while the worker nodes go through the normal flow.
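The staggered rollout described above could be scripted roughly as follows. This is a sketch under assumptions. The helper rollout_manifest is hypothetical and only prints YAML. It also assumes your control-plane nodes carry the node-role.kubernetes.io/control-plane label, which k3s and kubeadm typically set, with different values (hence operator: Exists). Phase one selects only those nodes. Phase two omits nodeSelector, so every node matches, and the preflight leaves the already-upgraded control plane alone.

```shell
#!/bin/sh
# Hypothetical helper: print a NodeOpUpgrade manifest for one rollout phase.
#   rollout_manifest control-plane IMAGE -> only nodes with the control-plane role label
#   rollout_manifest all IMAGE           -> no nodeSelector, so every node matches
rollout_manifest() {
  phase=$1
  image=$2
  cat <<EOF
apiVersion: operator.kairos.io/v1alpha1
kind: NodeOpUpgrade
metadata:
  generateName: kairos-upgrade-${phase}-
  namespace: default
spec:
  image: ${image}
  concurrency: 1
  stopOnFailure: true
EOF
  # Assumption: control-plane nodes carry this well-known role label.
  if [ "$phase" = "control-plane" ]; then
    cat <<'EOF'
  nodeSelector:
    matchExpressions:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
EOF
  fi
}

# Usage (needs a cluster with the operator installed):
#   rollout_manifest control-plane "$IMAGE" | kubectl create -f -
#   # ...once that run has completed:
#   rollout_manifest all "$IMAGE" | kubectl create -f -
```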
This works by leveraging the generic NodeOp preflight mechanism. When the NodeOpUpgrade controller creates the underlying NodeOp, it populates spec.preflight with a short script that:
- Runs in the upgrade image (the one you set in spec.image), so the script can read the target /etc/kairos-release directly from inside that image without pulling anything else.
- Mounts the host's /etc read-only at /host/etc, so the same script can also read the currently installed /etc/kairos-release on the node.
- Computes the version triple, ${KAIROS_VERSION}-${KAIROS_SOFTWARE_VERSION_PREFIX}${KAIROS_SOFTWARE_VERSION}, from each side and compares them.
- If both versions are known and equal, writes the skip reason to /dev/termination-log (e.g. "node is already at v4.0.3-k3sv1.32.4-k3s1").
- If the versions differ, can't be determined, or anything else goes wrong, exits 0 silently. The controller then proceeds with the normal cordon → drain → upgrade Job → reboot flow on that node.
The controller honors the preflight verdict. For any node the preflight skipped, it creates no Job and does not cordon or reboot that node. The node's entry in status.nodeStatuses is marked Completed with the skip reason from /dev/termination-log, and that node's concurrency slot is freed immediately for the next node.
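To make the comparison concrete, here is a rough shell re-implementation of the preflight steps listed above. It is not the script the controller actually generates. The helpers read_key and triple are hypothetical, and so are the TARGET_RELEASE/HOST_RELEASE/TERM_LOG overrides, which were added so the check can be tried outside a Pod. Parsing only KEY=value and KEY="value" lines is a simplifying assumption about the kairos-release format.

```shell
#!/bin/sh
# Hypothetical sketch of the NodeOpUpgrade preflight check (not the real script).
# Paths default to the ones described above; override them to test locally.
TARGET_RELEASE=${TARGET_RELEASE:-/etc/kairos-release}   # inside the upgrade image
HOST_RELEASE=${HOST_RELEASE:-/host/etc/kairos-release}  # host /etc mounted read-only
TERM_LOG=${TERM_LOG:-/dev/termination-log}

# read_key FILE KEY: print KEY's value from an os-release style file
# (KEY=value or KEY="value"); print nothing if the file or key is missing.
read_key() {
  sed -n "s/^$2=//p" "$1" 2>/dev/null | head -n 1 | tr -d '"'
}

# triple FILE: print the version triple, or nothing if KAIROS_VERSION is unknown.
triple() {
  version=$(read_key "$1" KAIROS_VERSION)
  [ -n "$version" ] || return 0
  printf '%s-%s%s\n' "$version" \
    "$(read_key "$1" KAIROS_SOFTWARE_VERSION_PREFIX)" \
    "$(read_key "$1" KAIROS_SOFTWARE_VERSION)"
}

preflight() {
  target=$(triple "$TARGET_RELEASE")
  current=$(triple "$HOST_RELEASE")
  # Skip only when both sides are known and identical.
  if [ -n "$target" ] && [ "$target" = "$current" ]; then
    echo "node is already at $target" > "$TERM_LOG"
  fi
  # Always succeed: an empty termination log means "proceed".
  return 0
}

preflight
```

The important design point is the asymmetry. An unknown version on either side never produces a skip, so the worst case is an unnecessary upgrade rather than a missed one.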
When the skip kicks in (and when it doesn't)
The preflight comparison runs the script against the actual /etc/kairos-release contents on both sides, so it's reliable across:
- Image mirrors and re-tags. It doesn't matter that you mirror quay.io/kairos/fedora:0.7.1 to your own registry; the script reads the file contents, not the image reference.
- Nodes that were installed directly from a particular image. They do not need to have been installed or upgraded through the operator first.
It does not give the expected answer in these cases:
- The image has been rebuilt with the same KAIROS_VERSION values but different content (e.g. a CI re-run with the same tag but a different commit). Version-triple equality is the only signal the script uses, so if the metadata is the same, the skip fires and the node is marked as already up to date. Use spec.force: true to override.
- The host's /etc/kairos-release is missing or doesn't carry KAIROS_VERSION. The script treats that side as "unknown" and falls through to "proceed" rather than wrongly skipping. The in-Pod upgrade flow then runs, and you see whatever it reports.
Forcing the upgrade
Set spec.force: true to disable the preflight entirely. The controller then creates the NodeOp without spec.preflight, so every targeted node goes straight through cordon → drain → upgrade Job → reboot, regardless of which version is already installed. Use this when you want to re-run an upgrade with the same image but different flags, or to recover from a previous run that ended in an inconsistent state.
How Upgrade Is Performed
Before you attempt an upgrade, it's good to know what to expect. Here is how the process works:
- The operator is notified about the NodeOpUpgrade resource and creates a NodeOp with the appropriate script and options. Unless spec.force is true, the NodeOp also carries a spec.preflight that compares the image's /etc/kairos-release against the host's.
- The NodeOp controller lists matching Nodes using the provided label selector. If no selector is provided, all Nodes match.
- The list is sorted with master nodes first, and the first batch of Nodes is processed according to the concurrency value (the batch may be a single Node).
- For each targeted node, the controller first runs a preflight Pod on that node. This is a short-lived, non-disruptive Pod (no cordon, no drain) that uses the upgrade image. The preflight script writes a skip reason to /dev/termination-log when the node is already at the target version, or stays silent otherwise. See Skipping no-op upgrades.
- If the preflight says skip, the node is recorded as Completed with the skip reason and the controller moves on to the next node. That node is not cordoned, drained, or rebooted.
- If the preflight says proceed, the controller creates a reboot Pod and then the upgrade Job. The Job's InitContainer performs the upgrade. Once it succeeds, the main container creates a sentinel file, which the reboot Pod is watching for.
- When the InitContainer exits successfully, the sentinel file appears. The reboot Pod then patches itself with a completion annotation and reboots the node via nsenter. This way the Job completes successfully before the Node is rebooted, which prevents the Job from re-creating its Pod after the reboot.
- After the reboot, the reboot Pod is restarted. It detects via its own annotation that the reboot already happened and exits with 0.
- If everything worked, the operator advances to the next batch of nodes, respecting concurrency and stopOnFailure.
The result of the above process is that each upgrade Job finishes successfully, with no unnecessary restarts. The upgrade logs can be found in the Job's Pod logs.
The NodeOpUpgrade records the status of each Job it creates, with one entry per node in status.nodeStatuses, so you can use it to monitor a summary of the operation.
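The master-first ordering and the concurrency setting can be approximated with the shell sketch below. It is only an illustration, not the operator's code. The function name plan_batches is hypothetical. The role column stands in for however the operator actually recognizes master nodes. Fixed batches are also a simplification, because per the preflight description a node's concurrency slot is freed as soon as it finishes.

```shell
#!/bin/sh
# Hypothetical sketch: order nodes masters-first and group them by concurrency.
# Input on stdin: one "<node-name> <role>" pair per line (role: master|worker).
# Output: one line per batch, node names separated by spaces.
plan_batches() {
  # Prefix each line with a sort key (0 = master, 1 = other) and its input
  # position, so sort puts masters first and keeps the original order otherwise.
  awk '{ key = ($2 == "master") ? 0 : 1; print key, NR, $1 }' |
    sort -k1,1n -k2,2n |
    awk -v n="$1" '
      { names[++count] = $3 }
      END {
        # concurrency 0 means a single batch containing every node
        if (n <= 0) n = count
        for (i = 1; i <= count; i++) {
          sep = (i % n == 0 || i == count) ? "\n" : " "
          printf "%s%s", names[i], sep
        }
      }'
}
```

With concurrency 1 this yields the one-by-one canary order from the Basic Example. With 0, every node lands in one batch.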
Monitoring
You can monitor the progress of an upgrade:
```console
$ kubectl get jobs -A
NAMESPACE   NAME                             STATUS    COMPLETIONS   DURATION   AGE
default     kairos-upgrade-localhost-wr26f   Running   0/1           24s        24s
$ kubectl get nodeopupgrades
NAME             AGE
kairos-upgrade   5s
```
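A few more standard kubectl commands can help while an upgrade is running. The object and Job names below are taken from the example output for kairos-upgrade; substitute your own. The jsonpath query simply dumps status.nodeStatuses and assumes nothing about its exact shape. Adding --all-containers is intended to include the InitContainer, where the upgrade itself runs. These commands need a live cluster.

```shell
# Per-node results recorded by the operator (raw JSON).
kubectl get nodeopupgrade kairos-upgrade -n default -o jsonpath='{.status.nodeStatuses}'

# Upgrade logs from the Job's Pod, across all of its containers.
kubectl logs -n default job/kairos-upgrade-localhost-wr26f --all-containers=true

# Keep watching the upgrade object; stop with Ctrl-C.
kubectl get nodeopupgrades -n default -w
```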
What's next?
- Upgrading from Kubernetes: full upgrade workflow guide
- Trusted Boot upgrades: upgrades with Trusted Boot enabled
- NodeOp: for custom upgrade logic or other operations
- Bandwidth Optimized Upgrades: optimize bandwidth during upgrades