
NodeOpUpgrade

The NodeOpUpgrade custom resource is a Kairos-specific resource for upgrading Kairos nodes. Under the hood, it creates a NodeOp with the appropriate upgrade script and configuration, so you only need to specify the target image and a few options.

One-off operations and reusing manifests

Warning

A NodeOpUpgrade represents a single upgrade run on the target nodes. The operator drives one upgrade flow per object; changing spec on an existing resource is not a supported way to "start over" or switch to a different upgrade plan. The API allows spec updates, but behavior after an update is undefined from a product perspective. Create a new NodeOpUpgrade for each new run.

Reusing the same manifest with generateName

To run the same upgrade configuration repeatedly, use metadata.generateName instead of metadata.name and kubectl create (not apply) so each run creates a new NodeOpUpgrade. See NodeOp: One-off operations and reusing manifests for the same pattern and rationale.
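A minimal sketch of this pattern follows. The manifest path, the kairos-upgrade- prefix, and the start_upgrade_run helper are illustrative choices, not names defined by the operator; the spec fields are the ones described in the rest of this page.

```shell
#!/bin/sh
# Sketch: a reusable NodeOpUpgrade manifest that uses generateName.
# The file location and the "kairos-upgrade-" prefix are illustrative.
manifest="${TMPDIR:-/tmp}/nodeopupgrade.yaml"

cat > "$manifest" <<'EOF'
apiVersion: operator.kairos.io/v1alpha1
kind: NodeOpUpgrade
metadata:
  # The API server appends a random suffix to this prefix,
  # so every create produces a new object.
  generateName: kairos-upgrade-
  namespace: default
spec:
  image: quay.io/kairos/opensuse:leap-15.6-standard-amd64-generic-v3.4.2-k3sv1.30.11-k3s1
  concurrency: 1
  stopOnFailure: true
EOF

# Hypothetical helper: start a new upgrade run from the manifest.
# `kubectl apply` does not support generateName, so `kubectl create` is used.
start_upgrade_run() {
  kubectl create -f "$manifest"
}
```

Calling start_upgrade_run twice against a cluster should leave two separate NodeOpUpgrade objects, each with its own generated name and its own status.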

Basic Example

The following is an example of a "canary upgrade", which upgrades Kairos nodes one-by-one (master nodes first). It will stop upgrading if one of the nodes doesn't complete the upgrade and reboot successfully.

Legacy flavor example

The image references below show a valid tag format, but these non-Hadron flavor repositories are no longer actively updated by the Kairos release pipeline. Build and publish your own upgrade image with BYOI and Kairos Factory.

apiVersion: operator.kairos.io/v1alpha1
kind: NodeOpUpgrade
metadata:
  name: kairos-upgrade
  namespace: default
spec:
  # The container image containing the new Kairos version
  image: quay.io/kairos/opensuse:leap-15.6-standard-amd64-generic-v3.4.2-k3sv1.30.11-k3s1

  # NodeSelector to target specific nodes (optional)
  nodeSelector:
    matchLabels:
      kairos.io/managed: "true"

  # Maximum number of nodes that can run the upgrade simultaneously
  # 0 means run on all nodes at once
  concurrency: 1

  # Whether to stop creating new jobs when a job fails
  # Useful for canary deployments
  stopOnFailure: true

Four fields are all it takes to safely upgrade the whole cluster.

Spec Reference

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| image | string | (required) | Container image containing the new Kairos version |
| imagePullSecrets | []LocalObjectReference | (none) | Secrets for pulling from private registries (details) |
| nodeSelector | LabelSelector | (none) | Standard Kubernetes label selector to target specific nodes |
| concurrency | int | 0 | Max nodes running the upgrade simultaneously (0 = all at once) |
| stopOnFailure | bool | false | Stop creating new jobs when a job fails (canary mode) |
| upgradeActive | bool | true | Whether to upgrade the active partition |
| upgradeRecovery | bool | false | Whether to upgrade the recovery partition |
| force | bool | false | When true, run the upgrade on every targeted node regardless of whether it is already at spec.image. Disables the preflight skip; see Skipping no-op upgrades. |

Additional Options

apiVersion: operator.kairos.io/v1alpha1
kind: NodeOpUpgrade
metadata:
  name: kairos-upgrade
  namespace: default
spec:
  image: quay.io/kairos/opensuse:leap-15.6-standard-amd64-generic-v3.4.2-k3sv1.30.11-k3s1

  # ImagePullSecrets for private registries (optional)
  imagePullSecrets:
    - name: private-registry-secret

  nodeSelector:
    matchLabels:
      kairos.io/managed: "true"

  concurrency: 1
  stopOnFailure: true

  # Whether to upgrade the active partition (defaults to true)
  upgradeActive: true

  # Whether to upgrade the recovery partition (defaults to false)
  upgradeRecovery: false

  # Whether to force the upgrade. When true, the controller skips the
  # preflight version check and runs the upgrade on every targeted node
  # even if it is already at spec.image. See "Skipping no-op upgrades" below.
  force: false

To upgrade the "recovery" partition instead of the active one, set upgradeRecovery: true and upgradeActive: false:

spec:
  # ... other fields ...
  upgradeActive: false
  upgradeRecovery: true

Skipping no-op upgrades

By default, NodeOpUpgrade avoids cordoning, draining, and rebooting nodes that are already running the requested spec.image. This is useful for staggered rollouts. For example, you can upgrade just the control plane first and then apply a cluster-wide NodeOpUpgrade with the same image: control-plane nodes are detected as already up to date and left alone, while worker nodes go through the normal flow.

This works by leveraging the generic NodeOp preflight mechanism. When the NodeOpUpgrade controller creates the underlying NodeOp, it populates spec.preflight with a short script that:

  1. Runs in the upgrade image (the one you set in spec.image), so the script can read the target /etc/kairos-release directly from inside that image without pulling anything else.
  2. Mounts the host's /etc read-only at /host/etc, so the same script can also read the currently installed /etc/kairos-release on the node.
  3. Computes the version triple, ${KAIROS_VERSION}-${KAIROS_SOFTWARE_VERSION_PREFIX}${KAIROS_SOFTWARE_VERSION}, from each side and compares them.
  4. If both versions are known and equal, writes the skip reason to /dev/termination-log (e.g. node is already at v4.0.3-k3sv1.32.4-k3s1).
  5. If the versions differ or can't be determined, or in any other case, exits 0 silently, and the controller proceeds with the normal cordon → drain → upgrade Job → reboot flow on that node.

The controller honors the preflight verdict by not creating a Job, not cordoning, and not rebooting any node the preflight skipped. The node's entry in status.nodeStatuses is marked Completed with the skip reason from /dev/termination-log, and the per-node concurrency slot is freed immediately for the next node.
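The comparison in steps 3 to 5 can be sketched in a few lines of shell. This is an illustration, not the operator's actual preflight script: the function names (read_key, version_triple, preflight) are hypothetical, and the sketch assumes /etc/kairos-release uses os-release style KEY="value" lines.

```shell
#!/bin/sh
# Sketch of the preflight comparison; NOT the operator's real script.
# Assumes kairos-release files contain os-release style KEY="value" lines.

# read_key FILE KEY: print the value of KEY without surrounding quotes.
# Prints nothing if the file or the key is missing.
read_key() {
  sed -n "s/^$2=//p" "$1" 2>/dev/null | tail -n 1 | tr -d "\"'"
}

# version_triple FILE: print the version triple, or nothing when
# KAIROS_VERSION is unknown.
version_triple() {
  version=$(read_key "$1" KAIROS_VERSION)
  [ -n "$version" ] || return 0
  printf '%s-%s%s\n' "$version" \
    "$(read_key "$1" KAIROS_SOFTWARE_VERSION_PREFIX)" \
    "$(read_key "$1" KAIROS_SOFTWARE_VERSION)"
}

# preflight IMAGE_RELEASE HOST_RELEASE TERMINATION_LOG
# Writes a skip reason only when both triples are known and equal.
# Always returns 0, so any doubt means "proceed with the upgrade".
preflight() {
  target=$(version_triple "$1")
  current=$(version_triple "$2")
  if [ -n "$target" ] && [ "$target" = "$current" ]; then
    echo "node is already at $target" > "$3"
  fi
  return 0
}
```

Inside a preflight Pod laid out as described above, the call would look like preflight /etc/kairos-release /host/etc/kairos-release /dev/termination-log. Treating every uncertain case as "proceed" is the conservative choice: a wrongly skipped node stays on the old version without anyone noticing, while a redundant upgrade only costs time.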

When the skip kicks in (and when it doesn't)

The preflight comparison runs the script against the actual /etc/kairos-release contents on both sides, so it's reliable across:

  • Image mirrors / re-tags. It doesn't matter that you mirror quay.io/kairos/fedora:0.7.1 to your own registry; the script reads the file contents, not the image reference.
  • Nodes that were bootstrapped directly from a particular image. The node does not need to have been installed via the operator first.

Two cases to be aware of:

  • The image has been rebuilt with the same KAIROS_VERSION values but different content (e.g. a CI re-run with the same tag but a different commit). The skip still fires here: version-triple equality is the only signal the script uses, so if the metadata is the same, the script marks the node as already up to date. Use spec.force: true to override.
  • The host's /etc/kairos-release is missing or doesn't carry KAIROS_VERSION. The skip does not fire here: the script treats a side it cannot read as "unknown" and falls through to "proceed" rather than wrongly skipping. The in-Pod upgrade flow then runs, and the user sees whatever it reports.

Forcing the upgrade

Set spec.force: true to disable the preflight entirely. The controller creates the NodeOp without spec.preflight, so every targeted node goes straight through cordon → drain → upgrade Job → reboot, regardless of what version is already installed. Use this when you want to re-run an upgrade with the same image but different flags, or to recover from a previous run that left nodes in an inconsistent state.

How the Upgrade Is Performed

Before you attempt an upgrade, it's good to know what to expect. Here is how the process works:

  1. The operator is notified about the NodeOpUpgrade resource and creates a NodeOp with the appropriate script, options, and (unless spec.force is true) a spec.preflight that compares the image's /etc/kairos-release against the host's.
  2. The NodeOp controller lists matching Nodes using the provided label selector. If no selector is provided, all Nodes will match.
  3. The list is sorted with master nodes first, and based on the concurrency value, the first batch of Nodes will be processed (could be just 1 Node).
  4. For each targeted node, the controller first runs a preflight Pod on that node: a short-lived, non-disruptive Pod (no cordon, no drain) using the upgrade image. The preflight script writes a skip reason to /dev/termination-log when the node is already at the target version, or stays silent otherwise. See Skipping no-op upgrades.
  5. If preflight says skip, the node is recorded as Completed with the skip reason and the controller moves on to the next node. No cordon, no drain, no reboot for that node.
  6. If preflight says proceed, the controller creates a reboot Pod and then the upgrade Job (the Job's InitContainer performs the upgrade and the main container creates a sentinel file once it succeeds, which the reboot Pod is watching for).
  7. When the InitContainer exits successfully, the sentinel file appears; the reboot Pod patches itself with a completion annotation and reboots the node via nsenter. This way the Job completes successfully before the Node is rebooted, preventing the Job from re-creating its Pod after reboot.
  8. After reboot, the "reboot Pod" is restarted but detects via its own annotation that reboot already happened and exits with 0.
  9. If everything worked successfully, the operator advances to the next batch of nodes, respecting concurrency and stopOnFailure.
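The ordering and batching in steps 2 and 3 can be illustrated with a small shell sketch. The input format (one "name role" pair per line), the control-plane role string, and the first_batch function are assumptions made for this example; how the operator actually identifies and orders nodes is not shown here.

```shell
#!/bin/sh
# Sketch: order nodes control-plane first and select the first batch.
# Input (stdin): one "<node-name> <role>" pair per line (illustrative format).
# Output: the node names that would be processed first.
first_batch() {
  concurrency=$1
  # Rank control-plane nodes 0 and everything else 1, then keep the
  # original input order within each rank by sorting on the line number.
  ordered=$(awk '{ rank = ($2 == "control-plane") ? 0 : 1; print rank, NR, $1 }' |
    sort -k1,1n -k2,2n | awk '{ print $3 }')
  if [ "$concurrency" -gt 0 ]; then
    printf '%s\n' "$ordered" | head -n "$concurrency"
  else
    # concurrency: 0 means all matching nodes at once.
    printf '%s\n' "$ordered"
  fi
}
```

With concurrency: 1 and stopOnFailure: true, as in the basic example, the first batch is a single control-plane node, which is what makes the rollout behave like a canary.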

The result of the above process is that each upgrade Job finishes successfully, with no unnecessary restarts. The upgrade logs can be found in the Job's Pod logs.

The NodeOpUpgrade stores the statuses of the Jobs it creates, so you can use it to monitor the overall progress of the operation.

Monitoring

You can monitor the progress of an upgrade:

$ kubectl get jobs -A
NAMESPACE   NAME                             STATUS    COMPLETIONS   DURATION   AGE
default     kairos-upgrade-localhost-wr26f   Running   0/1           24s        24s

$ kubectl get nodeopupgrades
NAME             AGE
kairos-upgrade   5s
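For more detail than the listings above, the following sketch wraps two standard kubectl commands in helper functions. The function names are hypothetical, and the exact layout of the status output depends on the operator version.

```shell
#!/bin/sh
# Sketch: helper functions for inspecting an upgrade run.

# upgrade_status NAMESPACE NAME
# Show the full NodeOpUpgrade object, including its status section.
upgrade_status() {
  kubectl get nodeopupgrades "$2" -n "$1" -o yaml
}

# upgrade_logs NAMESPACE JOB
# Print the logs of an upgrade Job's Pod. --all-containers is used because
# the upgrade itself runs in an init container of that Pod.
upgrade_logs() {
  kubectl logs "job/$2" -n "$1" --all-containers
}
```

For the run shown above, that would be upgrade_status default kairos-upgrade and upgrade_logs default kairos-upgrade-localhost-wr26f.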

What's next?