Reset a node

Kairos has a recovery mechanism built-in which can be leveraged to restore the system to a known point. At installation time, the recovery partition is created from the installation medium and can be used to restore the system from scratch, leaving configuration intact and cleaning any persistent data accumulated by usage in the host (e.g. Kubernetes images, persistent volumes, etc. ).

The reset action will regenerate the bootloader configuration and the images in the state partition (labeled COS_STATE) by using the recovery image generated at install time, cleaning up the host.

The configuration files in /oem are kept intact, the node on the next reboot after a reset will perform the same boot sequence (again) of a first-boot installation.

How to

The reset action can be accessed via the Boot menu, remotely, triggered via Kubernetes or manually. In each scenario the machine will reboot into reset mode, perform the cleanup, and reboot automatically afterwards.

From the boot menu

It is possible to reset the state of a node by either booting into the “Reset” mode into the boot menu, which automatically will reset the node:

reset

Remotely, via command line

On a Kairos booted system, logged as root:

$ grub2-editenv /oem/grubenv set next_entry=statereset
$ reboot

To directly select the entry:

$ kairos-agent bootentry --select statereset

Or to get a list of available boot entries and select one interactively:

$ kairos-agent bootentry

From Kubernetes

system-upgrade-controller can be used to apply a plan to the nodes to use Kubernetes to schedule the reset on the nodes itself, similarly on how upgrades are applied.

Consider the following example which resets a machine by changing the config file used during installation:

---
apiVersion: v1
kind: Secret
metadata:
  name: custom-script
  namespace: system-upgrade
type: Opaque
stringData:
  config.yaml: |
    #cloud-config
    hostname: testcluster-{{ trunc 4 .MachineID }}
    k3s:
      enabled: true
    users:
    - name: kairos
      passwd: kairos
      ssh_authorized_keys:
      - github:mudler    
  add-config-file.sh: |
    #!/bin/sh
    set -e
    if diff /host/run/system-upgrade/secrets/custom-script/config.yaml /host/oem/90_custom.yaml >/dev/null; then
        echo config present
        exit 0
    fi
    # we can't cp, that's a symlink!
    cat /host/run/system-upgrade/secrets/custom-script/config.yaml > /host/oem/90_custom.yaml
    kairos-agent bootentry --select statereset
    sync

    mount --rbind /host/dev /dev
    mount --rbind /host/run /run
    nsenter -i -m -t 1 -- reboot
    exit 1    
---
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: reset-and-reconfig
  namespace: system-upgrade
spec:
  concurrency: 2
  # This is the version (tag) of the image.
  version: "leap-15.6-standard-amd64-generic-v3.1.0-k3sv1.30.2-k3s1"
  nodeSelector:
    matchExpressions:
      - { key: kubernetes.io/hostname, operator: Exists }
  serviceAccountName: system-upgrade
  cordon: false
  upgrade:
    # Here goes the image which is tied to the flavor being used.
    # Currently can pick between opensuse and alpine
    image: quay.io/kairos/opensuse
    command:
      - "/bin/bash"
      - "-c"
    args:
      - bash /host/run/system-upgrade/secrets/custom-script/add-config-file.sh
  secrets:
    - name: custom-script
      path: /host/run/system-upgrade/secrets/custom-script
---
apiVersion: v1
kind: Secret
metadata:
  name: custom-script
  namespace: system-upgrade
type: Opaque
stringData:
  config.yaml: |
    #cloud-config
    hostname: testcluster-{{ trunc 4 .MachineID }}
    k3s:
      enabled: true
    users:
    - name: kairos
      passwd: kairos
      ssh_authorized_keys:
      - github:mudler    
  add-config-file.sh: |
    #!/bin/sh
    set -e
    if diff /host/run/system-upgrade/secrets/custom-script/config.yaml /host/oem/90_custom.yaml >/dev/null; then
        echo config present
        exit 0
    fi
    # we can't cp, that's a symlink!
    cat /host/run/system-upgrade/secrets/custom-script/config.yaml > /host/oem/90_custom.yaml
    grub2-editenv /host/oem/grubenv set next_entry=statereset
    sync

    mount --rbind /host/dev /dev
    mount --rbind /host/run /run
    nsenter -i -m -t 1 -- reboot
    exit 1    
---
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: reset-and-reconfig
  namespace: system-upgrade
spec:
  concurrency: 2
  # This is the version (tag) of the image.
  version: "leap-15.6-standard-amd64-generic-v3.1.0-k3sv1.30.2-k3s1"
  nodeSelector:
    matchExpressions:
      - { key: kubernetes.io/hostname, operator: Exists }
  serviceAccountName: system-upgrade
  cordon: false
  upgrade:
    # Here goes the image which is tied to the flavor being used.
    # Currently can pick between opensuse and alpine
    image: quay.io/kairos/opensuse
    command:
      - "/bin/bash"
      - "-c"
    args:
      - bash /host/run/system-upgrade/secrets/custom-script/add-config-file.sh
  secrets:
    - name: custom-script
      path: /host/run/system-upgrade/secrets/custom-script

Manual reset

It is possible to trigger the reset manually by logging into the recovery from the boot menu and running kairos reset from the console.

Cleaning up state directories

An alternative way and manual of resetting your system is possible by deleting the state paths. You can achieve this by deleting the contents of the /usr/local directory. It’s recommended that you do this while in recovery mode with all services turned off.

Please note that within /usr/local, there are two important folders to keep in mind. The first is /usr/local/.kairos, which contains sentinel files that will trigger a complete deployment from scratch when deleted. However, your data will be preserved. The second folder is /usr/local/.state, which contains the bind-mounted data for the system. By deleting these two folders, you can achieve a pristine environment while leaving all other contents of /usr/local untouched.


Last modified February 23, 2024: Reduce sizes and remove warnings (0e183ae)