Troubleshooting

Describes how to troubleshoot issues related to KubeSphere Backup.

After you import a Kubernetes cluster, a namespace named qiming-backend is created on the cluster for installing the backup and recovery components, including the Velero installer and Velero itself.

Kubernetes cluster resource request

For the backup and recovery components to run smoothly, make sure your Kubernetes cluster has enough free resources to satisfy their resource requests. The default resource settings are as follows:

  1. The Velero installer is a DaemonSet named restic that controls the transmission of local files to an S3 storage repository. Its resource request is as follows:

    name: restic
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: 500m
        memory: 512Mi

    NOTE

    The default settings are recommended. To change the resource settings, run kubectl edit daemonset restic -n qiming-backend, modify the settings under spec.template.spec.containers.resources, and save your changes.

  2. Velero is a single-replica Deployment that handles the control logic for backup and recovery. Its resource request is as follows:

    name: velero
    resources:
      limits:
        cpu: "1"
        memory: 1000Mi
      requests:
        cpu: 500m
        memory: 256Mi

    NOTE

    The default settings are recommended. To change the resource settings, run kubectl edit deploy velero -n qiming-backend, modify the settings under spec.template.spec.containers.resources, and save your changes.
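Before changing these settings, it can help to check what the workloads currently request. The following is a minimal sketch rather than part of the product tooling: show_resources is a hypothetical helper name, and the sketch assumes kubectl is already configured for the cluster. It prints each container name followed by its resources block.

```shell
# Hypothetical helper: print the resources block of every container
# in a workload (for example, the restic DaemonSet or velero Deployment).
show_resources() {
  kind="$1"                      # daemonset or deploy
  name="$2"                      # restic or velero
  ns="${3:-qiming-backend}"      # namespace used in this document
  kubectl get "$kind" "$name" -n "$ns" \
    -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{": "}{.resources}{"\n"}{end}'
}
```

For example, run show_resources daemonset restic or show_resources deploy velero. The exact rendering of the resources block depends on the kubectl version.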

Obtain the component logs

If the backup and recovery components do not run properly, obtain the logs of the relevant pods for troubleshooting.

  1. On the control plane of your cluster, run the following command to obtain the information of pods running in the qiming-backend namespace.

    $ kubectl get pods -n qiming-backend
    NAME                                READY   STATUS    RESTARTS   AGE
    restic-98lc4                        1/1     Running   0          80m
    velero-564c9df5c6-mn2s7             1/1     Running   0          80m
    velero-installer-66d65557c4-dz86t   1/1     Running   0          83m

  2. Run the following command to obtain the logs of Velero pods.

    kubectl logs <Velero pod ID> -n qiming-backend > /tmp/velero.log
  3. Run the following command to obtain the logs of the restic pods.

    kubectl logs <restic pod ID> -n qiming-backend > /tmp/restic.log
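When several pods are involved, the per-pod commands above can be wrapped in a loop. This is a sketch under assumptions: collect_component_logs and the default output directory /tmp/backup-logs are illustrative names that do not come from the product, and kubectl is assumed to be configured for the cluster.

```shell
# Hypothetical helper: save the logs of every pod in the namespace
# to one file per pod under an output directory.
collect_component_logs() {
  ns="${1:-qiming-backend}"
  out_dir="${2:-/tmp/backup-logs}"   # assumed location, change as needed
  mkdir -p "$out_dir" || return 1
  # "-o name" prints one "pod/<pod name>" entry per line.
  kubectl get pods -n "$ns" -o name | while IFS= read -r pod; do
    name="${pod#pod/}"               # strip the "pod/" prefix for the file name
    kubectl logs "$pod" -n "$ns" --all-containers > "$out_dir/$name.log"
  done
}
```

Calling collect_component_logs without arguments writes files such as /tmp/backup-logs/restic-98lc4.log, which can then be attached to a support request.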

Obtain backup and recovery information and logs

If a backup job or recovery job is not running properly, obtain the information and logs of the abnormal job for troubleshooting.

Prerequisites

  1. Make sure your machine can access the cluster that reported the error.

  2. Download the installation package for Velero v1.7.0 according to your environment from this page.

  3. Run the following command to unzip the installation package and view the version information.

    tar -xvf <installation package filename>
    cd <installation package directory>
    chmod +x velero
    ./velero version
  4. Run the following command to create a configuration file and edit it.

    mkdir -p $HOME/.config/velero
    vi $HOME/.config/velero/config.json
  5. Enter the following content in the configuration file to link the configuration file to the qiming-backend namespace. Save the changes when you finish.

    { "namespace": "qiming-backend" }
  6. Save the kubeconfig of the cluster for troubleshooting to ~/.kube/config. By default, Velero uses this kubeconfig to connect to the cluster. You can also use velero --kubeconfig=<your-kubeconfig-file> to specify the path of the kubeconfig.
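Steps 4 and 5 can also be done without an interactive editor. The sketch below writes the same one-line configuration file; write_velero_config is a hypothetical helper name, and the optional second argument exists only so the target directory can be overridden.

```shell
# Hypothetical helper: create the Velero client configuration file
# that points the CLI at the qiming-backend namespace.
write_velero_config() {
  ns="${1:-qiming-backend}"
  cfg_dir="${2:-$HOME/.config/velero}"
  mkdir -p "$cfg_dir" || return 1
  # Overwrites any existing config.json in that directory.
  printf '{ "namespace": "%s" }\n' "$ns" > "$cfg_dir/config.json"
}
```

Note that the function replaces an existing config.json, so back up the file first if it already contains other client settings.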

Obtain information and logs of a backup job

  1. Run the following command to view the information of backup jobs.

    ./velero get backups
  2. Run the following command to view the details of backup jobs.

    ./velero describe backup <backup job name>
  3. Run the following command to view the logs of backup jobs.

    ./velero backup logs <backup job name>

Obtain information and logs of a recovery job

  1. Run the following command to view the information of recovery jobs.

    ./velero get restores
  2. Run the following command to view the details of recovery jobs.

    ./velero describe restore <recovery job name>
  3. Run the following command to view the logs of recovery jobs.

    ./velero restore logs <recovery job name>
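The describe and logs steps for backup and recovery jobs follow the same pattern, so they can be combined into one function. This is a sketch, not an official tool: dump_job and the VELERO_BIN variable are hypothetical names, and the velero binary is assumed to be in the current directory as in the prerequisites.

```shell
# Hypothetical helper: print the details and then the logs of one job.
# Usage: dump_job backup <backup job name>
#        dump_job restore <recovery job name>
dump_job() {
  kind="$1"
  name="$2"
  velero_bin="${VELERO_BIN:-./velero}"   # path to the velero binary
  case "$kind" in
    backup|restore) ;;
    *) echo "usage: dump_job backup|restore <job name>" >&2; return 2 ;;
  esac
  "$velero_bin" describe "$kind" "$name" --details
  "$velero_bin" "$kind" logs "$name"
}
```

Redirecting the output, for example dump_job backup <backup job name> > /tmp/backup-job.txt, keeps the details and logs together in a single file.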

Troubleshoot component installation

  • A pod cannot start in the qiming-backend namespace.

    1. Run the following command to view the cause of the startup failure.

      kubectl -n qiming-backend describe pod <pod name>
    2. If the output shows an image pull error, run the following command to check whether the current node or cluster can pull images.

      docker pull registry.cn-shanghai.aliyuncs.com/jibudata/velero-installer:xxx
    3. If the output shows insufficient resources in the cluster, make sure your cluster has sufficient resources.

  • The restic DaemonSet reports errors.

    1. Run the following command to view the causes of errors.

      kubectl -n qiming-backend describe pod <restic pod name>
    2. The restic DaemonSet accesses the PVC data for backup by mounting the kubelet directory /var/lib/kubelet/pods. If the kubelet path of your Kubernetes cluster is different from that path, run the following command to change the value of spec.template.spec.volumes.hostPath.path to the correct path and save the changes.

      kubectl edit daemonset restic -n qiming-backend
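To find which pods need the describe step in the first place, the pod list can be filtered by phase. The following is a sketch under assumptions: describe_unhealthy_pods is a hypothetical helper name, and kubectl is assumed to be configured for the cluster. A pod that keeps crashing and restarting can still report the Running phase, so this filter does not catch every failure.

```shell
# Hypothetical helper: describe every pod in the namespace whose
# phase is not Running (for example, Pending or Failed).
describe_unhealthy_pods() {
  ns="${1:-qiming-backend}"
  kubectl get pods -n "$ns" --field-selector=status.phase!=Running -o name |
  while IFS= read -r pod; do
    echo "=== $pod ==="
    kubectl -n "$ns" describe "$pod"
  done
}
```

If the function prints nothing, all pods report the Running phase, and the RESTARTS column of kubectl get pods is the next place to look.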

Backup troubleshooting

Backup job remains in progress

  1. Run the following command to view the information about backup jobs.

    ./velero get backups
  2. Run the following command to view the details of a backup job that remains in progress, and troubleshoot based on the details.

    ./velero describe backup <backup job name> --details

Backup job failure

  1. Run the following command to view the information about backup jobs.

    ./velero get backups
  2. Run the following command to view the details of a failed backup job, and troubleshoot based on the Errors section of the output.

    ./velero describe backup <backup job name> --details

View backup/recovery CR information

View CRs of a backup job

  1. Run the following command to view all the CRs of the backup jobs.

    $ kubectl get backups.velero.io -n qiming-backend
    NAME                      AGE
    test-backup-4lx3w-vfg5q   2d19h

    NOTE

    The prefix of the backup CR name is the backup job name created by the system. For example, if you create a backup plan named test-backup1, then the corresponding backup job CR name is test-backup1-xxxxx-xxxxx.

  2. Run the following command to view the status of the backup job CR.

    kubectl get backups.velero.io -n qiming-backend <backup job CR name> -o yaml

    NOTE

    Check status.phase in the output, and look for any error or warning information.

View CRs of a recovery job

  1. Run the following command to view all the CRs of the recovery jobs.

    kubectl get restores.velero.io -n qiming-backend
  2. Run the following command to view the status of the CR of a recovery job.

    kubectl get restores.velero.io -n qiming-backend <recovery job CR name> -o yaml
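Reading the full YAML of every CR is slow when there are many jobs. As a sketch, the two functions below list only the name and status.phase of each CR and then keep the rows that are not Completed. The function names are hypothetical, and kubectl is assumed to be configured for the cluster.

```shell
# Hypothetical helper: print NAME and PHASE for every backup or
# restore CR in the namespace.
list_job_phases() {
  resource="$1"                  # backups.velero.io or restores.velero.io
  ns="${2:-qiming-backend}"
  kubectl get "$resource" -n "$ns" \
    -o custom-columns=NAME:.metadata.name,PHASE:.status.phase
}

# Hypothetical helper: read that table on stdin, skip the header row,
# and print only the rows whose phase is not Completed.
filter_not_completed() {
  awk 'NR > 1 && $2 != "Completed" { print $1, $2 }'
}
```

For example, list_job_phases backups.velero.io | filter_not_completed prints the backup CRs that are still in progress or that ended in a phase such as Failed or PartiallyFailed.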

KubeSphere ®️ © QingCloud Technologies 2022