
Automation Suite on Linux installation guide

Last updated Sep 24, 2025

Storage troubleshooting

Failure to compact metrics due to corrupted blocks in Thanos

Description

The Thanos compactor may fail to compact metrics when corrupted blocks are detected in the object store. This condition prevents the compactor from processing metrics, leading to increased storage use in the Ceph bucket.
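
Before applying the fix, it can help to confirm that the compactor has actually halted. The following is a minimal sketch, not part of the official procedure. It reads the same thanos_compact_halted gauge, pod label, and metrics port (10902) that the cleanup script in the Solution section uses. The function names parse_compact_halted and check_compactor are hypothetical, and the sketch assumes it runs from a host that can reach pod IPs.

```shell
#!/bin/bash
# Sketch: report whether the Thanos compactor has halted, based on the
# thanos_compact_halted gauge that the cleanup script in this section also reads.

# Read Prometheus text-format metrics on stdin and print the value of
# thanos_compact_halted. Prints nothing if the metric is absent.
parse_compact_halted() {
  awk '$1 == "thanos_compact_halted" || index($1, "thanos_compact_halted{") == 1 { print $2; exit }'
}

# Query every compactor pod in the given namespace.
# Assumption: pods carry the label and expose the port used by the cleanup script.
check_compactor() {
  local ns="$1" ip halted
  for ip in $(kubectl get pods -n "$ns" -l app.kubernetes.io/instance=thanos-compact \
      -o jsonpath='{.items[*].status.podIP}'); do
    halted=$(curl -s "http://${ip}:10902/metrics" | parse_compact_halted)
    echo "compactor ${ip}: thanos_compact_halted=${halted:-unknown}"
  done
}

# Usage (namespace is either monitoring or cattle-monitoring-system):
#   check_compactor cattle-monitoring-system
```

A value of 1 indicates that compaction is halted; 0 indicates that the compactor is running normally.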

Solution

To address the issue, take the following steps:
  1. On any server node, run the following script to create the RBAC objects for the cleaner job and the ThanosCompactorNotWorking alert rule:
    thanosns=monitoring && if kubectl get application -n argocd rancher-monitoring; then thanosns=cattle-monitoring-system; fi && cat <<EOF | kubectl apply -f -
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      annotations:
      labels:
        app.kubernetes.io/component: thanos-cleaner
        app.kubernetes.io/instance: thanos-block-cleaner
        app.kubernetes.io/name: thanos-block-cleaner
      name: thanos-cleaner-role
      namespace: ${thanosns}
    rules:
    - apiGroups:
      - apps
      resources:
      - statefulsets
      - statefulsets/scale
      verbs:
      - list
      - get
      - update
      - patch
    - apiGroups:
      - batch
      resources:
      - jobs
      - cronjobs
      verbs:
      - delete
      - list
      - get
      - update
      - create
      - watch
    - apiGroups:
      - ""
      resources:
      - pods
      verbs:
      - delete
      - list
      - get
      - update
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      labels:
        app.kubernetes.io/component: thanos-cleaner
        app.kubernetes.io/instance: thanos-block-cleaner
        app.kubernetes.io/name: thanos-block-cleaner
      name: thanos-cleaner-role-binding
      namespace: ${thanosns}
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: thanos-cleaner-role
    subjects:
    - kind: ServiceAccount
      name: thanos-cleaner
      namespace: ${thanosns}
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: thanos-cleaner
      namespace: ${thanosns}
    ---
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: thanos-cleaner
      namespace: uipath
    spec:
      groups:
      - name: thanos
        rules:
        - alert: ThanosCompactorNotWorking
          annotations:
            description: Thanos compactor is not working. This will disable metrics compaction
              in objectstore bucket. Please check thanos compact pod in ${thanosns} namespace
              for any error. Compactor in faulty state will exhaust object store space
            message: Thanos compactor is not working. Please check if thanos cleaner job
              is functional and able to fix corruption
            runbook_url: https://docs.uipath.com/automation-suite/docs/alert-runbooks
            summary: Thanos compactor is not working
          expr: thanos_compactor_issue{job="thanos-cleaner"} >= 1
          for: 1d
          labels:
            app: thanos
            severity: critical
    ---
    EOF
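  2. On any server node, run the following script. It creates the thanos-cleaner-script ConfigMap in the cattle-monitoring-system namespace. If the script in step 1 resolved the Thanos namespace to monitoring, change metadata.namespace to match before applying, because the cronjob can only mount a ConfigMap from its own namespace: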
    cat <<'EOF' | kubectl apply -f -
    ---
    apiVersion: v1
    data:
      thanos-cleanup.sh: |
        #!/bin/bash
    
        # Copyright UiPath 2021
        #
        # =================
        # LICENSE AGREEMENT
        # -----------------
        #   Use of paid UiPath products and services is subject to the licensing agreement
        #   executed between you and UiPath. Unless otherwise indicated by UiPath, use of free
        #   UiPath products is subject to the associated licensing agreement available here:
        #   https://www.uipath.com/legal/trust-and-security/legal-terms (or successor website).
        #   You must not use this file separately from the product it is a part of or is associated with.
    
        set -eu -o pipefail
    
        export PATH=$PATH:/thanos-bin/
        # This script removes blocks that overlap, have known index issues, or result from duplicated compaction.
        #
        # In some of these cases, Thanos skips compaction and halts the compactor.
        # Recovering from a halt requires deleting the corrupted blocks and restarting the compact pod.
    
        config_file=/etc/thanos/${THANOS_CONFIG_KEY}
    
        function info() {
          echo "[INFO] [$(date +'%Y-%m-%dT%H:%M:%S%z')]: $*"
        }
    
        function warn() {
          echo -e "\e[0;33m[WARN] [$(date +'%Y-%m-%dT%H:%M:%S%z')]:\e[0m $*" >&2
        }
    
        function error_without_exit() {
          echo -e "\e[0;31m[ERROR][$(date +'%Y-%m-%dT%H:%M:%S%z')]:\e[0m $*" >&2
        }
    
        function error() {
          echo -e "\e[0;31m[ERROR][$(date +'%Y-%m-%dT%H:%M:%S%z')]:\e[0m $*" >&2
          exit 1
        }
    
        function is_compaction_halted() {
          info "Checking if thanos compactor running"
    
          IFS=" " read -r -a compactor_addresses <<<"$(kubectl get pods -n "$NAMESPACE" -l app.kubernetes.io/instance=thanos-compact -o jsonpath="{.items[*].status.podIP}")"
    
          is_compactor_halted=0
    
          if [[ "${#compactor_addresses[@]}" -eq 0 ]]; then
            info "Thanos compactor pod is not running"
            is_compactor_halted=1
          fi
    
          for ip in "${compactor_addresses[@]}"; do
            #shellcheck disable=SC2086
            halted=$(curl -s http://${ip}:10902/metrics | grep thanos_compact_halted | grep -v '#' | awk -F ' ' '{print $2}')
            if [[ "$halted" -eq "1" ]]; then
              warn "Compaction is halted"
              is_compactor_halted=1
              break
            fi
          done
    
          return $is_compactor_halted
        }
    
        function execute_thanos_issue_command() {
          if [[ $# -ne 1 ]]; then
            error "missing issue name for execute_thanos_issue_command function"
          fi
    
          issue=$1
    
          info "Checking for issue $issue"
          cmd_ret=0
          #shellcheck disable=SC2086
          verify_output=$(thanos tools bucket --objstore.config-file=${config_file} verify --log.format=json -i $issue 2>&1) && true || cmd_ret=1
          if [[ $cmd_ret -eq 1 ]]; then
            error_without_exit "Output of $issue command: -> $verify_output"
            error "Failed to verify bucket for $issue"
          fi
    
          #shellcheck disable=SC2086
          echo $verify_output
        }
    
        function fix_index_issue() {
          info "Fixing index_known_issue issue"
    
          verify_output=$(execute_thanos_issue_command "index_known_issues")
          #shellcheck disable=SC2086
          for b in $(echo $verify_output | sed 's/} {/\r\n/g' | grep err | grep "detected issue" | awk -F '"id":' '{print $2}' | awk -F ',' '{print $1}' | tr -d '"'); do
            info "Block=$b is having the issue, removing it.."
    
            thanos tools bucket mark --id="$b" \
              --marker=deletion-mark.json \
              --details="deleted by job" \
              --objstore.config-file="${config_file}"
    
            info "Block=$b is marked for deletion"
          done
    
          info "Fixing index_known_issue issue done"
        }
    
        function fix_overlapping_issue() {
          info "Fixing overlapped_blocks issue"
    
          overlap_output=$(execute_thanos_issue_command "overlapped_blocks")
    
          while IFS= read -r line; do
            #shellcheck disable=SC2086
            for b in $(echo $line | awk -F '"overlap":' '{print $2}' | awk -v search="ulid" 'match($0, search) {print substr($0, RSTART)}' | sed 's/ulid/\r\nulid/g' | awk -F ',' '{print $1}' | grep '^ulid' | awk -F ': ' '{print $2}'); do
              info "Block=$b is having the issue, removing it.."
              thanos tools bucket mark --id="$b" \
                --marker=deletion-mark.json \
                --details="deleted by job" \
                --objstore.config-file="${config_file}"
    
              info "Block=$b is marked for deletion"
            done
          done < <(echo "$overlap_output" | sed 's/} {/\r\n/g' | grep "found overlapped blocks")
    
          info "Fixing overlapped_blocks issue done"
        }
    
        function fix_duplicate_issue() {
          info "Fixing duplicated_compaction issue"
          duplicate_output=$(execute_thanos_issue_command "duplicated_compaction")
          #shellcheck disable=SC2086,SC2006
          for b in $(echo $duplicate_output | sed 's/ts=2/\r\n2/g' | grep "Found duplicated blocks that are ok to be removed" | awk -F 'ULIDs="' '{print $2}' | tr -d '[]' | awk -F '"' '{print $1}'); do
            info "Block=$b is having the issue, removing it.."
            thanos tools bucket mark --id="$b" \
              --marker=deletion-mark.json \
              --details="deleted by job" \
              --objstore.config-file="${config_file}"
    
            info "Block=$b is marked for deletion"
          done
    
          info "Fixing duplicated_compaction issue done"
        }
    
        if [[ -z "$NAMESPACE" ]]; then
          error "NAMESPACE is not set"
        fi
    
        # is_compaction_halted returns 0 (success) when a compactor pod is running and not halted,
        # so this branch handles the healthy case and exits before any blocks are checked
        if is_compaction_halted; then
          info "Thanos compaction is working"
          echo "thanos_compactor_issue 0" | curl --data-binary @- "http://pushgateway-prometheus-pushgateway.uipath.svc.cluster.local:9091/metrics/job/thanos-cleaner"
          exit 0
        fi
    
        warn "Thanos compactor is not working. Checking for corrupted blocks..."
        echo "thanos_compactor_issue 1" | curl --data-binary @- "http://pushgateway-prometheus-pushgateway.uipath.svc.cluster.local:9091/metrics/job/thanos-cleaner"
    
        if [[ "$DISABLE_BLOCK_CLEANER" == true ]]; then
          info "DISABLE_BLOCK_CLEANER is set to $DISABLE_BLOCK_CLEANER, skipping block clean"
          exit 0
        fi
    
        info "DISABLE_BLOCK_CLEANER is set to $DISABLE_BLOCK_CLEANER, removing corrupted blocks"
    
        replica=$(kubectl get sts -n "$NAMESPACE" thanos-compact -o jsonpath='{.spec.replicas}')
    
        # compactor must not be running while deleting blocks
    
        info "Stopping compactor"
        kubectl scale sts -n "$NAMESPACE" thanos-compact --replicas=0
        kubectl delete pods -n "$NAMESPACE" -l app.kubernetes.io/instance=thanos-compact --force
    
        # fixing index_known_issues
        info "Checking blocks having issue"
    
        fix_index_issue
        fix_overlapping_issue
        fix_duplicate_issue
    
        info "Triggering deletion of all marked blocks"
    
        #shellcheck disable=SC2086
        thanos tools bucket cleanup --delete-delay=0 --objstore.config-file=${config_file}
    
        info "Corrupted blocks are deleted"
    
        info "Scaling thanos compactor's replica to $replica"
        #shellcheck disable=SC2086
        kubectl scale sts -n "$NAMESPACE" thanos-compact --replicas=$replica
        info "Thanos compactor started"
      validate-cronjob.sh: |
        #!/bin/bash
    
        # Copyright UiPath 2021
        #
        # =================
        # LICENSE AGREEMENT
        # -----------------
        #   Use of paid UiPath products and services is subject to the licensing agreement
        #   executed between you and UiPath. Unless otherwise indicated by UiPath, use of free
        #   UiPath products is subject to the associated licensing agreement available here:
        #   https://www.uipath.com/legal/trust-and-security/legal-terms (or successor website).
        #   You must not use this file separately from the product it is a part of or is associated with.
    
        set -eu -o pipefail
    
        function info() {
          echo "[INFO] [$(date +'%Y-%m-%dT%H:%M:%S%z')]: $*"
        }
    
        function warn() {
          echo -e "\e[0;33m[WARN] [$(date +'%Y-%m-%dT%H:%M:%S%z')]:\e[0m $*" >&2
        }
    
        function error_without_exit() {
          echo -e "\e[0;31m[ERROR][$(date +'%Y-%m-%dT%H:%M:%S%z')]:\e[0m $*" >&2
        }
    
        function error() {
          echo -e "\e[0;31m[ERROR][$(date +'%Y-%m-%dT%H:%M:%S%z')]:\e[0m $*" >&2
          exit 1
        }
    
        alias kubectl='kubectl --cache-dir=/tmp/'
        IFS="," read -ra cronjobs <<<"$CRONJOB_LIST"
    
        for cr in "${cronjobs[@]}"; do
          #shellcheck disable=SC2206
          name=(${cr//// })
          cronNs=default
          cronName=""
    
          if [[ ${#name[@]} -gt 2 || ${#name[@]} -lt 1 ]]; then
            error "Invalid cronjob name=$cr"
          fi
    
          if [[ ${#name[@]} -eq 2 ]]; then
            cronNs=${name[0]}
            cronName=${name[1]}
          else
            cronName=${name[0]}
          fi
    
          info "Validating cronjob=$cr"
    
          jobName="${cronName}-sf-job-validation"
    
          created=1
          info "Creating validation job for $cr"
          kubectl delete job -n "${cronNs}" "${jobName}" --ignore-not-found --timeout=3m
    
          #shellcheck disable=SC2086
          kubectl create job -n "${cronNs}" --from=cronjob/${cronName} "$jobName" || created=0
    
          if [[ $created == 0 ]]; then
            error "Failed to create job for $cr"
          fi
    
          #shellcheck disable=SC2086
          kubectl wait --timeout=20m --for=condition=complete -n "${cronNs}" job/$jobName &
          cpid=$!
    
          #shellcheck disable=SC2086
          kubectl wait --timeout=20m --for=condition=failed -n "${cronNs}" job/${jobName} && exit 1 &
          fpid=$!
    
          ret=0
          wait -n $cpid $fpid || ret=1
    
          kill -9 $cpid || true
          kill -9 $fpid || true
    
          if [[ $ret -eq 0 ]]; then
            info "Job for $cr is validated/completed"
            #ignore deletion error. if deletion fail then will get caught in next sync. This is to reduce failure during installation
            kubectl delete job -n "${cronNs}" "${jobName}" --timeout=3m || true
          else
            error "Job for $cr failed"
          fi
        done
    kind: ConfigMap
    metadata:
      name: thanos-cleaner-script
      namespace: cattle-monitoring-system
    ---
    EOF
  3. Replace SF_K8S_TAG with the correct image tag, then apply the cronjob.

    From the installer directory on any server node, get the latest tag:

    cat versions/docker-images.json | grep uipath/sf-k8-utils-rhel | tr -d ',"' | awk -F ':' '{print $2}' | sort | uniq | tail -1
    
    Then update the cronjob block by replacing SF_K8S_TAG with the returned value.

    Once updated, paste the entire block in the terminal of any server node:

    thanosns=monitoring && if kubectl get application -n argocd rancher-monitoring; then thanosns=cattle-monitoring-system; fi && thanosimage=$(kubectl  get statefulset -n $thanosns thanos-compact -o jsonpath='{.spec.template.spec.containers[0].image}') &&  cat <<EOF | kubectl apply -f -
    ---
    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: thanos-cleaner
      namespace: ${thanosns}
    spec:
      concurrencyPolicy: Forbid
      failedJobsHistoryLimit: 3
      jobTemplate:
        metadata:
          creationTimestamp: null
        spec:
          backoffLimit: 3
          template:
            metadata:
              annotations:
                sidecar.istio.io/inject: "false"
              creationTimestamp: null
              labels:
                app.kubernetes.io/name: thanos-cleaner-cronjob
            spec:
              containers:
              - args:
                - /script/thanos-cleanup.sh
                command:
                - /bin/bash
                env:
                - name: NAMESPACE
                  valueFrom:
                    fieldRef:
                      apiVersion: v1
                      fieldPath: metadata.namespace
                - name: THANOS_CONFIG_KEY
                  value: thanos.yaml
                - name: DISABLE_BLOCK_CLEANER
                  value: "false"
                image: docker.io/uipath/sf-k8-utils-rhel:SF_K8S_TAG
                imagePullPolicy: IfNotPresent
                name: thanos-cleaner
                resources:
                  limits:
                    cpu: 200m
                    memory: 400Mi
                  requests:
                    cpu: 20m
                    memory: 64Mi
                terminationMessagePath: /dev/termination-log
                terminationMessagePolicy: File
                volumeMounts:
                - mountPath: /script/
                  name: script
                - mountPath: /etc/thanos/
                  name: thanos-objectstore-vol
                - mountPath: /thanos-bin/
                  name: thanos
                - mountPath: /.kube/
                  name: kubedir
                - mountPath: /tmp/
                  name: tmpdir
              dnsPolicy: ClusterFirst
              initContainers:
              - args:
                - set -e; cp /bin/thanos /thanos-bin/thanos && chmod +x /thanos-bin/thanos
                command:
                - /bin/sh
                - -c
                image: ${thanosimage}
                imagePullPolicy: IfNotPresent
                name: copy-uipathcore-binary
                resources: {}
                terminationMessagePath: /dev/termination-log
                terminationMessagePolicy: File
                volumeMounts:
                - mountPath: /thanos-bin/
                  name: thanos
              nodeSelector:
                kubernetes.io/os: linux
              restartPolicy: Never
              schedulerName: default-scheduler
              securityContext:
                fsGroup: 3000
                runAsGroup: 2000
                runAsNonRoot: true
                runAsUser: 1000
              serviceAccount: thanos-cleaner
              serviceAccountName: thanos-cleaner
              terminationGracePeriodSeconds: 120
              volumes:
              - emptyDir: {}
                name: kubedir
              - emptyDir: {}
                name: tmpdir
              - emptyDir: {}
                name: thanos
              - name: thanos-objectstore-vol
                secret:
                  defaultMode: 420
                  secretName: thanos-objectstore-config
              - configMap:
                  defaultMode: 420
                  name: thanos-cleaner-script
                name: script
      schedule: 0 1/6 * * *
      successfulJobsHistoryLimit: 2
      suspend: false
    ---
    EOF
    thanosns=monitoring && if kubectl get application -n argocd rancher-monitoring; then thanosns=cattle-monitoring-system; fi && thanosimage=$(kubectl get statefulset -n $thanosns thanos-compact -o jsonpath='{.spec.template.spec.containers[0].image}') && cat <<EOF | kubectl apply -f -
    ---
    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: thanos-cleaner
      namespace: ${thanosns}
    spec:
      concurrencyPolicy: Forbid
      failedJobsHistoryLimit: 3
      jobTemplate:
        metadata:
          creationTimestamp: null
        spec:
          backoffLimit: 3
          template:
            metadata:
              annotations:
                sidecar.istio.io/inject: "false"
              creationTimestamp: null
              labels:
                app.kubernetes.io/name: thanos-cleaner-cronjob
            spec:
              containers:
              - args:
                - /script/thanos-cleanup.sh
                command:
                - /bin/bash
                env:
                - name: NAMESPACE
                  valueFrom:
                    fieldRef:
                      apiVersion: v1
                      fieldPath: metadata.namespace
                - name: THANOS_CONFIG_KEY
                  value: thanos.yaml
                - name: DISABLE_BLOCK_CLEANER
                  value: "false"
                image: docker.io/uipath/sf-k8-utils-rhel:SF_K8S_TAG
                imagePullPolicy: IfNotPresent
                name: thanos-cleaner
                resources:
                  limits:
                    cpu: 200m
                    memory: 400Mi
                  requests:
                    cpu: 20m
                    memory: 64Mi
                terminationMessagePath: /dev/termination-log
                terminationMessagePolicy: File
                volumeMounts:
                - mountPath: /script/
                  name: script
                - mountPath: /etc/thanos/
                  name: thanos-objectstore-vol
                - mountPath: /thanos-bin/
                  name: thanos
                - mountPath: /.kube/
                  name: kubedir
                - mountPath: /tmp/
                  name: tmpdir
              dnsPolicy: ClusterFirst
              initContainers:
              - args:
                - set -e; cp /bin/thanos /thanos-bin/thanos && chmod +x /thanos-bin/thanos
                command:
                - /bin/sh
                - -c
                image: ${thanosimage}
                imagePullPolicy: IfNotPresent
                name: copy-uipathcore-binary
                resources: {}
                terminationMessagePath: /dev/termination-log
                terminationMessagePolicy: File
                volumeMounts:
                - mountPath: /thanos-bin/
                  name: thanos
              nodeSelector:
                kubernetes.io/os: linux
              restartPolicy: Never
              schedulerName: default-scheduler
              securityContext:
                fsGroup: 3000
                runAsGroup: 2000
                runAsNonRoot: true
                runAsUser: 1000
              serviceAccount: thanos-cleaner
              serviceAccountName: thanos-cleaner
              terminationGracePeriodSeconds: 120
              volumes:
              - emptyDir: {}
                name: kubedir
              - emptyDir: {}
                name: tmpdir
              - emptyDir: {}
                name: thanos
              - name: thanos-objectstore-vol
                secret:
                  defaultMode: 420
                  secretName: thanos-objectstore-config
              - configMap:
                  defaultMode: 420
                  name: thanos-cleaner-script
                name: script
      schedule: 0 1/6 * * *
      successfulJobsHistoryLimit: 2
      suspend: false
    ---
    EOF
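
Once the manifests are applied, it can be useful to confirm that the CronJob exists and, optionally, to start one run without waiting for the schedule. The following is a minimal sketch and not part of the official procedure. The helper function names and the default job name thanos-cleaner-manual are hypothetical. The namespace check mirrors the one used in the script above, and the commands rely only on standard kubectl subcommands (get, create job --from).

```shell
# Resolve the monitoring namespace with the same check the script above uses.
resolve_thanos_ns() {
  if kubectl get application -n argocd rancher-monitoring >/dev/null 2>&1; then
    echo cattle-monitoring-system
  else
    echo monitoring
  fi
}

# Hypothetical helper: list the CronJob and the jobs in its namespace.
check_thanos_cleaner() {
  ns="$(resolve_thanos_ns)"
  kubectl get cronjob thanos-cleaner -n "$ns" && kubectl get jobs -n "$ns"
}

# Hypothetical helper: start one run now instead of waiting for the schedule.
# The job name defaults to thanos-cleaner-manual, an illustrative name.
run_thanos_cleaner_now() {
  ns="$(resolve_thanos_ns)"
  kubectl create job "${1:-thanos-cleaner-manual}" --from=cronjob/thanos-cleaner -n "$ns"
}
```

After pasting these definitions into a shell on a server node, run check_thanos_cleaner to see the CronJob, and run_thanos_cleaner_now only if an immediate cleanup run is wanted.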
