Automation Suite installation guide
Last updated Mar 17, 2025
How to manually clean up logs
Cleaning up Ceph logs
Moving Ceph out of read-only mode
If you installed AI Center and use Ceph storage, take the following steps to move Ceph out of read-only mode:
- Check if Ceph is at full capacity:
  kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status
  If Ceph is at full capacity, you must adjust the read-only threshold to start up the rgw gateways.
- Scale down the ML skills:
  kubectl -n uipath scale deployment <skill> --replicas=0
- Put the cluster in write mode by raising the full ratio. The default value is 0.95, so you could increase it to 0.96 and go up incrementally:
  kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd set-full-ratio 0.96
- Run garbage collection:
  kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- radosgw-admin gc process --include-all
- Once storage usage goes down, run the following commands:
  kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status
  kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph df
  At this point, storage usage should be lower, and the cluster should be healthy.
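The write-mode step suggests raising the full ratio above its 0.95 default in small increments. The following is a minimal sketch of that stepping logic, assuming a 0.01 step and a cap of 0.97. Both values, and the helper names next_full_ratio and full_ratio_command, are illustrative assumptions and do not come from this guide. The script only prints the command so you can review it before running it.

```shell
#!/usr/bin/env bash
# Sketch only: MAX_RATIO and the 0.01 step are assumed values, not from the guide.
MAX_RATIO="0.97"

# Print the next full ratio to try, one 0.01 step above the current one,
# never exceeding MAX_RATIO.
next_full_ratio() {
  LC_ALL=C awk -v cur="$1" -v max="$MAX_RATIO" 'BEGIN {
    candidate = cur + 0.01
    if (candidate > max) candidate = max
    printf "%.2f\n", candidate
  }'
}

# Print (do not run) the toolbox command that sets the given full ratio.
full_ratio_command() {
  echo "kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd set-full-ratio $1"
}

# Example: starting from the 0.95 default, show the command for the next step.
full_ratio_command "$(next_full_ratio 0.95)"
```

Printing the command instead of executing it keeps the change deliberate: raising the ratio only buys room to run garbage collection, so each step is worth confirming against the output of ceph status.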
Disabling streaming logs
To ensure everything is in a good state, disable streaming logs by taking the following steps.
- Disable auto-sync on UiPath and AI Center.
- Disable streaming logs for AI Center.
- If you have ML skills that have already been deployed, run the following command for each of them:
  kubectl set env deployment [REPLICASET_NAME] LOGS_STREAMING_ENABLED=false
- Find out which buckets use the most space:
  kubectl -n rook-ceph exec deploy/rook-ceph-tools -- radosgw-admin bucket stats | jq -r '["BucketName","NoOfObjects","SizeInKB"], ["--------------------","------","------"], (.[] | [.bucket, .usage."rgw.main"."num_objects", .usage."rgw.main".size_kb_actual]) | @tsv' | column -ts $'\t'
- Install s3cmd to prepare for cleaning up the sf-logs bucket:
  pip3 install awscli s3cmd
  export PATH=/usr/local/bin:$PATH
- Clean up the logs in the sf-logs bucket. For details, see How to clean up old logs stored in the sf-logs bucket.
- Complete the cleanup operation:
  kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- radosgw-admin gc process --include-all
- If the previous steps do not solve the issue, clean up the AI Center data.
- Check if the storage was reduced:
  kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph df
- Once storage is no longer full, restore the full ratio to its default value:
  kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd set-full-ratio 0.95
- Check if the ML skills are affected by the multipart upload issue:
  echo $(kubectl -n rook-ceph exec deploy/rook-ceph-tools -- radosgw-admin bucket list --max-entries 10000000 --bucket train-data | jq '[.[] | select (.name | contains("_multipart")) | .meta.size] | add') | numfmt --to=iec-i
  If they are affected by this issue and the returned value is high, you may need to do a backup and restore.
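One detail of the multipart check is worth noting: when no object name contains "_multipart", the jq add filter returns null, and numfmt rejects that as an invalid number. The following is a minimal sketch of a wrapper that treats that case as zero bytes. The function name format_multipart_size is a hypothetical helper, not part of the guide, and the sketch assumes GNU numfmt is available.

```shell
#!/usr/bin/env bash
# Sketch only: format_multipart_size is a hypothetical helper name.
# $1 is the raw jq result: a byte count, or "null" when nothing matched.
format_multipart_size() {
  local bytes="$1"
  # jq's `add` yields null for an empty array; treat that as 0 bytes.
  if [ -z "$bytes" ] || [ "$bytes" = "null" ]; then
    bytes=0
  fi
  LC_ALL=C numfmt --to=iec-i "$bytes"
}

# Example with a literal value; in practice, pass the jq output from the check above.
format_multipart_size null
```

A result of 0 means the train-data bucket holds no leftover multipart parts, so the multipart upload issue does not apply.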