Using the Automation Suite Diagnostics Tool
The Automation Suite Diagnostics Tool is the first thing to use when facing any issues with Automation Suite. It checks the health of different required components and gives a consolidated report.
You can get the Automation Suite Diagnostics Tool in the following ways:
- By unzipping the sf-installer.zip installer package.
- By downloading the supportability-tools.zip file.
Before running the Automation Suite Diagnostics Tool, navigate to the installer folder. You may find the installer in the following location or anywhere you downloaded it:
cd /opt/UiPathAutomationSuite/{version}/installer
To start using the Automation Suite Diagnostics Tool, run the following command:
./Support-Tools/diagnostics-tool/diagnostics-report.sh
The following table lists the checks the Automation Suite Diagnostics Tool performs. Note that you can run the script on any of the nodes in the cluster as well as externally.
Node | Checks |
---|---|
Master node | |
Agent node | |
External machine | Note: To run the script from an external machine, first set the proper kubeconfig context to the cluster, and then pass the -e flag to the script: bash diagnostics-report.sh -e |
Sample report generated by the Automation Suite Diagnostics Tool.
INFO logs in green show that the required checks passed. However, you should still properly check the disk/memory usage to avoid hidden errors.
WARN messages do not signal a high risk, but you might still have to rectify them, as they can affect some services in certain scenarios.
ERROR messages describe issues that you must fix, as they impact some service in the cluster.
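To get a quick count of each message type in a saved report, something like the following sketch may help. It assumes that every message carries its level as a literal INFO, WARN, or ERROR keyword on the same line; the function name and the report.log file name are illustrative and not part of the tool.

```shell
# Count the lines on stdin that mention each severity keyword.
# Assumption: the report prints the level (INFO, WARN, ERROR) on each message line.
summarize_report() {
  awk '
    /ERROR/ { e++ }
    /WARN/  { w++ }
    /INFO/  { i++ }
    END     { printf "INFO=%d WARN=%d ERROR=%d\n", i, w, e }
  '
}

# Example (run from the installer folder; report.log is a hypothetical file name):
#   ./Support-Tools/diagnostics-tool/diagnostics-report.sh | tee report.log
#   summarize_report < report.log
```

A non-zero ERROR count is a reasonable signal to read the corresponding messages first.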
If the rke2-server or rke2-agent service is down, the node is down. Try restarting the service using systemctl restart <service-name>, as this should fix the issue.
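Before restarting anything, it can be useful to confirm which service is actually inactive. The sketch below only reads the state through systemctl is-active; the helper names are illustrative. A node normally runs only one of the two services, so the other one is expected to show as inactive.

```shell
# Print the systemd state of a unit, or "unknown" if systemctl gives no answer.
service_state() {
  local state
  state="$(systemctl is-active "$1" 2>/dev/null)" || true
  echo "${state:-unknown}"
}

# Print the state of both RKE2 services on this node.
report_rke2_services() {
  local svc
  for svc in rke2-server rke2-agent; do
    echo "$svc: $(service_state "$svc")"
  done
}

# Example:
#   report_rke2_services
#   sudo systemctl restart rke2-server   # only for the service that should be running
```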
Check the size of the directory mounted at /var/lib, as Kubernetes uses it to store its data. If the directory is full, various issues might arise. To prevent these problems, make sure to increase its size.
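As a rough way to check this by hand, the sketch below reads the usage percentage of the filesystem that holds a directory by using df. The function names and the 80% default threshold are illustrative assumptions, not values taken from the diagnostics tool.

```shell
# Print the usage percentage (integer, no % sign) of the filesystem holding $1.
fs_usage_percent() {
  df -P "$1" | awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}

# Warn when usage reaches a limit (default 80, an assumed value).
check_dir_usage() {
  local dir="$1" limit="${2:-80}" used
  used="$(fs_usage_percent "$dir")"
  case "$used" in
    ''|*[!0-9]*) echo "could not read usage for $dir"; return 2 ;;
  esac
  if [ "$used" -ge "$limit" ]; then
    echo "WARN: $dir is ${used}% full (limit ${limit}%)"
    return 1
  fi
  echo "OK: $dir is ${used}% full"
}

# Example:
#   check_dir_usage /var/lib
```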
For every node, the report specifies whether it is under Disk Pressure or Memory Pressure. If it is, workloads on that node might start showing issues. Check if any other processes running on the node are consuming resources, and remove them if that is the case.
We use Ceph as S3 object storage for storing logs and files from different applications. You can view the status of its services. If they are down, you might have to restart them. Also check whether the disk used by Ceph is full.
Ports 443 and 31443 must be open with the hostname that was provided. The report indicates if they are not accessible. Make sure to open the appropriate ports if the report flags them.
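To verify ports 443 and 31443 by hand from another machine, a minimal sketch such as the following can be used. It relies on bash's /dev/tcp redirection and the timeout utility; the function name, the 3-second timeout, and the example hostname are assumptions.

```shell
# Print "open" if a TCP connection to host:port succeeds within 3 seconds,
# otherwise "unreachable" (closed, filtered, or host not resolvable).
check_port() {
  local host="$1" port="$2"
  if timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
    echo "open"
  else
    echo "unreachable"
  fi
}

# Example (automationsuite.example.com is a placeholder hostname):
#   for port in 443 31443; do
#     echo "$port: $(check_port automationsuite.example.com "$port")"
#   done
```

This only proves that a TCP handshake completes; it does not validate what is listening on the port.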
The tool checks if the uploaded certificate is valid for the given hostname and if it has not expired. If the certificate does not meet these criteria, errors occur. To prevent this, make sure to check your uploaded certificate and change it if required.
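The expiry part of this check can be approximated with OpenSSL. The sketch below is not how the tool itself works; the function names are illustrative, and it assumes the openssl command-line client is installed.

```shell
# Succeed if the PEM certificate in file $1 is still valid $2 seconds from now
# (default 0, meaning "not expired right now").
cert_valid_for() {
  openssl x509 -in "$1" -noout -checkend "${2:-0}" >/dev/null
}

# Save the certificate that a server presents on port 443 as a PEM file.
fetch_server_cert() {
  local host="$1" out="$2"
  openssl s_client -connect "${host}:443" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -outform PEM > "$out"
}

# Example (automationsuite.example.com is a placeholder hostname):
#   fetch_server_cert automationsuite.example.com /tmp/as-cert.pem
#   openssl x509 -in /tmp/as-cert.pem -noout -subject -enddate
#   cert_valid_for /tmp/as-cert.pem 2592000 || echo "expires within 30 days"
```

Compare the printed subject with the hostname you configured to confirm that the certificate was issued for it.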
Since some services require a GPU to be present on some of the nodes in the cluster, the Automation Suite Diagnostics Tool checks if there are GPU nodes and prints the number of such nodes. If you expect GPU nodes to be present and they do not show up here, something went wrong in the GPU setup.
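To cross-check the GPU node count yourself, you could query the cluster with kubectl, as in the following sketch. It assumes the GPUs are exposed under the nvidia.com/gpu resource name, which is the NVIDIA device plugin convention and may not match your setup; the function names are illustrative.

```shell
# List each node with its allocatable GPU count ("<none>" when the node has none).
# Assumption: GPUs are advertised as the nvidia.com/gpu resource.
list_node_gpus() {
  kubectl get nodes --no-headers \
    -o 'custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
}

# Count the lines on stdin whose second column is a positive integer.
count_gpu_nodes() {
  awk '$2 ~ /^[0-9]+$/ && $2 > 0 { n++ } END { print n + 0 }'
}

# Example:
#   list_node_gpus | count_gpu_nodes
```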
MongoDB is an important component that the UiPath Apps service uses. If either MongoDB or its primary instance is down, you need to investigate the issue using the support bundle.
RabbitMQ and DockerRegistry are two important components that some services use. If either of them is down, you need to investigate the issue; a restart might be needed.
ArgoCD is our application lifecycle management (ALM) tool. If any of its services are down, then other applications may become outdated or have other issues. Recovering these services is important, and might need further debugging.
The Automation Suite Diagnostics Tool shows whether ArgoCD applications are missing or degraded.
- If applications are missing, go to the ArgoCD UI and sync them.
- If applications are degraded, additional debugging is needed to investigate the errors thrown by ArgoCD.
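The same information can be read from the cluster with kubectl. The sketch below assumes that ArgoCD runs in a namespace named argocd and that applications are exposed through the applications.argoproj.io resource; both the namespace and the function names are assumptions to adjust for your environment.

```shell
# List ArgoCD applications as "name sync-status health-status".
# Assumption: ArgoCD is installed in the argocd namespace.
list_argocd_apps() {
  kubectl get applications.argoproj.io -n argocd --no-headers \
    -o 'custom-columns=NAME:.metadata.name,SYNC:.status.sync.status,HEALTH:.status.health.status'
}

# Print the applications on stdin whose health status is Missing or Degraded.
filter_unhealthy_apps() {
  awk '$3 == "Missing" || $3 == "Degraded" { print $1 " (" $3 ")" }'
}

# Example:
#   list_argocd_apps | filter_unhealthy_apps
```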