Adding a Dedicated Agent Node With GPU Support


Automation Suite currently supports only Nvidia GPU drivers. See the list of GPU-supported operating systems.

For more on the cloud-specific instance types, see the following:

Before adding a dedicated agent node with GPU support, make sure to check Hardware requirements.

Installing a GPU driver on the machine

  • The following instructions apply to both online and offline Automation Suite installations. In the case of offline installations, you must ensure temporary internet access to retrieve the required GPU driver dependencies. If you encounter issues while installing the GPU driver, contact Nvidia support.

  • The GPU driver is stored under the /opt/nvidia and /usr folders. It is highly recommended that these folders be at least 5 GB and 15 GB, respectively, on the GPU agent machine.
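
The size recommendation above can be checked before installing the driver. The following is a minimal sketch, not part of the official procedure. It assumes the recommendation can be read as available space on the filesystem that holds each folder, and the helper names avail_gb, has_min_gb, and check_gpu_driver_space are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the 5 GB / 15 GB thresholds come from the recommendation above;
# treating them as "available space on the filesystem" is an assumption.

# Print the available space, in whole GB, on the filesystem that holds the given path.
# If the path does not exist yet, walk up to the nearest existing parent directory.
avail_gb() {
  local path="$1"
  while [ ! -e "$path" ] && [ "$path" != "/" ]; do
    path="$(dirname "$path")"
  done
  df -Pk "$path" | awk 'NR==2 { print int($4 / 1024 / 1024) }'
}

# Succeed if the filesystem holding the path has at least the given number of GB available.
has_min_gb() {
  local path="$1" min_gb="$2"
  [ "$(avail_gb "$path")" -ge "$min_gb" ]
}

# Check both folders named in the recommendation and report any shortfall.
check_gpu_driver_space() {
  local rc=0
  has_min_gb /opt/nvidia 5 || { echo "/opt/nvidia: less than 5 GB available"; rc=1; }
  has_min_gb /usr 15 || { echo "/usr: less than 15 GB available"; rc=1; }
  return "$rc"
}
```

Calling check_gpu_driver_space on the GPU agent machine prints one line per folder that falls short and returns a non-zero status in that case.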
  1. To install the GPU driver on the agent node, run the following commands:
    sudo yum install kernel kernel-tools kernel-headers kernel-devel 
    sudo reboot
    sudo yum install
    sudo sed 's/$releasever/8/g' -i /etc/yum.repos.d/epel.repo
    sudo sed 's/$releasever/8/g' -i /etc/yum.repos.d/epel-modular.repo
    sudo yum config-manager --add-repo
  2. To install the container toolkits, run the following commands:
    curl -s -L | \
      sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
    sudo yum-config-manager --enable nvidia-container-toolkit-experimental
Verify if the drivers are installed properly

Run the sudo nvidia-smi command on the node to verify that the drivers were installed properly.
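
The verification can also be scripted. The sketch below is illustrative only: tool_ok is a hypothetical helper that reports whether a command is on the PATH and exits successfully. It relies on nvidia-smi returning a non-zero status when it cannot communicate with the driver.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the command exists on PATH and exits with status 0.
tool_ok() {
  local tool="$1"
  # First check that the command can be found at all.
  command -v "$tool" >/dev/null 2>&1 || { echo "$tool: not found on PATH"; return 1; }
  # Then run it without arguments and discard its output; only the exit status matters.
  "$tool" >/dev/null 2>&1 || { echo "$tool: found, but exited with an error"; return 1; }
  echo "$tool: OK"
}
```

From a root shell on the GPU node, tool_ok nvidia-smi prints "nvidia-smi: OK" when the driver responds, and one of the two error messages otherwise.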

Note: Once the cluster has been provisioned, additional steps are required to configure the provisioned GPUs.

At this point, the GPU drivers have been installed and the GPU nodes have been added to the cluster.

Adding a GPU node to the cluster

Step 1: Configuring the machine

Follow the steps for configuring the machine to ensure the disk is partitioned correctly and all networking requirements are met.

Step 2: Copying the interactive installer to the target machine

For online installation

  1. SSH to any of the server machines.
  2. Run the following command to copy the contents of the UiPathAutomationSuite folder to the GPU node (username and DNS are specific to the GPU node):
    sudo su -
    scp -r /opt/UiPathAutomationSuite <username>@<node dns>:/opt/
For Offline Installation

  1. SSH to any of the server nodes.
  2. Ensure that the /opt/UiPathAutomationSuite directory contains the sf-infra.tar.gz file (it is part of the installation package download step), then copy the folder to the GPU node:
    scp -r ~/opt/UiPathAutomationSuite <username>@<node dns>:/var/tmp

Step 3: Running the Interactive Installation Wizard to Configure the Dedicated Node

For online installation

  1. SSH to the GPU Node.
  2. Run the following commands:
    sudo su -
    cd /opt/UiPathAutomationSuite
    chmod -R 755 /opt/UiPathAutomationSuite
    yum install unzip jq -y
For Offline Installation

  1. Connect via SSH to the GPU dedicated node.
  2. Install the platform bundle on the GPU dedicated node using the following script:
    sudo su 
    mv /var/tmp/UiPathAutomationSuite /opt
    cd /opt/UiPathAutomationSuite
    chmod -R 755 /opt/UiPathAutomationSuite
Configuring the GPU Driver on the Cluster

Step 1: Installing the GPU Driver on the Cluster

  1. Ensure you are connected via SSH to the GPU machine.
  2. Update the containerd configuration of the GPU node by running the following commands:
    cat <<EOF >
    if ! nvidia-smi &>/dev/null; then
      echo "GPU Drivers are not installed on the VM. Please refer to the documentation."
      exit 0
    fi
    if ! which nvidia-container-runtime &>/dev/null; then
      echo "Nvidia container runtime is not installed on the VM. Please refer to the documentation."
      exit 0
    fi
    grep "nvidia-container-runtime" /var/lib/rancher/rke2/agent/etc/containerd/config.toml &>/dev/null && info "GPU containerd changes already applied" && exit 0
    awk '1;/plugins.cri.containerd]/{print "  default_runtime_name = \"nvidia-container-runtime\""}' /var/lib/rancher/rke2/agent/etc/containerd/config.toml > /var/lib/rancher/rke2/agent/etc/containerd/config.toml.tmpl
    echo -e '\n[plugins.linux]\n  runtime = "nvidia-container-runtime"' >> /var/lib/rancher/rke2/agent/etc/containerd/config.toml.tmpl
    echo -e '\n[plugins.cri.containerd.runtimes.nvidia-container-runtime]\n  runtime_type = "io.containerd.runc.v2"\n  [plugins.cri.containerd.runtimes.nvidia-container-runtime.options]\n    BinaryName = "nvidia-container-runtime"' >> /var/lib/rancher/rke2/agent/etc/containerd/config.toml.tmpl
  3. Restart rke2-agent by running the following command:
    systemctl restart rke2-agent
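
After the restart, one way to confirm that the change took effect is to inspect the rendered containerd configuration. RKE2 generates config.toml from config.toml.tmpl when the agent starts. The sketch below is a hedged example rather than an official check: has_nvidia_default_runtime is a hypothetical helper, and it only looks for the default_runtime_name line that the awk command above is meant to add.

```shell
#!/usr/bin/env bash
# Path taken from the commands above.
CONTAINERD_CONFIG="/var/lib/rancher/rke2/agent/etc/containerd/config.toml"

# Hypothetical helper: succeeds if the given file sets nvidia-container-runtime
# as the default containerd runtime.
has_nvidia_default_runtime() {
  grep -Eq '^[[:space:]]*default_runtime_name[[:space:]]*=[[:space:]]*"nvidia-container-runtime"' "$1"
}
```

For example, running has_nvidia_default_runtime "$CONTAINERD_CONFIG" && echo "Nvidia runtime is the default" on the GPU node prints the message only when the line is present.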

Step 2: Enabling the GPU in the Cluster

  1. Run the following commands from any of the server nodes.
  2. Navigate to the UiPathAutomationSuite folder.
    cd /opt/UiPathAutomationSuite

Enabling the GPU in an Online Installation

DOCKER_REGISTRY_URL=$(cat defaults.json | jq -er ".registries.docker.url")
sed -i "s/REGISTRY_PLACEHOLDER/${DOCKER_REGISTRY_URL}/g" ./Infra_Installer/gpu_plugin/nvidia-device-plugin.yaml
kubectl apply -f ./Infra_Installer/gpu_plugin/nvidia-device-plugin.yaml
Enabling the GPU in an Offline Installation

sed -i "s/REGISTRY_PLACEHOLDER/${DOCKER_REGISTRY_URL}/g" ./Infra_Installer/gpu_plugin/nvidia-device-plugin.yaml
kubectl apply -f ./Infra_Installer/gpu_plugin/nvidia-device-plugin.yaml
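
Once the device plugin is applied, the GPU node should start advertising its GPUs to the scheduler. The following sketch shows one way to check this. It assumes the plugin exposes GPUs under the nvidia.com/gpu resource name, which is the name the Nvidia device plugin uses. The helper names gpu_nodes and list_node_gpus are hypothetical.

```shell
#!/usr/bin/env bash
# Reads "NAME GPU" lines on stdin and prints the names of nodes
# that advertise at least one GPU. Non-numeric values such as <none> are skipped.
gpu_nodes() {
  awk '$2 ~ /^[0-9]+$/ && $2 > 0 { print $1 }'
}

# Lists every node with its allocatable nvidia.com/gpu count.
# Nodes that do not advertise the resource show <none> in the GPU column.
list_node_gpus() {
  kubectl get nodes --no-headers \
    -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
}
```

Running list_node_gpus | gpu_nodes from a server node prints the GPU nodes the cluster currently recognizes. It can take a short while after kubectl apply for the resource to appear.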
