Understanding repository partitioning

A repository is one of the core components that enable you to manage your automation workspace (automations and files). Partitioning helps you scale your repositories and optimize the performance of repository-related operations such as check-in and check-out.

Note: The repository partitioning feature requires the Enterprise Platform license. For more information about the supported versions for this feature, see Enterprise Platform.

Overview

The Automation 360 repository is a single Git repository that stores all bots, forms, processes, and dependency files. Because the repository is based on Git, several version control features are available out of the box, such as check-in, check-out, version history, rollback, and version comparison. Therefore, integrating with an external remote Git repository is not required in Automation 360.

All files in the Automation 360 Git repository are version-controlled. Over time, the Git repository can become large because of the number of files, the size of files, the number of Git commits, and so on. This can increase latency when repository actions run.

With repository partitioning, you can split the Automation 360 repository folder into separate Git repositories. You can partition the large public repository, which starts at the root-level folder, into multiple Git repositories at selected folder levels, thereby limiting performance issues within each repository partition.

Note: A Control Room administrator or users with the Partition repository permission can use the repository partitioning feature.

Benefits

Some of the benefits of partitioning your repositories include:

Faster operations due to quick check-in and check-out
Because each partitioned folder has its own repository, it has comparatively fewer check-in activities (commits). With a smaller commit history, check-in and check-out operations (including concurrent check-ins and check-outs) are faster.
Logical segregation of folders in Git space
Creating multiple repositories in Git mitigates the risk of a single point of failure. Issues in one repository do not adversely impact other repositories or automations contained within them.

Recommendations

In a typical production scenario, folders corresponding to various departments are created within the Bots folder. Within each department folder, sub-folders are created for specific business processes or projects (groups of business processes). Customers also commonly create shared libraries at different folder levels and share them across other processes.

Consider a scenario where various automation or citizen developers check in all their files to the same partition (Git repository) at the same time. This can slow down data processing and result in considerable delays. To ensure that performance is not impacted in the long term, review the following recommended approach:

Operational approach
It is recommended that you create one partition per team. A team is a group of people working on a similar set of business processes. For optimal performance, limit each team to 50 or fewer developers. If there are more developers, you can split them across partitions for scalability.
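
The 50-developer guideline reduces to simple arithmetic: a team needs ceil(developers / 50) partitions. The following Python sketch shows that calculation for planning purposes only. The function names and the sample team names and sizes are hypothetical, and the sketch only counts developers; it does not decide which business processes belong in which partition.

```python
import math

# Recommended maximum number of developers per partition (from the guideline above).
MAX_DEVELOPERS_PER_PARTITION = 50


def partitions_needed(developer_count: int,
                      max_per_partition: int = MAX_DEVELOPERS_PER_PARTITION) -> int:
    """Return how many partitions a team needs so that none exceeds the limit."""
    if developer_count < 0:
        raise ValueError("developer_count must be non-negative")
    # Every team gets at least one partition (one partition per team).
    return max(1, math.ceil(developer_count / max_per_partition))


def plan_partitions(teams: dict[str, int]) -> dict[str, int]:
    """Map each team name to the number of partitions it needs."""
    return {team: partitions_needed(count) for team, count in teams.items()}


if __name__ == "__main__":
    # Hypothetical team names and sizes, for illustration only.
    sample_teams = {"Finance": 35, "HR": 50, "Operations": 120}
    for team, count in plan_partitions(sample_teams).items():
        print(f"{team}: {count} partition(s)")
```

With the sample data, Finance and HR each need one partition, and Operations (120 developers) needs three.
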
While a partition operation is in progress, a user who tries to check in or import automations or files into that folder receives an alert that partitioning is in progress.

While partitioning is in progress, users cannot check in, import, bulk check in, or delete files in the folder being partitioned or its sub-folders. If a user attempts to check in an automation during that time, an alert asks them to wait. The user can continue to work in their private workspace without any issues, so work is never blocked. Additionally, Git restore, Git settings, and bulk package updates are restricted universally while partitioning is in progress.

Testing on repositories with a size of 20 GB has shown that the operation can take up to 1.5 hours. However, this time can vary depending on the performance of the network-attached storage (NAS). The operational cost is very low, and you can partition folders gradually over time.

For example, you can perform 4 to 6 partitions at the end of the business day, when there are fewer active users or citizen developers. This enables you to run the partitioning process without impacting other business users.

Determine the folders to partition
Review the following recommendations on partitioning the folders:
  • Run the RepositoryFolderSizeReport.exe tool from the scripts directory on one of the Control Room nodes using the following command:
    RepositoryFolderSizeReport.exe --root "Z:\Server Files\repository\16933f12-fdee-4a7f-8e76-a9bf127918c6\0\Automation Anywhere\Bots"
  • In the command, replace the tenant ID in the path with the correct tenant ID from your environment.

    For example, the tenant ID in the command above is 16933f12-fdee-4a7f-8e76-a9bf127918c6.

  • Use the Z:\Server Files\repository path to review the repository folders on the NAS drive.
  • The tool generates a folder_sizes.csv report containing folder paths and their respective sizes in MB. Use this report to plan your folder partitioning.
    Note: Git recommends that you restrict the repository size to less than 2 GB for optimal performance.
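
After the report is generated, you can filter it for folders that exceed the 2 GB guideline. The following Python sketch assumes that folder_sizes.csv has the folder path in the first column and the size in MB in the second column, with an optional header row; verify the actual column layout of the report in your environment before relying on it. The function names and the script name are hypothetical, and the sketch treats 2 GB as 2,048 MB. If the report lists both parent folders and their sub-folders, a parent may be flagged only because its sub-folders are large, so review the results at each folder level.

```python
import csv
import sys
from typing import Iterable

# Git size guideline from the note above, in MB (assumes 1 GB = 1024 MB).
SIZE_LIMIT_MB = 2 * 1024


def read_folder_sizes(lines: Iterable[str]) -> list[tuple[str, float]]:
    """Parse (folder path, size in MB) rows from folder_sizes.csv.

    Assumption: column 1 is the folder path and column 2 is the size in MB.
    """
    rows = []
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        try:
            size_mb = float(row[1])
        except ValueError:
            continue  # skip the header row or rows with a non-numeric size
        rows.append((row[0], size_mb))
    return rows


def oversized_folders(rows: list[tuple[str, float]],
                      limit_mb: float = SIZE_LIMIT_MB) -> list[tuple[str, float]]:
    """Return folders whose size is at or above limit_mb, largest first."""
    large = [(path, size) for path, size in rows if size >= limit_mb]
    return sorted(large, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    report = sys.argv[1] if len(sys.argv) > 1 else "folder_sizes.csv"
    # utf-8-sig tolerates a byte-order mark at the start of the file.
    with open(report, newline="", encoding="utf-8-sig") as f:
        for folder, size in oversized_folders(read_folder_sizes(f)):
            print(f"{size:>10.1f} MB  {folder}")
```

For example, if the sketch is saved as a hypothetical find_large_folders.py, run python find_large_folders.py folder_sizes.csv to list candidate folders, largest first.
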
Timeout settings
Repository partitioning initiates an external program to partition the folder. If this external program does not produce output within 12 hours (the default), it is terminated, and the user can restart the partitioning process. If the 12-hour timeout is too long, you can reduce it, for example to 2 hours, by modifying the properties file located at <CR_Folder>\config\repository.properties.

Set the repository.partition.read.line.timeout property to the timeout value in seconds. For example, for a 2-hour timeout, set the value to 7200.
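
Because the property takes a value in seconds, the conversion is hours × 3600: a 12-hour timeout corresponds to 43200 seconds, and a 2-hour timeout corresponds to 7200. The following Python sketch computes the value and sets the repository.partition.read.line.timeout entry in the text of a properties file, adding the entry if it is missing. It assumes the file uses simple key=value (or key:value) lines; the script and function names are hypothetical and not part of the product. It prints the updated content instead of overwriting the file, so you can review the output and back up the original before saving it.

```python
import re
import sys

TIMEOUT_KEY = "repository.partition.read.line.timeout"


def hours_to_seconds(hours: float) -> int:
    """The property takes seconds, so 2 hours becomes 7200."""
    return int(round(hours * 3600))


def set_timeout(properties_text: str, hours: float) -> str:
    """Return properties_text with the timeout key set, adding it if missing.

    Assumption: entries are simple key=value or key:value lines (hypothetical helper).
    """
    line = f"{TIMEOUT_KEY}={hours_to_seconds(hours)}"
    # Match an existing, uncommented entry for the key on its own line.
    pattern = re.compile(rf"^\s*{re.escape(TIMEOUT_KEY)}\s*[=:][^\r\n]*$", re.MULTILINE)
    if pattern.search(properties_text):
        return pattern.sub(line, properties_text)
    # Otherwise append the entry, starting on a new line if needed.
    separator = "" if properties_text.endswith("\n") or not properties_text else "\n"
    return f"{properties_text}{separator}{line}\n"


if __name__ == "__main__":
    # Usage: python set_partition_timeout.py <path-to-repository.properties> <hours>
    path, hours = sys.argv[1], float(sys.argv[2])
    with open(path, encoding="utf-8") as f:
        print(set_timeout(f.read(), hours), end="")
```

A Control Room restart (as described in this section) is still required after you save the modified file.
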

Restart the kernel after all the changes.

Monitoring repository partitioning
  • Monitor the progress of the partitioning in the log files using the PartitionMonitor script.
  • Run the PartitionMonitor.ps1 script on each Control Room node.

    By design, the partitioning request is processed on only one of the nodes, so progress results are visible only on that node.

  • Open Windows PowerShell and navigate to the scripts directory (C:\scripts). Run the following command:
    .\PartitionMonitor.ps1 -sourceDirectory "C:\ProgramData\AutomationAnywhere\Logs" -outputFile out.log