Understanding repository partitioning
- Updated: 2024/10/18
A repository is one of the core components that enable you to manage your automation workspace (automations and files). Partitioning helps you scale your repositories and optimize the performance of repository-related operations such as check-in and check-out.
Overview
The Automation 360 repository is a single Git repository where all the bots, forms, processes, and dependency files are stored. As the repository is based on Git, some out-of-the-box version control features are available such as check-in, check-out, version history, roll back, and version compare. Therefore, integrating with an external remote Git is not a requirement in Automation 360.
All the files in the Git repository in Automation 360 are stored for versioning. Over a period of time, the Git repository might become large due to the number of files, size of files, Git commits, and so on. This might lead to latency in the execution of repository actions.
With repository partitioning, you can split the Automation 360 repository folder into separate Git repositories. Instead of keeping the large public repository as a single Git repository at the root-level folder, you can partition it into multiple Git repositories at selected folder levels, which confines any performance issues to the affected partition.
Benefits
Some of the benefits of partitioning your repositories include:
- Faster operations due to quick check-in and check-out
- As the folders are partitioned, each partitioned folder has a comparatively smaller number of check-in activities (commits). As a result of these smaller commits, check-in and check-out operations (including concurrent check-ins and check-outs) are quicker.
- Logical segregation of folders in Git space
- Creating multiple repositories in Git mitigates the risk of a single point of failure. Issues in one repository do not adversely impact other repositories or automations contained within them.
Recommendations
In a typical production scenario, folders corresponding to various departments are created within the Bots folder. Based on the specific business processes or projects (multiple business processes), sub-folders are created within the department folders. It is also common for customers to have shared libraries that can be created at different levels, which are further shared across other processes.
Consider a scenario where many automation developers or citizen developers check in files to the same partition (Git repository) at the same time. This can slow down processing and cause considerable delays. To ensure that performance is not impacted in the long term, review the recommended approach below:
- Operational approach
- Create one partition per team. A team comprises a group of people working on a similar set of business processes. For optimal performance, limit each team to 50 or fewer developers. If there are more developers, split them across multiple partitions for scalability.
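The sizing guideline above can be worked through as simple arithmetic. The following is a minimal sketch, not a product utility: the function names are hypothetical, and the only figure taken from this page is the limit of 50 developers per partition.

```python
import math
from typing import List

# Limit of 50 developers per partition, taken from the recommendation above.
MAX_DEVELOPERS_PER_PARTITION = 50


def partitions_needed(developer_count: int,
                      limit: int = MAX_DEVELOPERS_PER_PARTITION) -> int:
    """Return the minimum number of partitions so none exceeds `limit` developers."""
    if developer_count < 0 or limit <= 0:
        raise ValueError("developer_count must be >= 0 and limit must be > 0")
    return math.ceil(developer_count / limit)


def split_evenly(developer_count: int,
                 limit: int = MAX_DEVELOPERS_PER_PARTITION) -> List[int]:
    """Spread developers as evenly as possible across the minimum number of partitions."""
    count = partitions_needed(developer_count, limit)
    if count == 0:
        return []
    base, extra = divmod(developer_count, count)
    # The first `extra` partitions take one additional developer each.
    return [base + 1] * extra + [base] * (count - extra)
```

For example, 120 developers need three partitions of 40 developers each, while 51 developers need two partitions of 26 and 25.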
- Determine the folders to partition
- Review the following recommendations on partitioning the folders:
- Run the RepositoryFolderSizeReport.exe tool from the scripts directory on one of the Control Room nodes using the following command:
RepositoryFolderSizeReport.exe --root "Z:\Server Files\repository\16933f12-fdee-4a7f-8e76-a9bf127918c6\0\Automation Anywhere\Bots"
- Ensure that you replace the tenant ID in the above command with the correct tenant ID from your environment. In this example, the tenant ID is 16933f12-fdee-4a7f-8e76-a9bf127918c6.
- Use Z:\Server Files\repository to review the NAS drive.
- The tool generates a folder_sizes.csv report containing folder paths and their respective sizes in MB. Use the size of each folder to plan your folder partitioning.
Note: Git recommends that you restrict the repository size to less than 2 GB for optimal performance.
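Once the report exists, a short script can list the folders that are candidates for partitioning. The following is a minimal sketch under stated assumptions: this page does not document the column layout of folder_sizes.csv, so the sketch assumes a header row followed by rows with the folder path in the first column and the size in MB in the second. The function name and the 2 GB threshold constant are illustrative; adjust them to match the actual report.

```python
import csv
from typing import Iterable, List, Tuple

# 2 GB guideline from the note above, expressed in MB.
SIZE_LIMIT_MB = 2 * 1024


def oversized_folders(rows: Iterable[str],
                      limit_mb: float = SIZE_LIMIT_MB) -> List[Tuple[str, float]]:
    """Return (path, size_mb) pairs at or above `limit_mb`, largest first.

    ASSUMPTION: the first row is a header, column 1 is the folder path,
    and column 2 is the size in MB. Verify against your own report.
    """
    reader = csv.reader(rows)
    next(reader, None)  # skip the assumed header row
    flagged = []
    for record in reader:
        if len(record) < 2:
            continue  # ignore blank or incomplete rows
        path, size_text = record[0], record[1]
        try:
            size_mb = float(size_text)
        except ValueError:
            continue  # ignore rows whose size is not numeric
        if size_mb >= limit_mb:
            flagged.append((path, size_mb))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Example usage against the generated report:
# with open("folder_sizes.csv", newline="", encoding="utf-8") as handle:
#     for path, size_mb in oversized_folders(handle):
#         print(f"{size_mb:10.1f} MB  {path}")
```

Folders that appear in the output are at or above the 2 GB guideline and are reasonable first candidates when deciding where to partition.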
- Timeout settings
- Repository partitioning starts an external program to partition the folder. If this external program does not produce any output within 12 hours (the default), it is terminated, and you can restart the partitioning process. If the 12-hour timeout is too long, you can shorten it (for example, to 2 hours) by modifying the properties file located at <CR_Folder>\config\repository.properties.
Update the repository.partition.read.line.timeout property by setting the value in seconds. For example, for a 2-hour timeout, set the value to 7200.
Restart the kernel after making all the changes.
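The hours-to-seconds conversion and the resulting property line can be sketched as follows. This is an illustrative helper, not part of the product: the function name is hypothetical, and it assumes that repository.properties uses plain key=value lines. Only the property name and the 2-hour value of 7200 come from this page.

```python
# Property name taken from the text above.
PROPERTY_NAME = "repository.partition.read.line.timeout"


def set_partition_timeout(properties_text: str, hours: float) -> str:
    """Return `properties_text` with the timeout property set to `hours`, in seconds.

    ASSUMPTION: the file consists of key=value lines. An existing entry
    is replaced; if none is found, a new line is appended.
    """
    seconds = int(hours * 3600)  # the property expects a value in seconds
    new_line = f"{PROPERTY_NAME}={seconds}"
    lines = properties_text.splitlines()
    replaced = False
    for index, line in enumerate(lines):
        key = line.split("=", 1)[0].strip()
        if key == PROPERTY_NAME:
            lines[index] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"
```

Calling set_partition_timeout(text, 2) produces the line repository.partition.read.line.timeout=7200. Back up the properties file before writing any change to it.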
- Monitoring repository partitioning
- Monitor the progress of the partitioning in the log files using the PartitionMonitor script.
- Run the PartitionMonitor.ps1 script on each Control Room node. By design, the partitioning request is processed on only one of the nodes, so progress results are visible only on that node.
- Open Windows PowerShell, navigate to the scripts directory (C:\scripts), and run the following command:
.\PartitionMonitor.ps1 -sourceDirectory "C:\ProgramData\AutomationAnywhere\Logs" -outputFile out.log