High availability deployment

To support Automation 360 in your data center, configure a high availability (HA) cluster. Follow your company methods and procedures for implementing your data center cluster.

Why only an odd number of nodes is supported
A cluster comprising an even number of nodes can result in a split-brain condition, where the cluster has no majority and cannot resolve transactions, which might result in data inconsistencies. The split-brain condition is a known limitation of clustering systems and can be caused by network issues, including latency.

Deployment configurations with an odd number of nodes help avoid split-brain issues and are recommended for Automation 360 deployments.

Quorum
The nodes determine whether a transaction can be processed by voting on it. The number of votes constituting a majority of the nodes in the cluster is referred to as a quorum; it determines how many nodes must vote for, or confirm, a transaction before it can be processed.
Fault Tolerance
Fault tolerance, in terms of node failure, is the number of nodes that can fail before a quorum (majority) of nodes is no longer available to vote on the validity of a transaction. Fault tolerance is optimized with an odd number of nodes because adding a node to make an even-numbered cluster raises the majority threshold without increasing fault tolerance. For example, a 4-node cluster requires 3 votes for a majority and tolerates only one failure, the same as a 3-node cluster.
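The quorum arithmetic above can be sketched in a few lines of Python (an illustration only, not part of the product):

```python
def quorum(nodes: int) -> int:
    # A quorum is a strict majority of the nodes in the cluster.
    return nodes // 2 + 1

def fault_tolerance(nodes: int) -> int:
    # Number of nodes that can fail while a quorum remains reachable.
    return nodes - quorum(nodes)

for n in (3, 4, 5, 6, 7):
    print(f"{n} nodes: quorum={quorum(n)}, tolerates {fault_tolerance(n)} failures")
```

Running this shows that 3 and 4 nodes both tolerate a single failure, and 5 and 6 nodes both tolerate two, which is why even-numbered clusters add cost and coordination overhead without improving availability.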
Supported Configurations
A cluster with three or a higher odd number of nodes prevents the split-brain condition and inconsistencies due to network issues, while providing greater scale and availability.
Number of nodes in cluster | Majority (quorum) | Fault tolerance (node failures) | Support
3                          | 2                 | 1                               | Certified
5                          | 3                 | 2                               | Contact Automation Anywhere support
7 and so on                | 4 and so on       | 3 and so on                     | Contact Automation Anywhere support
Multi-availability zone/multi-data center configurations
To further enhance availability with a multi-zone deployment, for example a 3-node deployment, we recommend placing each Control Room node in a separate availability zone. For deployments with more than 3 nodes, spread the nodes across at least 3 availability zones. Latency between zones or providers is a key concern in these setups; the nodes in a high availability cluster must be deployed in the same region.

We currently support three major cloud providers: Amazon Web Services, Google Cloud Platform, and Microsoft Azure.

Note: In a multi-node environment, if a node goes down, operations on that node such as bot deployments, schedules, triggers, and work items in queues are adversely affected.
Tip: For information on how to back up and restore files to recover a Control Room High Availability cluster in case of failure, see Backing up and restoring a Control Room High Availability Cluster (A-People login required).