Active-Passive High Availability

Tula supports active-passive high availability (HA) so that load balancing continues through a hardware or software failure. An HA pair consists of two Tula nodes: one active (master) and one passive (backup). Under normal operation, the active node handles all traffic. If the active node becomes unavailable, the passive node takes over automatically, typically in about three seconds with the default timers.

Architecture Overview

Tula's high availability is built on VRRP (Virtual Router Redundancy Protocol) implemented through keepalived. Both nodes in the HA pair share one or more floating IP addresses, also known as virtual IPs (VIPs). Clients connect to these floating IPs rather than to a specific node's physical address. When the active node fails, VRRP reassigns the floating IPs to the passive node, which begins processing traffic with no client-side reconfiguration required.

Each node in the pair maintains its own independent network configuration (physical interfaces, IP addresses, routes) while sharing a common cluster configuration that defines VIPs, backend servers, SSL certificates, health checks, and all other load balancing settings.

Configuration Synchronization

Tula uses csync2 to keep cluster configuration consistent between nodes. When you make a change on the active node, the updated configuration is automatically synchronized to the passive node. This ensures that when failover occurs, the backup node has an identical and current configuration ready to serve traffic. Synchronization covers all shared cluster settings including HAProxy and nftlb configurations, SSL certificates, SNMP settings, and health check definitions. Node-specific settings such as hostnames and interface assignments are stored locally and are not synchronized.
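
One way to confirm that the shared configuration has converged is to compute a digest of it on each node and compare the results. The sketch below is illustrative only: it does not invoke csync2, and the directory layout and the excluded file names are hypothetical, not Tula's actual paths.

```python
import hashlib
from pathlib import Path


def config_digest(root, exclude=()):
    """Return a SHA-256 hex digest over every file under root.

    exclude holds relative POSIX paths of node-specific files (hypothetical
    names) that are not expected to match between nodes.
    """
    root = Path(root)
    # Sort by relative path so both nodes hash files in the same order.
    files = sorted(
        (p.relative_to(root).as_posix(), p)
        for p in root.rglob("*")
        if p.is_file()
    )
    digest = hashlib.sha256()
    for rel, path in files:
        if rel in exclude:
            continue  # node-specific file, skipped on purpose
        digest.update(rel.encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()
```

Running the function on each node against the same shared directory and comparing the two hex strings gives a quick consistency check; a mismatch means at least one shared file differs or is missing.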

How Failover Works

Failover is triggered when the passive node detects that the active node is no longer sending VRRP advertisements. The process follows these steps:

1. Failure event -- The active node suffers a critical failure (hardware fault, kernel panic, network partition, or service crash).
2. VRRP timeout -- The passive node stops receiving VRRP advertisements from the active node. The default advertisement interval is one second, and failover is initiated after three missed advertisements.
3. Master transition -- The passive node transitions to master state and sends gratuitous ARP announcements so that upstream switches and routers learn the new location of the floating IPs. Its configured VRRP priority does not change; it takes over because no higher-priority master is advertising.
4. VIP migration -- All floating IP addresses are reassigned to the passive node's interfaces. The node begins accepting and processing traffic immediately.

With the default timers, total failover time is typically about three seconds from the moment of failure to the moment traffic resumes on the backup node.
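
The timing above can be worked through with the standard VRRPv2 timer rules (RFC 3768), in which a backup waits three advertisement intervals plus a small priority-dependent skew time before taking over. The sketch below assumes Tula uses these standard timers unchanged; the function names are illustrative and are not part of Tula or keepalived.

```python
def master_down_interval(advert_int=1.0, priority=100):
    """Seconds a VRRPv2 backup waits without advertisements before taking over."""
    if not 1 <= priority <= 254:
        raise ValueError("backup priority must be between 1 and 254")
    skew_time = (256 - priority) / 256  # higher priority gives a shorter skew
    return 3 * advert_int + skew_time


def failover_window(advert_int=1.0, priority=100):
    """Return (best, worst) detection time measured from the moment of failure.

    The timer restarts on every advertisement received, so the best case is a
    failure just before the next advertisement was due, and the worst case is
    a failure just after one was sent.
    """
    down = master_down_interval(advert_int, priority)
    return (down - advert_int, down)


if __name__ == "__main__":
    best, worst = failover_window(advert_int=1.0, priority=100)
    print(f"detection between {best:.2f}s and {worst:.2f}s")
```

For a backup priority of 100 and a one-second interval, detection falls between roughly 2.6 and 3.6 seconds after the failure, depending on where in the advertisement cycle the active node stopped.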

Split-Brain Prevention

Split-brain occurs when both nodes believe they are the active master at the same time, which can result in duplicate IP addresses and unpredictable traffic routing. Tula mitigates split-brain and related instability through several mechanisms. VRRP uses a shared virtual router ID and authentication so that both nodes participate in the same VRRP group. Keepalived's preemption setting controls whether the original master reclaims the active role after recovery, which prevents unnecessary failover oscillation. In addition, Tula monitors the health of critical services (HAProxy, nftlb, keepalived) and reduces the local VRRP priority if a monitored service fails, so that traffic is directed to the healthier node.
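
The priority-reduction mechanism can be modeled in a few lines. The sketch below is a simplified model, not Tula's implementation: the penalty values are hypothetical, preemption is assumed to be enabled, and real VRRP breaks priority ties by comparing IP addresses, which this model ignores.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    base_priority: int
    failed_services: set = field(default_factory=set)

    def effective_priority(self, penalties):
        """Base priority minus the penalty of each failed service."""
        penalty = sum(penalties[s] for s in self.failed_services)
        # Keep the result inside the range VRRP allows for a backup router.
        return max(1, min(254, self.base_priority - penalty))


def elect_master(nodes, penalties):
    """Return the node with the highest effective priority.

    Assumes preemption is enabled; on a tie the first node listed wins here.
    """
    return max(nodes, key=lambda n: n.effective_priority(penalties))


if __name__ == "__main__":
    penalties = {"haproxy": 60, "nftlb": 60}  # hypothetical values
    primary = Node("primary", 150)
    backup = Node("backup", 100)
    primary.failed_services.add("haproxy")
    print(elect_master([primary, backup], penalties).name)
```

The penalty must be larger than the gap between the two base priorities, otherwise a service failure on the primary lowers its priority without handing over the active role.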

Configuring an HA Pair

To configure active-passive high availability:

1. Deploy two Tula nodes on the same Layer 2 network segment. Both nodes must be able to exchange VRRP multicast traffic (IP protocol 112, multicast group 224.0.0.18).
2. Navigate to System > High Availability in the web interface on the node designated as the primary.
3. Add the peer node by entering its management IP address. Tula will establish a trust relationship and configure csync2 for configuration synchronization.
4. Assign floating IPs under the VIP configuration. These addresses will migrate between nodes during failover.
5. Set VRRP priorities for each node. The node with the higher priority becomes the active master. A common convention is 150 for the primary and 100 for the backup.
6. Enable HA and synchronize the configuration. Both nodes will begin exchanging VRRP advertisements, and the node with the higher priority will assume the active role.
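
For readers familiar with keepalived, the priorities and floating IPs chosen in the steps above correspond roughly to a vrrp_instance block on each node. The sketch below renders such a block as text. The instance name, interface, router ID, password, and address are placeholder assumptions, and the configuration Tula actually generates may differ.

```python
def render_vrrp_instance(name, interface, router_id, priority, vips,
                         auth_pass, state="BACKUP", advert_int=1):
    """Render a keepalived vrrp_instance block as text."""
    if not 1 <= router_id <= 255:
        raise ValueError("virtual_router_id must be between 1 and 255")
    if not 1 <= priority <= 254:
        raise ValueError("priority must be between 1 and 254")
    vip_lines = "\n".join(f"        {vip}" for vip in vips)
    return (
        f"vrrp_instance {name} {{\n"
        f"    state {state}\n"
        f"    interface {interface}\n"
        f"    virtual_router_id {router_id}\n"
        f"    priority {priority}\n"
        f"    advert_int {advert_int}\n"
        f"    authentication {{\n"
        f"        auth_type PASS\n"
        f"        auth_pass {auth_pass}\n"
        f"    }}\n"
        f"    virtual_ipaddress {{\n"
        f"{vip_lines}\n"
        f"    }}\n"
        f"}}\n"
    )


if __name__ == "__main__":
    # Placeholder values; both nodes must share the router ID, password, and VIPs.
    shared = dict(name="VI_1", interface="eth0", router_id=51,
                  vips=["192.0.2.10/24"], auth_pass="example1")
    print(render_vrrp_instance(priority=150, state="MASTER", **shared))
    print(render_vrrp_instance(priority=100, state="BACKUP", **shared))
```

Only the priority and initial state differ between the two rendered blocks; everything else has to match for the nodes to join the same VRRP group.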

After configuration, verify HA status on the dashboard. The active node is displayed as Master and the passive node as Backup. To test failover, reboot the active node and confirm that the passive node assumes the master role and that traffic resumes within a few seconds.
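
To put a number on a failover test, a client can probe a floating IP at a fixed interval while the active node reboots and then report the longest gap in connectivity. The sketch below is one possible approach using plain TCP connects; the address and port in the usage note are placeholders, and the helper names are illustrative rather than part of any Tula tooling.

```python
import socket
import time


def probe(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watch(host, port, duration=30.0, interval=0.2):
    """Probe repeatedly for duration seconds and return (timestamp, ok) samples."""
    samples = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        samples.append((time.monotonic(), probe(host, port)))
        time.sleep(interval)
    return samples


def longest_outage(samples):
    """Longest span from the first failed probe of a run to the next success.

    An outage that is still in progress at the last sample is not counted.
    """
    longest = 0.0
    outage_start = None
    for timestamp, ok in samples:
        if not ok and outage_start is None:
            outage_start = timestamp
        elif ok and outage_start is not None:
            longest = max(longest, timestamp - outage_start)
            outage_start = None
    return longest
```

For example, calling longest_outage(watch("192.0.2.10", 443)) from a client on the same network while the active node reboots gives an approximate failover time, accurate to about the probe interval plus the connection timeout.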