High Availability Controller Swiftlet
Overview
The High Availability Controller Swiftlet manages the active/standby state machine for SwiftMQ routers in a high availability (HA) cluster. It coordinates state transitions, heartbeats, configuration and message replication, and failover decisions, so that only one HA node is active at a time and the standby can take over if the active node fails.
Features
Active/Standby State Machine
The Swiftlet implements a state machine that manages the lifecycle of each HA router instance. The states are UNKNOWN, INITIALIZE, NEGOTIATE, STANDALONE, ACTIVE-SYNC-PREPARE, ACTIVE-SYNC, ACTIVE, STANDBY-SYNC-PREPARE, STANDBY-SYNC, and STANDBY. The state machine ensures that only one instance is active at a time, coordinates negotiation and synchronization between the instances, and drives transitions in response to cluster events, configuration changes, and network conditions. Each state transition is logged and can trigger actions such as freezing or unfreezing thread pools or initiating a failover.
Preferred Active Instance
The property preferred-active determines whether this router should be the preferred active instance. During negotiation, this preference influences which node becomes active if both are eligible.
Negotiation Timeout
The property negotiation-timeout specifies, in milliseconds, the maximum time to wait for a negotiation with the peer (default 1800000, i.e. 30 minutes; minimum 1000). If the timeout expires, the instance may transition to STANDALONE or take another action, depending on its last saved state.
Split Brain Detection and Action
If both nodes detect themselves as STANDALONE (split brain), the Swiftlet executes the action specified by split-brain-instance-action: keep (continue running), stop (shut down; this is the default), or backup-and-standby (back up this instance's store and restart as standby). This mechanism prevents data corruption and keeps the recovery from a split brain under administrative control.
Configuration Example:
```xml
<swiftlet name="sys$hacontroller" preferred-active="true" negotiation-timeout="60000" split-brain-instance-action="backup-and-standby"/>
```
Replication Channel and Heartbeat Management
The Swiftlet manages a dedicated replication channel for HA communication between the nodes. This channel carries all state, configuration, and message replication, as well as the heartbeat messages. It can be configured with a single listener (for inbound connections) and a single connector (for outbound connections). Heartbeat messages are sent every heartbeat-interval milliseconds, and the connection is closed once heartbeat-missing-threshold heartbeats have been missed.
Listener and Connector Configuration
Only one listener and one connector are supported. The listener defines the local bind address and port for incoming replication connections, while the connector specifies the remote host and port for outbound connections. Network buffer sizes (router-input-buffer-size, router-output-buffer-size, and their extend-size counterparts) and the use-tcp-no-delay option can be tuned for performance.
Heartbeat Monitoring
The properties heartbeat-interval and heartbeat-missing-threshold control the frequency and tolerance of heartbeat messages. Once the threshold is reached, the replication connection is closed and the state machine reacts accordingly.
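As a rough rule of thumb, a peer that stops responding is detected after about heartbeat-interval × heartbeat-missing-threshold milliseconds. The fragment below writes out the documented defaults and annotates that arithmetic. Treat the figure as an approximation: the exact moment the connection is closed also depends on when the last heartbeat arrived, which is an assumption here rather than something this page specifies.

```xml
<swiftlet name="sys$hacontroller">
  <!-- Documented defaults: one heartbeat every 2000 ms, connection closed
       after 10 missed heartbeats.
       Approximate detection window: 2000 ms x 10 = 20000 ms (about 20 s). -->
  <replication-channel heartbeat-interval="2000" heartbeat-missing-threshold="10"/>
</swiftlet>
```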
Maximum Packet Size
The property max-packet-size (in KB, default 1024) limits the size of replication packets, which affects replication throughput and memory usage.
Configuration Example:
```xml
<swiftlet name="sys$hacontroller">
  <replication-channel heartbeat-interval="1000" heartbeat-missing-threshold="5" max-packet-size="512">
    <listeners>
      <listener name="main" port="4000"/>
    </listeners>
    <connectors>
      <connector name="to-peer" hostname="ha-peer.example.com" port="4000"/>
    </connectors>
  </replication-channel>
</swiftlet>
```
Replication Tunnels (Internal Infrastructure)
Replication tunnels define the logical links used for configuration and message replication between HA nodes. Each tunnel has a unique address and a set of supported protocol versions. These tunnels are predefined in the configuration and should not be modified by administrators. They are used internally by the HA Controller to establish and manage replication flows.
Configuration Example:
```xml
<swiftlet name="sys$hacontroller">
  <replication-tunnels>
    <replication-tunnel name="main" tunnel-address="0" versions="600,930"/>
  </replication-tunnels>
</swiftlet>
```
Configuration Replication and Exclusions
The Swiftlet replicates configuration changes between HA nodes to ensure consistent operation. Certain configuration entities or properties can be excluded from replication using the replication-excludes list. Property substitutions can also be defined to override specific property values during replication. These features are mainly for advanced scenarios and should be used with care.
Replication Excludes
Entities or properties listed under replication-excludes will not be replicated to the standby node. This is useful for properties that must remain unique per instance, such as network addresses.
Property Substitutions
The property-substitutions list specifies alternate values that replace a property's value when it is replicated. This can be used to customize the configuration on the standby node.
Configuration Example:
```xml
<swiftlet name="sys$hacontroller">
  <configuration-controller>
    <replication-excludes>
      <replication-exclude name="/sys$net/listeners"/>
    </replication-excludes>
    <property-substitutions>
      <property-substitution name="/sys$net/listeners/main/port" substitute-with="4001"/>
    </property-substitutions>
  </configuration-controller>
</swiftlet>
```
Configuration Guide
Configuring a Preferred Active Instance
Use this scenario when you want to ensure that a specific router in the HA pair becomes the active instance whenever possible. This is useful for planned failover or maintenance.
- Set the preferred-active attribute to true on the desired router.
- Restart both routers or trigger a negotiation to apply the preference.
```xml
<swiftlet name="sys$hacontroller" preferred-active="true"/>
```
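On the other router of the pair, preferred-active can stay at its default of false. Writing it out explicitly, as in this sketch, makes the intended roles obvious when the two routerconfig.xml files are compared. The "Router A" and "Router B" labels are illustrative only. This page does not describe what happens if both routers are set to true, so that combination is best avoided unless you have verified its behavior.

```xml
<!-- Router A: preferred to become ACTIVE when both instances are eligible -->
<swiftlet name="sys$hacontroller" preferred-active="true"/>

<!-- Router B (peer): default value, written out for clarity -->
<swiftlet name="sys$hacontroller" preferred-active="false"/>
```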
Tuning Heartbeat Sensitivity
Adjust heartbeat parameters to detect failures more quickly or to tolerate transient network issues. Lower intervals and thresholds increase sensitivity but may cause false positives in unstable networks.
- Set heartbeat-interval to the desired interval in milliseconds (e.g., 500 for half a second; minimum 100).
- Set heartbeat-missing-threshold to the number of missed heartbeats before the peer is considered down (minimum 1).
```xml
<swiftlet name="sys$hacontroller">
  <replication-channel heartbeat-interval="500" heartbeat-missing-threshold="3"/>
</swiftlet>
```
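The opposite trade-off, tolerating short network hiccups at the cost of slower failure detection, raises one or both values. The numbers below are illustrative rather than recommended, and the comment estimates the detection window with the same interval × threshold rule of thumb.

```xml
<swiftlet name="sys$hacontroller">
  <!-- Tolerant setting: 2000 ms x 15 = 30000 ms (about 30 s) before the
       replication connection is closed.
       Sensitive setting from the previous example: 500 ms x 3 = 1500 ms. -->
  <replication-channel heartbeat-interval="2000" heartbeat-missing-threshold="15"/>
</swiftlet>
```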
Handling Split Brain with Backup and Standby
If a split brain is detected, you may want the router to back up its store and restart as standby instead of continuing to run. This prevents data corruption and allows for safe administrative recovery.
- Set split-brain-instance-action to backup-and-standby.
- Ensure the router's store is configured and accessible.
```xml
<swiftlet name="sys$hacontroller" split-brain-instance-action="backup-and-standby"/>
```
Customizing Replication Channel Network Settings
Customize the replication channel's listener and connector to match your network topology and security requirements. Only one listener and one connector are supported.
- Define a listener with the desired port and an optional bindaddress.
- Define a connector with the remote peer's hostname and port.
```xml
<swiftlet name="sys$hacontroller">
  <replication-channel>
    <listeners>
      <listener name="main" port="4000" bindaddress="192.168.1.10"/>
    </listeners>
    <connectors>
      <connector name="to-peer" hostname="192.168.1.11" port="4000"/>
    </connectors>
  </replication-channel>
</swiftlet>
```
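A common setup, assumed in this sketch, gives each router a listener on its own address and a connector pointing at the other router. With the addresses used above (192.168.1.10 for this router, 192.168.1.11 for its peer), the peer's replication channel would mirror this configuration. The listener and connector names are arbitrary labels.

```xml
<!-- Peer router (192.168.1.11): listens on its own address and
     connects back to 192.168.1.10 -->
<swiftlet name="sys$hacontroller">
  <replication-channel>
    <listeners>
      <listener name="main" port="4000" bindaddress="192.168.1.11"/>
    </listeners>
    <connectors>
      <connector name="to-peer" hostname="192.168.1.10" port="4000"/>
    </connectors>
  </replication-channel>
</swiftlet>
```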
Configuration Reference
The top-level entity in routerconfig.xml is <swiftlet name="sys$hacontroller">.
<swiftlet name="sys$hacontroller"> Properties
These properties are attributes of the <swiftlet name="sys$hacontroller"> entity.
| Parameter | Type | Default | Mandatory | Reboot Required | Description |
|---|---|---|---|---|---|
| preferred-active | Boolean | false | No | No | States whether this router is the preferred Active Instance |
| negotiation-timeout | Long | 1800000 | No | No | Time in ms after which a Negotiation must be initiated (min: 1000) |
| split-brain-instance-action | String | stop | No | No | Action taken on this Instance when a Split Brain is detected (choices: keep, stop, backup-and-standby) |
```xml
<swiftlet name="sys$hacontroller" preferred-active="false" negotiation-timeout="1800000" split-brain-instance-action="stop"/>
```
<spool> Entity
Spool Settings
This is a fixed child entity of <swiftlet name="sys$hacontroller">.
| Parameter | Type | Default | Mandatory | Reboot Required | Description |
|---|---|---|---|---|---|
| directory | String | ./ | No | Yes | Spool Directory |
| max-cache-size | Integer | 5120 | No | Yes | Specifies the size in KB to hold in memory (min: 1024) |
```xml
<swiftlet name="sys$hacontroller">
  <spool directory="..." max-cache-size="..."/>
</swiftlet>
```
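With concrete values, a spool that keeps 10 MB in memory might look like the fragment below. The directory path is a hypothetical example, not a documented default. Both attributes require a router restart to take effect.

```xml
<swiftlet name="sys$hacontroller">
  <!-- Hypothetical path. max-cache-size is in KB (10240 KB = 10 MB; min 1024).
       Reboot required for both attributes. -->
  <spool directory="./hacontroller-spool" max-cache-size="10240"/>
</swiftlet>
```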
<configuration-controller> Entity
Tracks and controls configuration replication
This is a fixed child entity of <swiftlet name="sys$hacontroller">.
```xml
<swiftlet name="sys$hacontroller">
  <configuration-controller/>
</swiftlet>
```
<property-substitutions> in <configuration-controller>
Property Substitutions
Each <property-substitution> entry is identified by its name attribute (the Property Substitution).
| Parameter | Type | Default | Mandatory | Reboot Required | Description |
|---|---|---|---|---|---|
| substitute-with | String | — | No | No | Substitute the Property Value with this Value |
```xml
<swiftlet name="sys$hacontroller">
  <configuration-controller>
    <property-substitutions>
      <property-substitution name="..."/>
    </property-substitutions>
  </configuration-controller>
</swiftlet>
```
<replication-excludes> in <configuration-controller>
Replication Excludes
Each <replication-exclude> entry is identified by its name attribute (the Replication Exclude).
```xml
<swiftlet name="sys$hacontroller">
  <configuration-controller>
    <replication-excludes>
      <replication-exclude name="..."/>
    </replication-excludes>
  </configuration-controller>
</swiftlet>
```
<replication-channel> Entity
Replication Channel Listeners and Connectors
This is a fixed child entity of <swiftlet name="sys$hacontroller">.
| Parameter | Type | Default | Mandatory | Reboot Required | Description |
|---|---|---|---|---|---|
| heartbeat-interval | Long | 2000 | No | No | Interval in ms for sending Heartbeat Messages (min: 100) |
| heartbeat-missing-threshold | Integer | 10 | No | No | Closes Replication Connections after missing this number of Heartbeat Messages (min: 1) |
| max-packet-size | Integer | 1024 | No | No | Maximum Packet Size in KB (min: 1) |
```xml
<swiftlet name="sys$hacontroller">
  <replication-channel heartbeat-interval="..." heartbeat-missing-threshold="..." max-packet-size="..."/>
</swiftlet>
```
<listeners> in <replication-channel>
Listener Definitions
Each <listener> entry is identified by its name attribute (the Listener).
| Parameter | Type | Default | Mandatory | Reboot Required | Description |
|---|---|---|---|---|---|
| bindaddress | String | — | No | No | Listener Bind IP Address |
| port | Integer | — | Yes | No | Listener Port |
| use-tcp-no-delay | Boolean | true | No | No | Use TCP No Delay |
| router-input-buffer-size | Integer | 1048576 | No | No | Router Network Input Buffer Size (min: 65536) |
| router-input-extend-size | Integer | 1048576 | No | No | Router Network Input Extend Size (min: 65536) |
| router-output-buffer-size | Integer | 131072 | No | No | Router Network Output Buffer Size (min: 1024) |
| router-output-extend-size | Integer | 131072 | No | No | Router Network Output Extend Size (min: 1024) |
```xml
<swiftlet name="sys$hacontroller">
  <replication-channel>
    <listeners>
      <listener name="..." port="..."/>
    </listeners>
  </replication-channel>
</swiftlet>
```
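Spelled out with every documented attribute, a listener looks like the fragment below. The name, port, and bindaddress are example values; every other attribute is set to its documented default, so those lines could be omitted without changing behavior.

```xml
<swiftlet name="sys$hacontroller">
  <replication-channel>
    <listeners>
      <!-- name, port and bindaddress are example values; all other
           attributes are set to their documented defaults -->
      <listener name="main"
                port="4000"
                bindaddress="192.168.1.10"
                use-tcp-no-delay="true"
                router-input-buffer-size="1048576"
                router-input-extend-size="1048576"
                router-output-buffer-size="131072"
                router-output-extend-size="131072"/>
    </listeners>
  </replication-channel>
</swiftlet>
```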
<connectors> in <replication-channel>
Connector Definitions
Each <connector> entry is identified by its name attribute (the Connector).
| Parameter | Type | Default | Mandatory | Reboot Required | Description |
|---|---|---|---|---|---|
| hostname | String | — | Yes | No | Remote Hostname |
| port | Integer | — | Yes | No | Remote Port |
| use-tcp-no-delay | Boolean | true | No | No | Use TCP No Delay |
| retry-time | Long | 1000 | No | No | Retry Time (min: 100) |
| router-input-buffer-size | Integer | 1048576 | No | No | Router Network Input Buffer Size (min: 65536) |
| router-input-extend-size | Integer | 1048576 | No | No | Router Network Input Extend Size (min: 65536) |
| router-output-buffer-size | Integer | 131072 | No | No | Router Network Output Buffer Size (min: 1024) |
| router-output-extend-size | Integer | 131072 | No | No | Router Network Output Extend Size (min: 1024) |
```xml
<swiftlet name="sys$hacontroller">
  <replication-channel>
    <connectors>
      <connector name="..." hostname="..." port="..."/>
    </connectors>
  </replication-channel>
</swiftlet>
```
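The connector can be written out the same way. Here the name, hostname, and port are example values, and every other attribute, including retry-time, is set to its documented default.

```xml
<swiftlet name="sys$hacontroller">
  <replication-channel>
    <connectors>
      <!-- name, hostname and port are example values; all other
           attributes are set to their documented defaults -->
      <connector name="to-peer"
                 hostname="192.168.1.11"
                 port="4000"
                 use-tcp-no-delay="true"
                 retry-time="1000"
                 router-input-buffer-size="1048576"
                 router-input-extend-size="1048576"
                 router-output-buffer-size="131072"
                 router-output-extend-size="131072"/>
    </connectors>
  </replication-channel>
</swiftlet>
```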
<replication-tunnels> in <swiftlet name="sys$hacontroller">
Replication Tunnels
Each <replication-tunnel> entry is identified by its name attribute (the Replication Tunnel).
| Parameter | Type | Default | Mandatory | Reboot Required | Description |
|---|---|---|---|---|---|
| tunnel-address | Integer | — | No | No | Tunnel Address (min: 0) |
| versions | String | — | No | No | Supported Tunnel Protocol Versions |
```xml
<swiftlet name="sys$hacontroller">
  <replication-tunnels>
    <replication-tunnel name="..."/>
  </replication-tunnels>
</swiftlet>
```