High Availability Controller Swiftlet

Overview

The High Availability Controller Swiftlet manages the active/standby state machine for SwiftMQ routers in a high availability (HA) cluster. It coordinates state transitions, heartbeats, configuration and message replication, and failover decisions so that the standby node can take over when the active node fails.

Features

Active/Standby State Machine

The Swiftlet implements a state machine that manages the lifecycle of HA router instances. The states are UNKNOWN, INITIALIZE, NEGOTIATE, STANDALONE, ACTIVE-SYNC-PREPARE, ACTIVE-SYNC, ACTIVE, STANDBY-SYNC-PREPARE, STANDBY-SYNC, and STANDBY. The state machine ensures that only one instance is active at a time, coordinates negotiation and synchronization, and handles transitions triggered by cluster events, configuration, and network conditions. State transitions are logged and can trigger actions such as freezing or unfreezing thread pools or initiating failover.

Preferred Active Instance

The property preferred-active determines whether this router should be the preferred active instance. During negotiation, this preference influences which node becomes active if both are eligible.

Negotiation Timeout

The property negotiation-timeout specifies the maximum time, in milliseconds, to wait for negotiation to complete (default 1800000, minimum 1000). If it is exceeded, the instance may transition to STANDALONE or take other actions depending on the last saved state.

Split Brain Detection and Action

If both nodes find themselves in the STANDALONE state at the same time (split brain), the Swiftlet executes the action specified by split-brain-instance-action. The options are keep (continue running), stop (shut down; this is the default), or backup-and-standby (back up the shared store and restart as standby). This mechanism is intended to prevent data corruption and keeps recovery under administrative control in split brain scenarios.

Configuration Example:

<swiftlet name="sys$hacontroller" preferred-active="true" negotiation-timeout="60000" split-brain-instance-action="backup-and-standby"/>

Replication Channel and Heartbeat Management

The Swiftlet manages a dedicated replication channel for HA communication between nodes. This channel is responsible for all state, configuration, and message replication, as well as heartbeat monitoring. The channel can be configured with a single listener (for inbound connections) and a single connector (for outbound connections). Heartbeat messages are sent at the interval specified by heartbeat-interval, and the connection is closed if more than heartbeat-missing-threshold heartbeats are missed.

Listener and Connector Configuration

Only one listener and one connector are supported. The listener defines the local bind address and port for incoming replication connections, while the connector specifies the remote host and port for outbound connections. Buffer sizes and TCP options can be tuned for performance.
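
As a sketch of the tuning mentioned above, the fragment below spells out the TCP and buffer attributes listed in the Configuration Reference. The entry names main and to-peer, the host name, and the port are illustrative assumptions; the buffer and retry values are simply the documented defaults written out explicitly, not tuning recommendations.

```xml
<swiftlet name="sys$hacontroller">
  <replication-channel>
    <listeners>
      <!-- Inbound side: port plus explicit TCP and buffer settings (documented defaults shown) -->
      <listener name="main" port="4000"
                use-tcp-no-delay="true"
                router-input-buffer-size="1048576"
                router-input-extend-size="1048576"
                router-output-buffer-size="131072"
                router-output-extend-size="131072"/>
    </listeners>
    <connectors>
      <!-- Outbound side: retry-time is listed in the reference for connectors only -->
      <connector name="to-peer" hostname="ha-peer.example.com" port="4000"
                 use-tcp-no-delay="true"
                 retry-time="1000"
                 router-input-buffer-size="1048576"
                 router-output-buffer-size="131072"/>
    </connectors>
  </replication-channel>
</swiftlet>
```

Attributes that are left out keep their defaults, so in practice only the values that differ from the reference need to be listed.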

Heartbeat Monitoring

The properties heartbeat-interval and heartbeat-missing-threshold control the frequency and tolerance of heartbeat messages. If the threshold is exceeded, the replication connection is closed and the state machine reacts accordingly.

Maximum Packet Size

The property max-packet-size (in KB) limits the size of replication packets, affecting replication throughput and memory usage.

Configuration Example:

<swiftlet name="sys$hacontroller">
  <replication-channel heartbeat-interval="1000" heartbeat-missing-threshold="5" max-packet-size="512">
    <listeners>
      <listener name="main" port="4000"/>
    </listeners>
    <connectors>
      <connector name="to-peer" hostname="ha-peer.example.com" port="4000"/>
    </connectors>
  </replication-channel>
</swiftlet>

Replication Tunnels (Internal Infrastructure)

Replication tunnels define the logical links used for configuration and message replication between HA nodes. Each tunnel has a unique address and a set of supported protocol versions. These tunnels are predefined in the configuration and should not be modified by administrators. They are used internally by the HA Controller to establish and manage replication flows.

Configuration Example:

<swiftlet name="sys$hacontroller">
  <replication-tunnels>
    <replication-tunnel name="main" tunnel-address="0" versions="600,930"/>
  </replication-tunnels>
</swiftlet>

Configuration Replication and Exclusions

The Swiftlet replicates configuration changes between HA nodes to ensure consistent operation. Certain configuration entities or properties can be excluded from replication using the replication-excludes list. Property substitutions can also be defined to override specific property values during replication. These features are mainly for advanced scenarios and should be used with care.

Replication Excludes

Entities or properties listed under replication-excludes will not be replicated to the standby node. This is useful for properties that must remain unique per instance, such as network addresses.

Property Substitutions

The property-substitutions list allows administrators to specify alternate values for properties when replicated. This can be used to customize configuration on the standby node.

Configuration Example:

<swiftlet name="sys$hacontroller">
  <configuration-controller>
    <replication-excludes>
      <replication-exclude name="/sys$net/listeners"/>
    </replication-excludes>
    <property-substitutions>
      <property-substitution name="/sys$net/listeners/main/port" substitute-with="4001"/>
    </property-substitutions>
  </configuration-controller>
</swiftlet>

Configuration Guide

Configuring a Preferred Active Instance

Use this scenario when you want to ensure that a specific router in the HA pair becomes the active instance whenever possible. This is useful for planned failover or maintenance.

  1. Set the preferred-active attribute to true on the desired router.
  2. Restart both routers or trigger a negotiation to apply the preference.
<swiftlet name="sys$hacontroller" preferred-active="true"/>
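
A minimal sketch of how the preference might look across the pair, assuming the second router keeps the default value of false and states it explicitly. The labels Router A and Router B are hypothetical; each fragment belongs in that router's own routerconfig.xml.

```xml
<!-- Router A (hypothetical label): preferred active instance -->
<swiftlet name="sys$hacontroller" preferred-active="true"/>
```

```xml
<!-- Router B (hypothetical label): documented default, written out for clarity -->
<swiftlet name="sys$hacontroller" preferred-active="false"/>
```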

Tuning Heartbeat Sensitivity

Adjust heartbeat parameters to detect failures more quickly or to tolerate transient network issues. Lower intervals and thresholds increase sensitivity but may cause false positives in unstable networks.

  1. Set heartbeat-interval to the desired interval in milliseconds (e.g., 500 for half a second).
  2. Set heartbeat-missing-threshold to the number of missed heartbeats before considering the peer down.
<swiftlet name="sys$hacontroller">
  <replication-channel heartbeat-interval="500" heartbeat-missing-threshold="3"/>
</swiftlet>
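
To make the trade-off concrete, the time until the replication connection is closed can be estimated as heartbeat-interval multiplied by heartbeat-missing-threshold. This is only a rough estimate, assuming the connection is closed once the threshold number of consecutive heartbeats has been missed; the actual detection time may differ.

```xml
<swiftlet name="sys$hacontroller">
  <!-- Documented defaults: 2000 ms x 10 missed heartbeats = roughly 20000 ms (20 s) -->
  <!-- <replication-channel heartbeat-interval="2000" heartbeat-missing-threshold="10"/> -->

  <!-- Tuned values from the example above: 500 ms x 3 missed heartbeats = roughly 1500 ms (1.5 s) -->
  <replication-channel heartbeat-interval="500" heartbeat-missing-threshold="3"/>
</swiftlet>
```

Under this estimate, the tuned values react to a lost peer more than ten times faster than the defaults, but leave far less room for brief network interruptions.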

Handling Split Brain with Backup and Standby

When using shared storage, you may want the router to back up the store and restart as standby if a split brain is detected. This prevents data corruption and allows for safe administrative recovery.

  1. Set split-brain-instance-action to backup-and-standby.
  2. Ensure shared storage is configured and accessible by both routers.
<swiftlet name="sys$hacontroller" split-brain-instance-action="backup-and-standby"/>

Customizing Replication Channel Network Settings

Customize the replication channel's listener and connector to match your network topology and security requirements. Only one listener and one connector are supported.

  1. Define a listener with the desired port and optional bindaddress.
  2. Define a connector with the remote peer's hostname and port.
<swiftlet name="sys$hacontroller">
  <replication-channel>
    <listeners>
      <listener name="main" port="4000" bindaddress="192.168.1.10"/>
    </listeners>
    <connectors>
      <connector name="to-peer" hostname="192.168.1.11" port="4000"/>
    </connectors>
  </replication-channel>
</swiftlet>
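
The example above shows one side of the pair. Assuming the peer at 192.168.1.11 uses a mirrored setup, with its own listener and a connector pointing back at 192.168.1.10, its fragment might look like the sketch below. Whether both nodes need both a listener and a connector depends on the deployment and is not stated here, so treat the mirroring as an assumption.

```xml
<swiftlet name="sys$hacontroller">
  <replication-channel>
    <listeners>
      <!-- Peer side (assumed mirror): listens on its own address -->
      <listener name="main" port="4000" bindaddress="192.168.1.11"/>
    </listeners>
    <connectors>
      <!-- Peer side (assumed mirror): connects back to the first router -->
      <connector name="to-peer" hostname="192.168.1.10" port="4000"/>
    </connectors>
  </replication-channel>
</swiftlet>
```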

Configuration Reference

The top-level entity in routerconfig.xml is <swiftlet name="sys$hacontroller">.

<swiftlet name="sys$hacontroller"> Properties

These properties are attributes of the <swiftlet name="sys$hacontroller"> entity.

| Parameter | Type | Default | Mandatory | Reboot Required | Description |
| --- | --- | --- | --- | --- | --- |
| preferred-active | Boolean | false | No | No | States whether this router is the preferred Active Instance |
| negotiation-timeout | Long | 1800000 | No | No | Time after which a Negotiation must be initiated (min: 1000) |
| split-brain-instance-action | String | stop | No | No | Action taken on this Instance when a Split Brain is detected (choices: keep, stop, backup-and-standby) |

<swiftlet name="sys$hacontroller" preferred-active="false" negotiation-timeout="1800000" split-brain-instance-action="stop"/>

<spool> Entity

Spool Settings

This is a fixed child entity of <swiftlet name="sys$hacontroller">.

| Parameter | Type | Default | Mandatory | Reboot Required | Description |
| --- | --- | --- | --- | --- | --- |
| directory | String | ./ | No | Yes | Spool Directory |
| max-cache-size | Integer | 5120 | No | Yes | Specifies the size in KB to be held in memory (min: 1024) |

<swiftlet name="sys$hacontroller">
  <spool directory="..." max-cache-size="..."/>
</swiftlet>

<configuration-controller> Entity

Tracks and controls configuration replication

This is a fixed child entity of <swiftlet name="sys$hacontroller">.

<swiftlet name="sys$hacontroller">
  <configuration-controller/>
</swiftlet>

<property-substitutions> in <configuration-controller>

Property Substitutions

Each <property-substitution> entry is identified by its name attribute (the Property Substitution).

| Parameter | Type | Default | Mandatory | Reboot Required | Description |
| --- | --- | --- | --- | --- | --- |
| substitute-with | String | | No | No | Substitute the Property Value with this Value |

<swiftlet name="sys$hacontroller">
  <configuration-controller>
    <property-substitutions>
      <property-substitution name="..."/>
    </property-substitutions>
  </configuration-controller>
</swiftlet>

<replication-excludes> in <configuration-controller>

Replication Excludes

Each <replication-exclude> entry is identified by its name attribute (the Replication Exclude).

<swiftlet name="sys$hacontroller">
  <configuration-controller>
    <replication-excludes>
      <replication-exclude name="..."/>
    </replication-excludes>
  </configuration-controller>
</swiftlet>

<replication-channel> Entity

Replication Channel Listeners and Connectors

This is a fixed child entity of <swiftlet name="sys$hacontroller">.

| Parameter | Type | Default | Mandatory | Reboot Required | Description |
| --- | --- | --- | --- | --- | --- |
| heartbeat-interval | Long | 2000 | No | No | Interval for sending Heart Beat Messages (min: 100) |
| heartbeat-missing-threshold | Integer | 10 | No | No | Closes Replication Connections after missing this number of Heart Beat Messages (min: 1) |
| max-packet-size | Integer | 1024 | No | No | Maximum Packet Size (KB) (min: 1) |

<swiftlet name="sys$hacontroller">
  <replication-channel heartbeat-interval="..." heartbeat-missing-threshold="..." max-packet-size="..."/>
</swiftlet>

<listeners> in <replication-channel>

Listener Definitions

Each <listener> entry is identified by its name attribute (the Listener).

| Parameter | Type | Default | Mandatory | Reboot Required | Description |
| --- | --- | --- | --- | --- | --- |
| bindaddress | String | | No | No | Listener Bind IP Address |
| port | Integer | | Yes | No | Listener Port |
| use-tcp-no-delay | Boolean | true | No | No | Use Tcp No Delay |
| router-input-buffer-size | Integer | 1048576 | No | No | Router Network Input Buffer Size (min: 65536) |
| router-input-extend-size | Integer | 1048576 | No | No | Router Network Input Extend Size (min: 65536) |
| router-output-buffer-size | Integer | 131072 | No | No | Router Network Output Buffer Size (min: 1024) |
| router-output-extend-size | Integer | 131072 | No | No | Router Network Output Extend Size (min: 1024) |

<swiftlet name="sys$hacontroller">
  <replication-channel>
    <listeners>
      <listener name="..." port="..."/>
    </listeners>
  </replication-channel>
</swiftlet>

<connectors> in <replication-channel>

Connector Definitions

Each <connector> entry is identified by its name attribute (the Connector).

| Parameter | Type | Default | Mandatory | Reboot Required | Description |
| --- | --- | --- | --- | --- | --- |
| hostname | String | | Yes | No | Remote Hostname |
| port | Integer | | Yes | No | Remote Port |
| use-tcp-no-delay | Boolean | true | No | No | Use Tcp No Delay |
| retry-time | Long | 1000 | No | No | Retry Time (min: 100) |
| router-input-buffer-size | Integer | 1048576 | No | No | Router Network Input Buffer Size (min: 65536) |
| router-input-extend-size | Integer | 1048576 | No | No | Router Network Input Extend Size (min: 65536) |
| router-output-buffer-size | Integer | 131072 | No | No | Router Network Output Buffer Size (min: 1024) |
| router-output-extend-size | Integer | 131072 | No | No | Router Network Output Extend Size (min: 1024) |

<swiftlet name="sys$hacontroller">
  <replication-channel>
    <connectors>
      <connector name="..." hostname="..." port="..."/>
    </connectors>
  </replication-channel>
</swiftlet>

<replication-tunnels> in <swiftlet name="sys$hacontroller">

Replication Tunnels

Each <replication-tunnel> entry is identified by its name attribute (the Replication Tunnel).

| Parameter | Type | Default | Mandatory | Reboot Required | Description |
| --- | --- | --- | --- | --- | --- |
| tunnel-address | Integer | | No | No | Tunnel Address (min: 0) |
| versions | String | | No | No | Supported Tunnel Protocol Versions |

<swiftlet name="sys$hacontroller">
  <replication-tunnels>
    <replication-tunnel name="..."/>
  </replication-tunnels>
</swiftlet>