HA Router

Overview

The HA (High Availability) Router extends the Universal Router with active/standby clustering for continuous message availability. An HA cluster consists of two router instances — one ACTIVE and one STANDBY. The ACTIVE instance handles all client connections and message processing. The STANDBY instance maintains a synchronized copy of all persistent state and is ready to take over if the ACTIVE fails.

Installation from an Archive

SwiftMQ HA 13.0.0+ ships with a bundled GraalVM CE runtime — no separate JDK installation is required.

Distribution Content

Directory            Content
certs/               Self-signed TLS certificates and keystores
data/                Configuration, logs, persistent store
hatest/              HA test suite scripts and configs
kernel/              Kernel Swiftlet jars and metadata
opt/                 File-based and JDBC Store Swiftlets
optional-swiftlets/  Extension Swiftlets (JMS, AMQP, JavaMail bridges, replicator)
preconfig/           HA instance-specific preconfig files
scripts/             Router and CLI startup scripts
shared/              Default shared file store location
streams/             System streams

Preconfig Files

The distribution contains the configuration of a single HA instance. The preconfig files must be applied to turn it into the two HA instances of a cluster. The preconfig/ directory contains:

File                 Purpose
hostports.xml        AMQP, JMS, MQTT, and Routing host/port bindings
repllistener.xml     Replication channel listener (primary instance)
replconnector.xml    Replication channel connector (secondary instance)
replicatedstore.xml  Replicated File Store (default)
sharedstore.xml      Shared File Store
jdbcstore.xml        JDBC Store

Configuring the Primary Instance

Edit hostports.xml to set listener addresses. Use connectaddress and connectaddress2 for the two HA hosts:

<router name="testrouter">
  <swiftlet name="sys$jms">
    <listeners _op="replace">
      <listener name="plainsocket" hostname="localhost" port="4001"
                hostname2="localhost" port2="4001"
                connectaddress="hostA" connectaddress2="hostB"/>
    </listeners>
  </swiftlet>
</router>

Start the primary instance with the replication listener:

./router ../preconfig/hostports.xml,../preconfig/repllistener.xml,../preconfig/replicatedstore.xml

Configuring the Secondary Instance

The secondary instance uses the same hostports.xml but connects to the primary via replconnector.xml:

<router>
  <swiftlet name="sys$hacontroller">
    <replication-channel _op="replace">
      <connectors>
        <connector name="1" hostname="hostA" port="2001"/>
      </connectors>
    </replication-channel>
  </swiftlet>
</router>

Start the secondary instance with the replication connector:

./router ../preconfig/hostports.xml,../preconfig/replconnector.xml,../preconfig/replicatedstore.xml

Heap Memory

Set SWIFTMQ_JVMPARAM to adjust the JVM heap (default: -Xmx2G):

export SWIFTMQ_JVMPARAM="-Xmx4G"

Running as Docker Containers

Create the following directory layout on the Docker host for each instance, named after the router:

<routername>/
    preconfig/
    data/
    jdbc-driver/   # only if using JDBC Store
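The layout above can be created up front. A minimal sketch for a hypothetical instance directory named router1:

```shell
# Per-instance directory layout for a hypothetical instance "router1".
# jdbc-driver/ is only required when using the JDBC Store.
mkdir -p router1/preconfig router1/data router1/jdbc-driver
```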

Docker Compose — Primary Instance

services:
  swiftmq:
    image: "iitsoftware/swiftmq-ha:latest"
    ports:
      - "4001:4001"
      - "2001:2001"
    environment:
      - SWIFTMQ_PRECONFIG=/swiftmq/preconfig/hostports.xml,/swiftmq/preconfig/repllistener.xml,/swiftmq/preconfig/replicatedstore.xml
      - SWIFTMQ_JVMPARAM=-Xmx4G
    volumes:
      - ./preconfig:/swiftmq/preconfig
      - ./data:/swiftmq/data

Docker Compose — Secondary Instance

services:
  swiftmq:
    image: "iitsoftware/swiftmq-ha:latest"
    ports:
      - "4001:4001"
    environment:
      - SWIFTMQ_PRECONFIG=/swiftmq/preconfig/hostports.xml,/swiftmq/preconfig/replconnector.xml,/swiftmq/preconfig/replicatedstore.xml
      - SWIFTMQ_JVMPARAM=-Xmx4G
    volumes:
      - ./preconfig:/swiftmq/preconfig
      - ./data:/swiftmq/data

The primary instance exposes port 2001 for the replication channel. The secondary connects to it via the replconnector.xml preconfig.

For inter-container communication, use host.docker.internal to address the Docker host. On Linux, add extra_hosts: ["host.docker.internal:host-gateway"] to the Compose service.

JDBC Store: Replace replicatedstore.xml with jdbcstore.xml, mount the JDBC driver directory (./jdbc-driver:/swiftmq/jdbc-driver), and add SWIFTMQ_STORETYPE=JDBC to the environment.
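Following the pattern above, a sketch of the primary instance's Compose service adapted for the JDBC Store (image, ports, and paths as in the earlier examples; your driver directory contents will differ):

```yaml
services:
  swiftmq:
    image: "iitsoftware/swiftmq-ha:latest"
    ports:
      - "4001:4001"
      - "2001:2001"
    environment:
      # jdbcstore.xml replaces replicatedstore.xml; SWIFTMQ_STORETYPE selects the JDBC Store
      - SWIFTMQ_PRECONFIG=/swiftmq/preconfig/hostports.xml,/swiftmq/preconfig/repllistener.xml,/swiftmq/preconfig/jdbcstore.xml
      - SWIFTMQ_STORETYPE=JDBC
      - SWIFTMQ_JVMPARAM=-Xmx4G
    volumes:
      - ./preconfig:/swiftmq/preconfig
      - ./data:/swiftmq/data
      - ./jdbc-driver:/swiftmq/jdbc-driver
```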

Shutdown

Always use docker stop for a graceful shutdown — never docker kill, as it can corrupt the persistent store.

Standard HA Configuration

The standard configuration is identical to SwiftMQ UR except it adds attributes for a second instance on JMS/Routing listeners and includes a replication listener on port 2001. Authentication is disabled by default. Log in as anonymous or use admin / secret.

HA Deployment

Network

Both HA instances communicate through a dedicated replication channel used by the HA Controller Swiftlet to replicate state and exchange heartbeat messages. This connection should use a private network segment with dedicated network cards and a switch between both hosts.

Network speed depends on the persistent store type:

  • Replicated File Store: Gigabit link recommended for high-load scenarios (store data is replicated over this channel).
  • Shared File Store / Shared JDBC Store: 10 or 100 MBit sufficient (no data replication needed).

Persistent Store Options

Replicated File Store (default) — When the STANDBY connects, a store image is transferred to it. The transaction log is then replicated synchronously, keeping both instances consistent. Disk sync is not required; standard disks suffice because two copies of the store exist. Performance depends on the speed of the replication channel.

Shared File Store — Both instances access the same filesystem (e.g., SAN, NFS). No replication needed, but disk sync (force-sync) is mandatory. Without it, a hard kill of the ACTIVE risks transaction log inconsistency, causing STANDBY startup failure. Requires high-speed RAID or SAN hardware.

Shared JDBC Store — Both instances use the same JDBC database. No replication needed, but the database itself must be clustered to avoid a single point of failure.

HA States and Failover

State Machine

The HA Controller manages these states:

State         Description
UNKNOWN       Initial state; awaits negotiation with the other instance
INITIALIZE    Replication channel being set up
NEGOTIATE     Temporary master elected for role negotiation
ACTIVE-SYNC   ACTIVE creates a store snapshot and transfers it to STANDBY
ACTIVE        Synchronization complete; ongoing replication to STANDBY
STANDBY-SYNC  STANDBY receives store snapshot
STANDBY       Synchronization complete; receives replication stream
STANDALONE    Other instance disconnected; operates independently

Failover Process

When the STANDBY detects that the ACTIVE has failed (via IOException or missed heartbeat messages), it transitions to STANDALONE and begins accepting client connections. Clients using reconnection automatically reconnect to the new active instance.

When the previous ACTIVE restarts, it becomes STANDBY while the current STANDALONE becomes ACTIVE. The new ACTIVE replicates its store to the new STANDBY.

Automatic Disk Sync in STANDALONE Mode

When using the Replicated File Store, force-sync defaults to false because the STANDBY provides redundancy. The attribute force-sync-in-standalone-mode (default: true) dynamically enables disk sync when transitioning to STANDALONE and disables it when returning to ACTIVE, protecting against crashes while running without a STANDBY.

Administration

CLI Under HA

The CLI automatically connects to the ACTIVE instance. After failover, the CLI reconnects and always maintains a connection to the current ACTIVE. If a request fails during failover, it is canceled with a warning and must be reissued.

Additional CLI commands for HA:

  • reboot — Reboots both ACTIVE and STANDBY instances
  • halt — Halts both instances
  • save — Saves configuration on both instances
  • rebootactive / rebootstandby — Reboot a specific instance
  • haltactive / haltstandby — Halt a specific instance

Configuration Replication

Configuration changes made on the ACTIVE are automatically replicated to the STANDBY. Changes can be made at any time regardless of whether the STANDBY is connected — replication occurs upon reconnection.

Instance-specific settings are not replicated:

  • Deploy Swiftlet
  • HA Controller Swiftlet (partial)
  • Log Swiftlet
  • Store Swiftlet
  • Trace Swiftlet

The HA Controller's replication exclude list controls which configuration contexts are excluded from replication.

Router Configuration File

Each instance has its own routerconfig.xml in data/config/. Changes should be applied through the CLI or SwiftMQ Explorer. Direct file modifications require stopping both instances. The <ha-router> section containing state-transition entries is static and must never be modified.

JNDI/JMS Under HA

JNDI Provider URL

The SMQP provider URL includes parameters for both HA instances:

smqp://host1:4001/host2=host2;port2=4002;reconnect=true;retrydelay=1000;maxretries=50

Key parameters: host2 and port2 specify the second HA instance. reconnect=true enables automatic failover reconnection. retrydelay and maxretries control reconnection behavior.
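As a sketch, the URL above can also be assembled programmatically. The helper below is hypothetical (it is not part of the SwiftMQ client API); it merely formats the parameters described above:

```java
// Hypothetical helper: builds an SMQP provider URL with HA failover
// parameters. Not part of the SwiftMQ API; shown for illustration only.
public class SmqpUrl {
    public static String build(String host1, int port1, String host2, int port2,
                               long retryDelayMs, int maxRetries) {
        return String.format(
            "smqp://%s:%d/host2=%s;port2=%d;reconnect=true;retrydelay=%d;maxretries=%d",
            host1, port1, host2, port2, retryDelayMs, maxRetries);
    }

    public static void main(String[] args) {
        // Reproduces the example URL from the text
        System.out.println(build("host1", 4001, "host2", 4002, 1000, 50));
    }
}
```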

JMS Failover Behavior

SwiftMQ HA provides transparent failover with these guarantees and caveats:

  • Persistent messages are preserved across failover. Always use the default delivery mode (persistent) or set it explicitly.
  • Non-persistent messages may be lost during failover.
  • receive(timeout) may return null during failover if the timeout is less than the failover time.
  • Duplicate detection uses JMS Message-Id. JMS message ID generation must remain enabled.
  • Temporary destinations are reconstructed during failover but bound to different physical queues. For reliability, use regular queues or durable subscribers with persistent messages instead.
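The receive(timeout) caveat above is typically handled with a retry loop. A minimal, generic sketch (the helper and names are hypothetical; in a real client the supplier would wrap consumer.receive(timeout)):

```java
import java.util.function.Supplier;

// Hypothetical sketch: retry a timed receive that may return null while a
// failover is in progress. In a JMS client, "attempt" would wrap
// consumer.receive(timeout); a plain Supplier is used here for illustration.
public class FailoverReceive {
    public static <M> M receiveRetrying(Supplier<M> attempt, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            M msg = attempt.get();        // e.g. consumer.receive(1000)
            if (msg != null) return msg;  // got a message
            // null may just mean "failover in progress", so try again
        }
        return null;                      // give up after maxAttempts
    }
}
```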

Reconnect Listener

Applications can register a reconnect listener for failover notifications:

((com.swiftmq.jms.SwiftMQConnection) connection).addReconnectListener(
    (host, port) -> {
        // Handle reconnection
    }
);

Enable reconnect debug output with -Dswiftmq.reconnect.debug=true.

Routing Under HA

Routing connectors and listeners are automatically replicated to the STANDBY instance. Routing listeners include bindaddress2 and port2 attributes for the secondary HA instance.

Define routing connectors at the HA Router level rather than at individual listeners — this avoids configuration changes across a Federated Router Network when failover occurs.

For fallback, connecting routers need an additional routing connector to the secondary HA instance. The system automatically attempts the alternative connector if the primary connection fails.

Split Brain

A split brain occurs when both HA instances operate in STANDALONE mode simultaneously, losing data consistency. Causes include negotiation timeout expiration and replication channel loss (network failure).

The split-brain-instance-action attribute controls the response:

  • stop — Stops the instance (default)
  • keep — Keeps the instance running
  • backup-and-standby — Creates a store backup and restarts as STANDBY
For automatic split-brain recovery, configure the two instances asymmetrically:

  1. Mark one instance as preferred-active (preferred-active="true")
  2. Set the preferred instance to split-brain-instance-action="keep"
  3. Set the other instance to split-brain-instance-action="backup-and-standby"

This allows automatic recovery: the preferred instance continues as STANDALONE, the other creates a backup and restarts as STANDBY, then the preferred instance replicates to it.
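A hedged sketch of the corresponding preconfig fragments, assuming both attributes sit on the HA Controller Swiftlet entity (the exact placement is an assumption; verify against your routerconfig.xml):

```xml
<!-- Preferred instance (attribute placement on sys$hacontroller is assumed) -->
<router>
  <swiftlet name="sys$hacontroller"
            preferred-active="true"
            split-brain-instance-action="keep"/>
</router>

<!-- Other instance -->
<router>
  <swiftlet name="sys$hacontroller"
            split-brain-instance-action="backup-and-standby"/>
</router>
```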

Duplicate Message Detection

HA environments use duplicate message detection to ensure no message is delivered twice across failover.

Inbound detection (at the router): Controlled by duplicate-detection-enabled and duplicate-detection-backlog-size attributes on queues. Default backlog is 2000 JMS message IDs.

Outbound detection (at the client): Controlled by duplicate-message-detection and duplicate-backlog-size on connection factories. Default backlog is 30,000 per JMS connection.

JMS message IDs are asynchronously replicated to the STANDBY instance.
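The attribute names above come from the text; as a hedged sketch, they might appear in the configuration roughly as follows (the surrounding entity layout is an assumption and may differ from your routerconfig.xml):

```xml
<!-- Inbound detection on a queue (entity layout assumed) -->
<swiftlet name="sys$queuemanager">
  <queues>
    <queue name="orders"
           duplicate-detection-enabled="true"
           duplicate-detection-backlog-size="2000"/>
  </queues>
</swiftlet>

<!-- Outbound detection on a connection factory (entity layout assumed) -->
<connection-factory name="ha-cf"
                    duplicate-message-detection="true"
                    duplicate-backlog-size="30000"/>
```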

HA-Specific Swiftlets

  • HA Controller (sys$hacontroller) — Manages the active/standby state machine, heartbeats, replication channel, and failover decisions.
  • HA Queue Manager (sys$queuemanager) — Extends the CE Queue Manager with replication-aware queue operations.
  • HA Store (sys$store) — Extends the CE Store with synchronous replication of all persistent data.