HA Router
Overview
The HA (High Availability) Router extends the Universal Router with active/standby clustering for continuous message availability. An HA cluster consists of two router instances — one ACTIVE and one STANDBY. The ACTIVE instance handles all client connections and message processing. The STANDBY instance maintains a synchronized copy of all persistent state and is ready to take over if the ACTIVE fails.
Installation from an Archive
SwiftMQ HA 13.0.0+ ships with a bundled GraalVM CE runtime — no separate JDK installation is required.
Distribution Content
| Directory | Content |
|---|---|
| certs/ | Self-signed TLS certificates and keystores |
| data/ | Configuration, logs, persistent store |
| hatest/ | HA test suite scripts and configs |
| kernel/ | Kernel Swiftlet jars and metadata |
| opt/ | File-based and JDBC Store Swiftlets |
| optional-swiftlets/ | Extension Swiftlets (JMS, AMQP, JavaMail bridges, replicator) |
| preconfig/ | HA instance-specific preconfig files |
| scripts/ | Router and CLI startup scripts |
| shared/ | Default shared file store location |
| streams/ | System streams |
Preconfig Files
The distribution contains the configuration of a single HA instance. Preconfig files must be applied so that two distinct HA instances are created. The preconfig/ directory contains:
| File | Purpose |
|---|---|
| hostports.xml | AMQP, JMS, MQTT, and Routing host/port bindings |
| repllistener.xml | Replication channel listener (primary instance) |
| replconnector.xml | Replication channel connector (secondary instance) |
| replicatedstore.xml | Replicated File Store (default) |
| sharedstore.xml | Shared File Store |
| jdbcstore.xml | JDBC Store |
Configuring the Primary Instance
Edit hostports.xml to set listener addresses. Use connectaddress and connectaddress2 for the two HA hosts:
```xml
<router name="testrouter">
  <swiftlet name="sys$jms">
    <listeners _op="replace">
      <listener name="plainsocket" hostname="localhost" port="4001"
                hostname2="localhost" port2="4001"
                connectaddress="hostA" connectaddress2="hostB"/>
    </listeners>
  </swiftlet>
</router>
```
Start the primary instance with the replication listener:
```sh
./router ../preconfig/hostports.xml,../preconfig/repllistener.xml,../preconfig/replicatedstore.xml
```
Configuring the Secondary Instance
The secondary instance uses the same hostports.xml but connects to the primary via replconnector.xml:
```xml
<router>
  <swiftlet name="sys$hacontroller">
    <replication-channel _op="replace">
      <connectors>
        <connector name="1" hostname="hostA" port="2001"/>
      </connectors>
    </replication-channel>
  </swiftlet>
</router>
```
Start the secondary instance with the replication connector:
```sh
./router ../preconfig/hostports.xml,../preconfig/replconnector.xml,../preconfig/replicatedstore.xml
```
Heap Memory
Set SWIFTMQ_JVMPARAM to adjust the JVM heap (default: -Xmx2G):
```sh
export SWIFTMQ_JVMPARAM="-Xmx4G"
```
Running as Docker Containers
Recommended Directory Layout
```
<routername>/
  preconfig/
  data/
  jdbc-driver/   # only if using JDBC Store
```
Docker Compose — Primary Instance
```yaml
services:
  swiftmq:
    image: "iitsoftware/swiftmq-ha:latest"
    ports:
      - "4001:4001"
      - "2001:2001"
    environment:
      - SWIFTMQ_PRECONFIG=/swiftmq/preconfig/hostports.xml,/swiftmq/preconfig/repllistener.xml,/swiftmq/preconfig/replicatedstore.xml
      - SWIFTMQ_JVMPARAM=-Xmx4G
    volumes:
      - ./preconfig:/swiftmq/preconfig
      - ./data:/swiftmq/data
```
Docker Compose — Secondary Instance
```yaml
services:
  swiftmq:
    image: "iitsoftware/swiftmq-ha:latest"
    ports:
      - "4001:4001"
    environment:
      - SWIFTMQ_PRECONFIG=/swiftmq/preconfig/hostports.xml,/swiftmq/preconfig/replconnector.xml,/swiftmq/preconfig/replicatedstore.xml
      - SWIFTMQ_JVMPARAM=-Xmx4G
    volumes:
      - ./preconfig:/swiftmq/preconfig
      - ./data:/swiftmq/data
```
The primary instance exposes port 2001 for the replication channel. The secondary connects to it via the replconnector.xml preconfig.
For inter-container communication, use host.docker.internal to address the Docker host. On Linux, add extra_hosts: ["host.docker.internal:host-gateway"] to the Compose service.
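On Linux, the host-gateway mapping mentioned above can be declared directly in the Compose file. A minimal sketch, reusing the service name and image from the examples in this section:

```yaml
services:
  swiftmq:
    image: "iitsoftware/swiftmq-ha:latest"
    # Make host.docker.internal resolvable inside the container on Linux
    extra_hosts:
      - "host.docker.internal:host-gateway"
```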
JDBC Store: Replace replicatedstore.xml with jdbcstore.xml, mount the JDBC driver directory (./jdbc-driver:/swiftmq/jdbc-driver), and add SWIFTMQ_STORETYPE=JDBC to the environment.
Shutdown
Always use docker stop for a graceful shutdown — never docker kill, as it can corrupt the persistent store.
Standard HA Configuration
The standard configuration is identical to SwiftMQ UR except it adds attributes for a second instance on JMS/Routing listeners and includes a replication listener on port 2001. Authentication is disabled by default. Log in as anonymous or use admin / secret.
HA Deployment
Network
Both HA instances communicate through a dedicated replication channel used by the HA Controller Swiftlet to replicate state and exchange heartbeat messages. This connection should use a private network segment with dedicated network cards and a switch between both hosts.
Network speed depends on the persistent store type:
- Replicated File Store: Gigabit link recommended for high-load scenarios (store data is replicated over this channel).
- Shared File Store / Shared JDBC Store: 10 or 100 MBit sufficient (no data replication needed).
Persistent Store Options
Replicated File Store (default) — During connection, a store image transfers to the STANDBY. The transaction log replicates synchronously, ensuring both instances are consistent. This avoids disk sync requirements — standard disks suffice since two store copies exist. Performance depends on the replication channel speed.
Shared File Store — Both instances access the same filesystem (e.g., SAN, NFS). No replication needed, but disk sync (force-sync) is mandatory. Without it, a hard kill of the ACTIVE risks transaction log inconsistency, causing STANDBY startup failure. Requires high-speed RAID or SAN hardware.
Shared JDBC Store — Both instances use the same JDBC database. No replication needed, but the database itself must be clustered to avoid a single point of failure.
HA States and Failover
State Machine
The HA Controller manages these states:
| State | Description |
|---|---|
| UNKNOWN | Initial state; awaits negotiation with the other instance |
| INITIALIZE | Replication channel being set up |
| NEGOTIATE | Temporary master elected for role negotiation |
| ACTIVE-SYNC | ACTIVE creates a store snapshot and transfers it to STANDBY |
| ACTIVE | Synchronization complete; ongoing replication to STANDBY |
| STANDBY-SYNC | STANDBY receives store snapshot |
| STANDBY | Synchronization complete; receives replication stream |
| STANDALONE | Other instance disconnected; operates independently |
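The happy-path progression through these states can be pictured as a small state machine. This is an illustrative model only, not SwiftMQ's implementation; the class and method names are invented, while the state names follow the table above:

```java
// Illustrative model of the HA state progression in the table above.
// NOT SwiftMQ's implementation; it sketches the startup path plus the
// STANDBY -> STANDALONE failover transition.
public class HAStateMachine {
    public enum HAState { UNKNOWN, INITIALIZE, NEGOTIATE, ACTIVE_SYNC,
                          ACTIVE, STANDBY_SYNC, STANDBY, STANDALONE }

    private HAState state = HAState.UNKNOWN;

    public HAState state() { return state; }

    // Advance one step along the startup path; becomesActive selects the
    // branch taken after NEGOTIATE.
    public void advance(boolean becomesActive) {
        switch (state) {
            case UNKNOWN:      state = HAState.INITIALIZE; break;
            case INITIALIZE:   state = HAState.NEGOTIATE; break;
            case NEGOTIATE:    state = becomesActive ? HAState.ACTIVE_SYNC
                                                     : HAState.STANDBY_SYNC; break;
            case ACTIVE_SYNC:  state = HAState.ACTIVE; break;
            case STANDBY_SYNC: state = HAState.STANDBY; break;
            default: break; // ACTIVE/STANDBY/STANDALONE change only via failover
        }
    }

    // Failover: a STANDBY that loses the ACTIVE goes STANDALONE.
    public void peerLost() {
        if (state == HAState.STANDBY) state = HAState.STANDALONE;
    }
}
```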
Failover Process
When the STANDBY detects that the ACTIVE has failed (via IOException or missed heartbeat messages), it transitions to STANDALONE and begins accepting client connections. Clients using reconnection automatically reconnect to the new active instance.
When the previous ACTIVE restarts, it becomes STANDBY while the current STANDALONE becomes ACTIVE. The new ACTIVE replicates its store to the new STANDBY.
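The heartbeat side of failure detection can be sketched as follows. The class, method names, and timeout value are illustrative assumptions, not SwiftMQ's actual API:

```java
// Illustrative heartbeat monitor: the STANDBY records the time of the
// last heartbeat received from the ACTIVE and treats a long silence as
// a failure, triggering the STANDALONE transition described above.
public class HeartbeatMonitor {
    private final long timeoutMillis;
    private long lastHeartbeat;
    private boolean standalone = false;

    public HeartbeatMonitor(long timeoutMillis, long now) {
        this.timeoutMillis = timeoutMillis;
        this.lastHeartbeat = now;
    }

    public void onHeartbeat(long now) { lastHeartbeat = now; }

    // Called periodically; returns true once failover has been triggered.
    public boolean check(long now) {
        if (!standalone && now - lastHeartbeat > timeoutMillis) {
            standalone = true; // ACTIVE presumed dead: go STANDALONE
        }
        return standalone;
    }
}
```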
Automatic Disk Sync in STANDALONE Mode
When using the Replicated File Store, force-sync defaults to false because the STANDBY provides redundancy. The attribute force-sync-in-standalone-mode (default: true) dynamically enables disk sync when transitioning to STANDALONE and disables it when returning to ACTIVE, protecting against crashes while running without a STANDBY.
Administration
CLI Under HA
The CLI automatically connects to the ACTIVE instance. After failover, the CLI reconnects and always maintains a connection to the current ACTIVE. If a request fails during failover, it is canceled with a warning and must be reissued.
Additional CLI commands for HA:
- `reboot` — Reboots both ACTIVE and STANDBY instances
- `halt` — Halts both instances
- `save` — Saves configuration on both instances
- `rebootactive` / `rebootstandby` — Reboot a specific instance
- `haltactive` / `haltstandby` — Halt a specific instance
Configuration Replication
Configuration changes made on the ACTIVE are automatically replicated to the STANDBY. Changes can be made at any time regardless of whether the STANDBY is connected — replication occurs upon reconnection.
Instance-specific settings are not replicated:
- Deploy Swiftlet
- HA Controller Swiftlet (partial)
- Log Swiftlet
- Store Swiftlet
- Trace Swiftlet
The HA Controller's replication exclude list controls which configuration contexts are excluded from replication.
Router Configuration File
Each instance has its own routerconfig.xml in data/config/. Changes should be applied through the CLI or SwiftMQ Explorer. Direct file modifications require stopping both instances. The <ha-router> section containing state-transition entries is static and must never be modified.
JNDI/JMS Under HA
JNDI Provider URL
The SMQP provider URL includes parameters for both HA instances:
```
smqp://host1:4001/host2=host2;port2=4002;reconnect=true;retrydelay=1000;maxretries=50
```
Key parameters: host2 and port2 specify the second HA instance. reconnect=true enables automatic failover reconnection. retrydelay and maxretries control reconnection behavior.
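As a concrete sketch, a client-side jndi.properties using such a URL might look as follows. Host names and ports are placeholders, and the factory class is assumed to be SwiftMQ's standard JNDI context factory; verify it against your client jar:

```properties
# Assumed SwiftMQ JNDI context factory (verify against your client jar)
java.naming.factory.initial=com.swiftmq.jndi.InitialContextFactoryImpl
# Both HA instances plus reconnect tuning, as described above
java.naming.provider.url=smqp://host1:4001/host2=host2;port2=4002;reconnect=true;retrydelay=1000;maxretries=50
```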
JMS Failover Behavior
SwiftMQ HA provides transparent failover with these guarantees and caveats:
- Persistent messages are preserved across failover. Always use the default delivery mode (persistent) or set it explicitly.
- Non-persistent messages may be lost during failover.
- `receive(timeout)` may return `null` during failover if the timeout is less than the failover time.
- Duplicate detection uses the JMS Message-Id. JMS message ID generation must remain enabled.
- Temporary destinations are reconstructed during failover but bound to different physical queues. For reliability, use regular queues or durable subscribers with persistent messages instead.
Reconnect Listener
Applications can register a reconnect listener for failover notifications:
```java
((com.swiftmq.jms.SwiftMQConnection) connection).addReconnectListener(
    (host, port) -> {
        // Handle reconnection
    }
);
```
Enable reconnect debug output with -Dswiftmq.reconnect.debug=true.
Routing Under HA
Routing connectors and listeners are automatically replicated to the STANDBY instance. Routing listeners include bindaddress2 and port2 attributes for the secondary HA instance.
Define routing connectors at the HA Router level rather than at individual listeners — this avoids configuration changes across a Federated Router Network when failover occurs.
For fallback, connecting routers need an additional routing connector to the secondary HA instance. The system automatically attempts the alternative connector if the primary connection fails.
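As an illustrative sketch only (the exact Routing Swiftlet schema may differ; connector names, hostnames, and port 4100 are placeholders), a connecting router could define one connector per HA instance:

```xml
<swiftlet name="sys$routing">
  <connectors>
    <!-- primary HA instance -->
    <connector name="toHA-primary" hostname="hostA" port="4100"/>
    <!-- fallback: secondary HA instance -->
    <connector name="toHA-secondary" hostname="hostB" port="4100"/>
  </connectors>
</swiftlet>
```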
Split Brain
A split brain occurs when both HA instances operate in STANDALONE mode simultaneously, losing data consistency. Causes include negotiation timeout expiration and replication channel loss (network failure).
The split-brain-instance-action attribute controls the response:
- `stop` — Stops the instance (default)
- `keep` — Keeps the instance running
- `backup-and-standby` — Creates a store backup and restarts as STANDBY
Recommended Configuration for Automatic Recovery
- Mark one instance as preferred active (`preferred-active="true"`)
- Set the preferred instance to `split-brain-instance-action="keep"`
- Set the other instance to `split-brain-instance-action="backup-and-standby"`
This allows automatic recovery: the preferred instance continues as STANDALONE, the other creates a backup and restarts as STANDBY, then the preferred instance replicates to it.
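Sketched as HA Controller attributes (the exact attribute placement within routerconfig.xml is an assumption; check your configuration schema before applying):

```xml
<!-- Preferred instance: stays STANDALONE on split brain -->
<swiftlet name="sys$hacontroller"
          preferred-active="true"
          split-brain-instance-action="keep"/>

<!-- Other instance: backs up its store and rejoins as STANDBY -->
<swiftlet name="sys$hacontroller"
          split-brain-instance-action="backup-and-standby"/>
```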
Duplicate Message Detection
HA environments use duplicate message detection to ensure no message is delivered twice across failover.
Inbound detection (at the router): Controlled by duplicate-detection-enabled and duplicate-detection-backlog-size attributes on queues. Default backlog is 2000 JMS message IDs.
Outbound detection (at the client): Controlled by duplicate-message-detection and duplicate-backlog-size on connection factories. Default backlog is 30,000 per JMS connection.
JMS message IDs are asynchronously replicated to the STANDBY instance.
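The bounded backlog described above can be pictured as a FIFO set of recent JMS message IDs. This is a minimal illustration, not SwiftMQ's implementation; the constructor argument mirrors the backlog-size attributes:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of bounded duplicate detection: remember the last
// `backlogSize` JMS message IDs and flag any ID seen again.
public class DuplicateDetector {
    private final Map<String, Boolean> seen;

    public DuplicateDetector(int backlogSize) {
        // Insertion-order LinkedHashMap with removeEldestEntry gives
        // FIFO eviction once the backlog is full.
        this.seen = new LinkedHashMap<>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> e) {
                return size() > backlogSize;
            }
        };
    }

    // Returns true if the message ID is already in the backlog.
    public boolean isDuplicate(String jmsMessageId) {
        return seen.put(jmsMessageId, Boolean.TRUE) != null;
    }
}
```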
HA-Specific Swiftlets
- HA Controller (`sys$hacontroller`) — Manages the active/standby state machine, heartbeats, replication channel, and failover decisions.
- HA Queue Manager (`sys$queuemanager`) — Extends the CE Queue Manager with replication-aware queue operations.
- HA Store (`sys$store`) — Extends the CE Store with synchronous replication of all persistent data.