High availability setup

Flexisip achieves high availability by combining three things:

  • Several Flexisip instances running on multiple host machines, configured to serve the same domains
  • SRV records to spread the traffic among the Flexisip instances, and to let clients try an alternate route if the host behind one SRV record is down
  • A Redis registrar set up in a master/slave configuration and monitored by sentinels

In the ideal, simplified setup, each host runs:

  1. A Redis instance with requirepass and masterauth set to the same password, and slaveof pointing to the current Redis master;
  2. A Flexisip instance configured to connect to the current Redis master;
  3. A redis-sentinel instance configured to monitor the current Redis master.

Flexisip-ha_conf.png

With this setup, each redis-sentinel connects to the master Redis database and starts monitoring its slaves (including the local instance). Each Redis slave starts replicating the master and waits for client connections. Each Flexisip instance connects to the Redis master and starts handling client interactions.

If a network error affects the master Redis database, the sentinels elect a new Redis master and reconfigure all the other Redis instances accordingly.

Flexisip behaves as follows:

  1. While the connection to the master is working, Flexisip periodically asks the master for its list of slaves.
  2. If the connection to the master is lost, Flexisip tries all known slaves in turn and waits for a new master to be elected.
  3. When a new master is elected, Flexisip drops the slave connection and connects to the new master.
  4. At this point, the cluster is able to process registrations again.

Overall, the time needed to recover depends on the sentinel configuration. We recommend a 10 s delay before the sentinels consider the master down.
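The four reconnection steps above can be sketched as a small state machine. This is only a toy model under stated assumptions, not Flexisip's actual code: RedisNode, RegistrarClient, promote and demo are hypothetical names, and the sentinels are reduced to a single promote() call. In the demo, host3 is promoted so that step 3 (leaving the slave for the new master) is exercised.

```python
from dataclasses import dataclass, field


@dataclass
class RedisNode:
    """Hypothetical stand-in for one Redis instance, as seen by a client."""
    name: str
    role: str                 # "master" or "slave"
    master: str = ""          # for a slave: name of the master it replicates
    reachable: bool = True


@dataclass
class RegistrarClient:
    """Toy model of the four steps above; not Flexisip's actual code."""
    nodes: dict               # name -> RedisNode
    current: str              # name of the node we are connected to
    known_slaves: list = field(default_factory=list)

    def tick(self) -> str:
        """Run one polling round and return the resulting state."""
        if not self.nodes[self.current].reachable:
            # Step 2: connection lost, try every known slave in turn.
            alive = [s for s in self.known_slaves if self.nodes[s].reachable]
            if not alive:
                return "disconnected"
            self.current = alive[0]

        node = self.nodes[self.current]
        if node.role == "slave":
            target = self.nodes.get(node.master)
            if target is None or not target.reachable or target.role != "master":
                # Still step 2: no usable master yet, registrations are on hold.
                return "waiting-for-master"
            # Step 3: a master is available, leave the slave and switch to it.
            self.current = target.name
            node = target

        # Steps 1 and 4: on the master, refresh the slave list and serve traffic.
        self.known_slaves = [n.name for n in self.nodes.values()
                             if n.role == "slave" and n.master == node.name]
        return "on-master"


def promote(nodes: dict, new_master: str) -> None:
    """Mimic what the sentinels do: promote one node, repoint the others."""
    for n in nodes.values():
        if n.name == new_master:
            n.role, n.master = "master", ""
        else:
            n.role, n.master = "slave", new_master


def demo() -> list:
    nodes = {
        "host1": RedisNode("host1", "master"),
        "host2": RedisNode("host2", "slave", master="host1"),
        "host3": RedisNode("host3", "slave", master="host1"),
    }
    client = RegistrarClient(nodes, current="host1")
    states = [client.tick()]            # healthy cluster
    nodes["host1"].reachable = False    # the master host fails
    states.append(client.tick())        # falls back to a slave and waits
    promote(nodes, "host3")             # sentinels elect a new master
    states.append(client.tick())        # follows the slave to the new master
    return states + [client.current]


if __name__ == "__main__":
    print(demo())
```

Between the second and third call to tick(), the client is connected to a slave only, which corresponds to the window in which no registration can be processed.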

High availability setup requirements

An NTP daemon MUST be installed on every machine running a Redis or Flexisip instance. Flexisip asks Redis to remove expired registrations automatically, and this mechanism relies on universal time. If any node of the cluster has the wrong time, registration expiry breaks: clients then receive 404 Not Found responses from Flexisip for destinations that were correctly registered.

On a Debian system, install the NTP daemon with:

sudo apt-get install ntp

/etc/ntp.conf can be customized to set the hostname of your preferred NTP server (for example, the one provided by your hosting provider).
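To illustrate why clock synchronization matters, here is a minimal sketch. It assumes that the expiry of a registration is stored as an absolute timestamp computed from the clock of the node that handled the REGISTER, and that the store compares this timestamp with its own clock. The text above implies this by mentioning universal time but does not spell out the mechanism, so the function names and the model itself are hypothetical.

```python
def expiry_timestamp(flexisip_clock: float, expires: int) -> float:
    """Absolute expiry time computed by the node that stores the registration."""
    return flexisip_clock + expires


def is_registered(redis_clock: float, expire_at: float) -> bool:
    """The store keeps the record only while its own clock is before expire_at."""
    return redis_clock < expire_at


def lookup_status(true_time: float, flexisip_skew: float, redis_skew: float,
                  expires: int, elapsed: float) -> int:
    """SIP status a caller would get `elapsed` seconds after the REGISTER.

    The skew arguments are the clock errors (in seconds) of each machine
    relative to the true time; 0 means perfectly synchronized.
    """
    expire_at = expiry_timestamp(true_time + flexisip_skew, expires)
    redis_now = true_time + elapsed + redis_skew
    return 200 if is_registered(redis_now, expire_at) else 404


if __name__ == "__main__":
    now = 1_700_000_000
    # Synchronized clocks: a 1 h registration is still valid after 1 min.
    print(lookup_status(now, 0, 0, expires=3600, elapsed=60))
    # Flexisip clock 2 h behind: the record is already expired when stored.
    print(lookup_status(now, -7200, 0, expires=3600, elapsed=60))
```

In this model, a clock that lags by more than the registration lifetime makes a freshly registered contact look expired, which matches the 404 symptom described above.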

Sample configurations

Redis master

In file /etc/redis/redis.conf:

bind *
requirepass ComplicatedPassWord123456789
masterauth ComplicatedPassWord123456789

All other Redis instances

bind *
requirepass ComplicatedPassWord123456789
masterauth ComplicatedPassWord123456789
slaveof <master ip> <master port>

All Redis sentinels

In file /etc/redis/sentinel.conf:

# sentinel monitor <name> <ip master> <port> <quorum size>
sentinel monitor flexi1 10.0.0.1 6379 2
sentinel down-after-milliseconds flexi1 10000
sentinel failover-timeout flexi1 20000
sentinel auth-pass flexi1 ComplicatedPassWord123456789

# For Redis 3.2 and later
protected-mode no

The quorum size is the number of sentinels that must agree that the master is down before the election of a new master is triggered. For a cluster of 3 nodes, a quorum of 2 is a good value. If the quorum equals the size of the cluster, the loss of a single host (and of the sentinel running on it) prevents the quorum from being reached, so the election process is never initiated.
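The quorum rule above can be checked with a few lines of arithmetic. The sketch below assumes one sentinel per host and that every reachable sentinel agrees the master is down. It also includes a rule that comes from the Redis Sentinel documentation rather than from this page: a majority of all sentinels must authorize the failover. The function name is hypothetical.

```python
def can_fail_over(total_sentinels: int, reachable_sentinels: int, quorum: int) -> bool:
    """Return True if the remaining sentinels are able to start a failover.

    Assumption: every reachable sentinel sees the master as down.
    """
    # Rule from this page: `quorum` sentinels must agree the master is down.
    quorum_reached = reachable_sentinels >= quorum
    # Rule from the Redis Sentinel documentation (not stated on this page):
    # a majority of all sentinels must authorize the failover.
    majority = total_sentinels // 2 + 1
    return quorum_reached and reachable_sentinels >= majority


if __name__ == "__main__":
    # 3 hosts, the master host is lost together with its sentinel: 2 remain.
    print(can_fail_over(3, 2, quorum=2))
    print(can_fail_over(3, 2, quorum=3))
```

With 3 hosts and a quorum of 2, the two surviving sentinels can proceed. With a quorum of 3 they cannot, which is the situation the paragraph above warns about.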

Protected mode must be disabled so that the sentinels accept requests that do not come from the loopback interface; otherwise such requests are refused even when the sentinels listen on all interfaces. Note that disabling protected mode exposes your sentinels to the public network, although they are not able to authenticate each other. To mitigate this security issue, configure the firewall to accept sentinel requests only from a whitelist of IP addresses.

Alternatively, if all your sentinels are on a safe subnetwork or VPN, leave protected mode enabled and make the sentinels listen on the private network interface.

All flexisip configurations

In file /etc/flexisip/flexisip.conf, in the [global] section, transports must be defined for each host. For example, for host1:

[global]
transports=sips:host1

In the [module::Registrar] section:

reg-domains=mydomain.com
db-implementation=redis
redis-server-domain=10.0.0.1
redis-server-port=6379
redis-auth-password=ComplicatedPassWord123456789

In the [cluster] section:

enabled=true

# List of IP addresses of all nodes present in the cluster
nodes=<IP host1> <IP host2> <IP host3>
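Since only the transports line differs from one host to another, the fragments above can be generated from a single template. The sketch below is one possible way to do it; render_flexisip_conf is a hypothetical helper, it only uses the option names shown above, and it produces a fragment rather than a complete flexisip.conf.

```python
def render_flexisip_conf(hostname: str, domain: str, redis_master_ip: str,
                         redis_password: str, node_ips: list,
                         redis_port: int = 6379) -> str:
    """Render the HA-related sections of flexisip.conf for one host."""
    lines = [
        "[global]",
        f"transports=sips:{hostname}",
        "",
        "[module::Registrar]",
        f"reg-domains={domain}",
        "db-implementation=redis",
        f"redis-server-domain={redis_master_ip}",
        f"redis-server-port={redis_port}",
        f"redis-auth-password={redis_password}",
        "",
        "[cluster]",
        "enabled=true",
        # Space-separated list of the IP addresses of all cluster nodes.
        f"nodes={' '.join(node_ips)}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder addresses and password, for illustration only.
    ips = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
    for host in ("host1", "host2", "host3"):
        print(f"# ---- {host} ----")
        print(render_flexisip_conf(host, "mydomain.com", "10.0.0.1",
                                   "ComplicatedPassWord123456789", ips))
```

Generating the fragments from one place keeps the Redis password and the node list identical on every host.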

Typical DNS SRV records configuration

Active/Active configuration for sips with 2 nodes.

_sips._tcp.sip 3600 IN SRV 0 100 5061 host1.
_sips._tcp.sip 3600 IN SRV 0 100 5061 host2.
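How a client uses such records can be sketched as follows. The ordering follows the selection procedure of RFC 2782 (lowest priority first, weighted random choice within a priority); the is_up callback and the function names are hypothetical, and real SIP clients may implement the fallback differently.

```python
import random
from typing import Callable, NamedTuple, Optional


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def order_targets(records, rng=random):
    """Return the records in the order a client would try them."""
    ordered = []
    for priority in sorted({r.priority for r in records}):
        group = [r for r in records if r.priority == priority]
        group.sort(key=lambda r: r.weight != 0)   # weight-0 records first
        while group:
            total = sum(r.weight for r in group)
            pick = rng.randint(0, total)
            running = 0
            for i, rec in enumerate(group):
                running += rec.weight
                if running >= pick:
                    ordered.append(group.pop(i))
                    break
    return ordered


def pick_target(records, is_up: Callable[[str], bool],
                rng=random) -> Optional[SrvRecord]:
    """Return the first record, in trial order, whose target is reachable."""
    for rec in order_targets(records, rng):
        if is_up(rec.target):
            return rec
    return None


if __name__ == "__main__":
    records = [
        SrvRecord(0, 100, 5061, "host1."),
        SrvRecord(0, 100, 5061, "host2."),
    ]
    # Simulate host1 being down: the client ends up on host2.
    print(pick_target(records, is_up=lambda target: target != "host1."))
```

With equal priorities and weights, as in the records above, the two hosts receive roughly the same share of the traffic, and a client that cannot reach one of them falls back to the other.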

Typical scenario in case of failure in a HA configuration

We have 3 hosts, and the current Redis master is Host 1.

  • Host 1 suffers a failure and becomes unreachable. The other Flexisip instances immediately detect the failure and connect to a Redis slave. They enter a wait mode, in which no new registration can be made, until a new master is elected.

Flexisip-ha_failure_step_1.png

  • After the delay configured in the sentinels (10 s is recommended), the sentinels start the election process to choose a new Redis master. In this case, Host 2 is elected as the new master. The sentinels reconfigure the Redis instance on Host 3 to replicate the new master. The Flexisip instances on Host 2 and Host 3 notice the change and automatically migrate to the new Redis master.

Flexisip-ha_failure_step_2.png

  • Once Host 1 comes back online, the sentinels detect that it is alive again and reconfigure it as a slave. The Flexisip instance on Host 1 automatically migrates to the newly elected Redis master (Host 2).

Flexisip-ha_failure_step_3.png

Created by Simon Morlat on 2017/02/14 11:53