Advanced Recovery
- 1.1 Overview
- 1.2 Prerequisites
- 1.3 Setup Configuration
- 1.4 Establish SSH connection between both the servers
- 1.4.1 PBX 17+ Improvements
- 1.5 Install Advanced Recovery module
- 1.6 Advanced Recovery Module Configuration
- 1.6.1 How to Open the Advanced Recovery Module Settings
- 1.6.2 How to Configure the Advanced Recovery Module
- 1.6.2.1 Quick Configuration Wizard
- 1.6.2.1.1 Step-1 Server Configuration
- 1.6.2.1.2 Step-2 Sync
- 1.6.2.1.3 Step-3 Settings
- 1.6.2.1.4 Step-4 Notification
- 1.6.3 Advanced Recovery service daemon
- 1.6.3.1 Primary Server
- 1.6.3.2 Secondary Server
- 1.6.4 Advanced Recovery Dashboard
- 1.6.4.1 Advanced Recovery Sync Now
- 1.6.5 Modules and Call Recording syncing
- 1.6.6 Switchover Configuration
- 1.6.7 Advanced Configuration
- 1.7 Switchover
- 1.8 Switchback to Primary server
- 1.8.1 Sync back to Primary
- 2 High Level use case scenario using Advanced Recovery module
Overview
The Advanced Recovery commercial module provides an easy-to-configure replication engine along with the ability to automatically fail over to a secondary server. This mechanism protects voice services when the primary server fails.
This module is applicable to FreePBX versions 15 and higher.
If the backup server cannot be active at the same time as the primary, this module helps ensure that the Secondary server comes up quickly when the Primary server goes down, keeping downtime to a minimum.
The key features of this module are:
Easy PBX GUI interface for configuration: in just a few steps, all of your current (primary) system configuration is ready to replicate to the secondary server.
Optional automatic switchover: as soon as one or more primary services die, services switch over to the secondary system automatically.
Active monitoring of primary server services such as the network interface, Asterisk, MySQL, and the PBX stack, so that switchover to the secondary server happens as soon as any of these services dies.
Built-in notification mechanism that notifies the admin by both call and email during failover.
Control over which trunks are activated automatically after switchover.
A configuration option to control primary server services after the switchover. This is needed when the secondary becomes active because it lost its connection to the primary during a brief disruption, such as a power or network fluctuation or quick maintenance. In those situations the primary comes back after some time, so it is better to stop the primary server services to avoid endpoint conflicts, where some endpoints register to the primary and others to the secondary. With this option enabled, the secondary stops the primary server services as soon as the primary comes up.
Auto switchover custom hooks: an option to execute custom hooks or a third-party script during switchover. This is useful for triggering custom or third-party application logic during the switchover.
Integrated support for the Endpoint Manager module to rebuild the templates of Sangoma S and D series phones so that the secondary (failover) server IP is pushed to the phones.
Details of each feature and how to configure it are given below.
Prerequisites
In order to successfully deploy the Advanced Recovery module the following requirements must be met:
You have an existing PBX system that will be your Primary server.
You have an identical PBX system that will be your Secondary server.
The two servers can communicate on an IP level.
Both systems are configured with their own IP addresses.
The two systems can be on the same local network or in geographically separate locations.
SSH and HTTP(S) ports must be open between the two servers.
Both servers are running the Advanced Recovery module, and it is licensed on each system.
Backup & Restore, API, and Filestore are dependent modules that must be installed on both systems.
ICMP/Ping between Primary and Secondary is required.
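The port-related prerequisites above can be spot-checked from either server before installing anything. The following is a minimal sketch and not part of the module: the helper name `port_open` is hypothetical, and ports 22, 80 and 443 are only the usual SSH, HTTP and HTTPS defaults, so adjust them if your servers listen elsewhere. It tests TCP reachability only and does not cover the ICMP/Ping requirement.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, running `port_open("192.0.2.20", 22)` on the primary (with the placeholder address replaced by the secondary's real IP) indicates whether the SSH port is reachable; repeat for the HTTP(S) port and in the opposite direction.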
Setup Configuration
The Advanced Recovery module can be configured with the following simple steps.
Establish an SSH connection between the two servers.
SSH from the primary to the secondary server must work without a password.
SSH from the secondary to the primary server must work without a password.
Configure the Advanced Recovery module using the "Quick configuration" wizard.
Establish SSH connection between both the servers
An SSH connection must be established between the two servers.
Please refer to Setting up the SSH key to copy the primary server's SSH key to the secondary, so that the primary can SSH to the secondary server without a password. Repeat the procedure in the opposite direction so that the secondary can also SSH to the primary.
PBX 17+ Improvements
Redirect option to the Backup & Restore module:
A new option, "+Go to Backup & Add Key", has been added under Advanced Recovery → Global Settings. It redirects to Backup module → Global Settings, where you can copy this PBX's key or add another PBX's key.
Verify SSH connectivity:
On a fresh PBX 17+ system, this one-time step is mandatory. The 'precheck' command must be run on both servers.
Check SSH Connection from Primary Server:
SSH into the primary server.
Run the following command, replacing SecondaryIP with the secondary server's IP address:
fwconsole advr --precheck SecondaryIP
If the command is successful, it displays “SSH Connectivity is Good”.
Screen-shot attached below
Check SSH Connection from Secondary Server:
SSH into the secondary server.
Run the following command, replacing PrimaryIP with the primary server's IP address:
fwconsole advr --precheck PrimaryIP
If the command is successful, it displays “SSH Connectivity is Good”.
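If the precheck needs to be repeated from a script, it can be wrapped as sketched below. This is only an illustration: the function names are hypothetical, and the only things taken from this guide are the command `fwconsole advr --precheck <IP>` and the success message. Because it is not documented here whether the message is written to standard output or standard error, or what exit code the command returns, the sketch searches both streams and ignores the exit code.

```python
import subprocess

SUCCESS_TEXT = "SSH Connectivity is Good"  # message quoted in the steps above


def precheck_passed(output: str) -> bool:
    """Return True if the precheck output contains the success message."""
    return SUCCESS_TEXT in output


def run_precheck(peer_ip: str) -> bool:
    """Run 'fwconsole advr --precheck <peer_ip>' and report whether it passed."""
    result = subprocess.run(
        ["fwconsole", "advr", "--precheck", peer_ip],
        capture_output=True,
        text=True,
        check=False,  # assumption: rely on the message, not the exit code
    )
    return precheck_passed(result.stdout + result.stderr)
```

`run_precheck` has to be executed on the PBX itself, once on each server with the other server's IP as the argument.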
SSH Folder permissions
If the web GUI shows errors during the SSH connection, it is worth checking that the correct permissions are set on the SSH folder and the files it contains. The permissions should look something like:
/home/asterisk/.ssh
.ssh/:
total 12
drwx------ 2 asterisk asterisk 57 Apr 18 2022 .
drwxr-xr-x. 12 asterisk asterisk 263 Feb 22 06:57 ..
-rw------- 1 asterisk asterisk 3292 Jan 13 2021 id_rsa
-rw------- 1 asterisk asterisk 748 Jan 13 2021 id_rsa.pub
-rw------- 1 asterisk asterisk 343 Jan 13 2021 known_hosts
The above files must exist on both servers (primary and secondary). If the permissions of any of these files differ, redo the SSH association step described above.
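The permission listing above can also be verified programmatically. The sketch below is a hypothetical helper, not something shipped with the module. It assumes the expected modes are exactly those in the listing (700 for the directory, 600 for every file inside it) and it does not check ownership, which should be asterisk:asterisk as shown.

```python
import os
import stat


def ssh_permission_problems(ssh_dir: str) -> list[str]:
    """List paths under ssh_dir whose mode differs from the listing above."""
    problems = []
    # The directory itself should be drwx------ (0o700).
    if stat.S_IMODE(os.stat(ssh_dir).st_mode) != 0o700:
        problems.append(ssh_dir)
    # Every regular file inside should be -rw------- (0o600).
    for name in sorted(os.listdir(ssh_dir)):
        path = os.path.join(ssh_dir, name)
        if os.path.isfile(path) and stat.S_IMODE(os.stat(path).st_mode) != 0o600:
            problems.append(path)
    return problems
```

Calling `ssh_permission_problems("/home/asterisk/.ssh")` on each server returns an empty list when the modes match the listing, and otherwise the paths that need attention.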
Install Advanced Recovery module
Download and install the "Advanced Recovery" module by using "Check Online" and then following the download and install steps described in the Module Admin User Guide (Checking for Available Upgrades) wiki.
Advanced Recovery Module Configuration
How to Open the Advanced Recovery Module Settings
Within the PBX GUI, navigate to Admin > Advanced Recovery
How to Configure the Advanced Recovery Module
The main landing page of the Advanced Recovery module has options to view system status, perform configuration changes, and adjust global settings (such as SSH keys).
Quick Configuration Wizard
The Quick Configuration wizard provides an easy GUI interface for configuring the Advanced Recovery module.
The wizard configures both the primary and the secondary server by itself, so no further configuration is needed afterwards.
Clicking the "Quick configuration" button opens the wizard, as shown below:
Step-1 Server Configuration
Here, specify the "Secondary" server IP.
After selecting the "Secondary Server" instance, click "Next".
If there is any problem connecting to the secondary server over SSH, an alert is shown.
If the SSH connection to the secondary server is good, the wizard checks whether a properly licensed "Advanced Recovery" module is installed on the secondary server.
If the Advanced Recovery module is not installed on the secondary system, an error like the one shown below is displayed.
If the module is installed but not licensed on the secondary system, an error like the one shown below is displayed.
If the secondary has a properly licensed, active Advanced Recovery module, the wizard proceeds to Step-2 > Sync.
Step-2 Sync
This step defines the syncing frequency.
Syncing can take from minutes to hours, depending on the system size (capacity) and on any additional files or directories that have been added to the module configuration to be synced.
Step-3 Settings
This section lets you configure the Advanced Recovery module settings required for replicating the configuration to the secondary server.
The configuration options in this step are described below.
Auto Switch services
Automatically switch the services to the secondary server when the threshold time is met.
Disable Remote Trunks
Should the trunks be disabled on the secondary server after the trunk configuration is replicated/restored? Set this to NO only when trunks should register from both the primary and secondary servers at the same time. Generally trunks should be kept active on only one server, so this option should be set to YES. Default is YES.
Exclude NAT Settings
Should NAT settings from the Primary Server be restored to the Secondary Server?
Exclude Bind Address
Should Bind Address settings on the Primary Server be restored to the Secondary Server?
Exclude DNS
Should DNS settings on the Primary Server be restored to the Secondary Server?
Apply Config
Should "Apply Configs" be run on the Secondary Server after a restore is completed?
Once the above configuration is done, move on to the next step to configure notifications.
Step-4 Notification
This section lets you configure notifications.
The Advanced Recovery module can send notifications either by calling an admin extension or by email.
By default, Call Notification is disabled. When it is enabled, further configuration options are shown, as follows:
The parameters to configure in the Notification section are:
Notification Extension: the extension to call during a failover event. On a system failure event, the active system calls the configured extension and plays the configured announcement. The purpose of this call is to inform the admin about the system failure.
Recording when primary fails: the recording to play when the Primary server fails. The list of recordings to choose from comes from the System Recording module.
Recording when standby fails: the recording to play when the standby (Warm Spare) server fails. The list of recordings to choose from comes from the System Recording module.
Notification Email: the email address to which notifications are sent.
Once done, press "Configure" to finish the configuration of the Advanced Recovery module.
This completes the "Quick configuration" part of the Advanced Recovery module. If any further changes to the configuration are needed, please refer to the Advanced Recovery Expert Configuration wiki.
The "Advanced Recovery Service" daemon must be started as soon as the "Quick configuration" process is done, as described in the section below.
Advanced Recovery service daemon
This service daemon is mainly responsible for monitoring the health of the primary system. In the event of a failure, it executes the steps needed to switch over to the secondary server.
After the "Quick configuration" wizard completes, the status of the Primary and Secondary looks something like the following.
Primary Server
Secondary Server
As shown below, the dashboard shows that configuration is done but the service has not yet started. The next step is to "Start" the service from the Primary.
Advanced Recovery Dashboard
The dashboard provides information about the service status and the last sync time.
The "Sync now" option can also be used to force a sync of the configuration to the secondary system.
Advanced Recovery Sync Now
The "Sync now" option manually syncs the configuration to the secondary server.
This is useful for confirming that syncing works once the initial configuration is finished, and for finding out how long a sync takes for the PBX system.
After clicking "Sync now", a confirmation dialog pops up. Once confirmed, the status of the process is displayed as shown below:
The dashboard displays "Time since last sync" to reflect when the last sync from the primary to the secondary server happened.
As soon as the syncing process finishes, the dashboard displays "Time taken to finish last sync" in HH:MM:SS format, which gives a rough estimate of how long the system takes to sync the configuration.
If required, change the "Syncing scheduling" frequency using the "Advanced configuration" option.
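One way to use the "Time taken to finish last sync" value when choosing a sync frequency is to compare it with the planned interval. The sketch below is purely illustrative: the function names are hypothetical, expressing the interval in minutes is an assumption about how you note down the schedule, and the safety margin of 2 is an arbitrary starting point rather than a module recommendation.

```python
def hms_to_seconds(value: str) -> int:
    """Convert an HH:MM:SS string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def interval_is_safe(last_sync: str, interval_minutes: int, margin: float = 2.0) -> bool:
    """Return True if the interval is at least `margin` times the last sync duration."""
    return interval_minutes * 60 >= margin * hms_to_seconds(last_sync)
```

For instance, with a last sync of 00:45:00 an hourly schedule fails this check, which suggests syncing less often or reducing what is being synced.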
Modules and Call Recording syncing
The primary server settings page has an option to choose which modules are included in the sync process. By default all modules are selected and included; unselect any modules that should be excluded from syncing.
There is also an option to add custom directories to the sync process. The files and folders selected in these fields will be present on the Secondary Server, in the same locations, once the sync is completed.
By default the Voicemails folder is added as a directory item, and this folder is synced on an incremental basis.
Switchover Configuration
The Advanced Recovery module provides configuration options that determine the actions taken during switchover.
All switchover-related configuration is part of the Advanced Configuration.
To reach Advanced Configuration, go to "Advanced Recovery Module → Configuration → Primary Server", as shown in the screenshot below.
Trunk Selection Configuration option
As soon as "Auto switch services" is enabled, a list of the trunks currently configured in the system is shown.
For every configured trunk, select the desired state of the trunk after a switchover:
Bring down Primary server after switchover configuration option
Execute 'fwconsole stop' on the primary server after a failover.
This option is useful when a partial outage, such as a network or power fluctuation, causes the Primary server to lose communication with the Secondary. In that situation the Secondary server becomes active, but after a period of time the Primary also comes back up, which leads to some phones trying to register to the Primary server while others try the Secondary server.
To avoid this kind of situation, the admin can choose whether or not to bring down the primary server after switchover.
If this option is set to YES, the Advanced Recovery module keeps checking the configured Primary server to see whether it comes up, and brings down all the services on the primary when that happens.
Post Switchover Hook
This is for advanced users who would like to perform some special steps after switchover.
Specify the path of the custom script to execute after switchover.
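A hook script can be very small. The sketch below is a hypothetical example, not a script supplied with the module. It relies on the Switchover section of this guide, which states that the hook is run with a "START" argument, and it assumes that the argument arrives as the first command-line parameter and that the script is marked executable. The log path is made up for illustration.

```python
#!/usr/bin/env python3
import os
import sys
import tempfile
from datetime import datetime, timezone

# Hypothetical log location; replace it with whatever your site uses.
LOG_PATH = os.path.join(tempfile.gettempdir(), "advr_hook.log")


def build_log_line(args: list[str]) -> str:
    """Describe one hook invocation; args[0] is expected to be "START"."""
    action = args[0] if args else "UNKNOWN"
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return f"{stamp} post-switchover hook called with action={action}"


def main(argv: list[str]) -> int:
    """Append one line to the log, then run any site-specific logic."""
    with open(LOG_PATH, "a", encoding="utf-8") as log:
        log.write(build_log_line(argv[1:]) + "\n")
    # Site-specific steps (for example, calling a third-party system) go here.
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Keeping the hook short and logging each invocation makes it easier to confirm after a failover that the script was actually triggered.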
Advanced Configuration
Once the Quick configuration wizard is finished, any further configuration or change must be made on the "Advanced Configuration" page. Changes such as updating GraphQL API tokens or modifying the Primary/Secondary server IP address are made there, as described in Advanced Recovery Expert Configuration.
Switchover
Advanced Recovery module decides Primary is down by detecting at least one of the following conditions:
Network connectivity is down on Primary server - Secondary server lost communication with primary server
Asterisk is not running on the Primary server
The FreePBX stack is not running on the Primary server
The database is not running on the Primary server
Switchover to the secondary server happens as soon as any of the failure conditions above is detected and the threshold time has been reached.
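The decision rule described above (any one failed check, plus a threshold time) can be modelled as follows. This is a conceptual sketch only and not the module's actual implementation: the class and field names are hypothetical, and it assumes that the failure must persist continuously for the whole threshold and that a recovery resets the timer, which this guide does not spell out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrimaryHealth:
    network_up: bool
    asterisk_running: bool
    freepbx_running: bool
    database_running: bool

    def failed(self) -> bool:
        """The primary counts as failed if at least one check is down."""
        return not (self.network_up and self.asterisk_running
                    and self.freepbx_running and self.database_running)


class SwitchoverTimer:
    """Tracks how long the primary has been failing continuously."""

    def __init__(self, threshold_seconds: float) -> None:
        self.threshold_seconds = threshold_seconds
        self.failing_since: Optional[float] = None

    def should_switch(self, health: PrimaryHealth, now: float) -> bool:
        """Return True once a failure has lasted for the threshold time."""
        if not health.failed():
            self.failing_since = None  # assumed: recovery resets the timer
            return False
        if self.failing_since is None:
            self.failing_since = now
        return now - self.failing_since >= self.threshold_seconds
```

With a 30 second threshold, a failure first seen at t=10 triggers a switchover at t=40, while a brief failure that clears before then triggers nothing. A longer threshold reduces false switchovers at the cost of a longer outage before failover.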
The Advanced Recovery module performs the following actions during switchover (in order):
Switchover-related actions as configured in Switchover Configuration
Enable the trunks on the secondary as configured in the Trunk Selection Configuration option
Execute post switchover hooks to run a custom third-party script with a "START" argument.
Notify the admin via a call to the admin extension if Call Notification is enabled.
Notify the admin via email.
Failover recommendations
The Advanced Recovery module will be beneficial during outages by automatically switching services over to a Secondary server when a failure is detected on Primary server. However, it is critical to understand and be aware that there are other network elements such as IP/SIP Phones, SIP Trunking, routers, etc. that need to be configured properly to ensure they start working smoothly after services are switched over.
SIP Phones Recommendation
Regenerate existing Sangoma's phone configuration
The Advanced Recovery module has an option to regenerate the configuration of already connected/configured Sangoma S and D series phones via Endpoint Manager.
"Advanced Recovery → Endpoint → Regenerate EPM config for S and D series phones"
This option adds the 'Secondary Server' IP address parameter to the selected template as 'Backup SIP Server'. The 'Update Phones' option may also be selected to force all the phones under the template to pull a new configuration from the server.
Manually editing templates for Sangoma's S series phones
The 'Regenerate EPM Config for S and D series phones' option mentioned above takes care of the backup server configuration in the Sangoma templates. Phone configuration for any other brand has to be done manually by editing the related template.
Sangoma S and D series phones support the configuration of a "Failover" IP along with the Primary IP.
The Endpoint Manager module, which is "Free" to use for Sangoma's S and D series phones, can be used to help configure this setup.
Please refer to Connecting Sangoma Phone to FreePBX or PBXact Indepth for a detailed guide to using Endpoint Manager with Sangoma's S series phones.
To achieve failover when the primary server fails, enable the Backup destination field and enter the Secondary server information in the template below.
Manually editing templates for Sangoma's D series phones
The backup destination address is added in the D/P Series phone template, in the Redundancy tab (EPM → Sangoma → D & P series phones).
Please refer to the EPM-Admin User Guide (Template Creation and Editing, Example with Sangoma Brand) for an example of how to edit templates via EPM.
SIP Trunk recommendation
It is recommended to ensure that the SIP Trunk provider allows registration requests from both the Primary and Secondary servers' IPs.
When a failure occurs and the secondary server becomes active, the SIP Trunk provider must accept the registration request from the secondary server so that SIP traffic can come up. This does not apply if both the Primary and Secondary servers are behind the same public IP, since both servers register from the same source IP.
IT admin recommendation
IT staff or PBX administrators are advised to take care of the following responsibilities:
Make any networking changes required to bring up the secondary server (such as router port forwarding in a NAT environment), to make sure that:
Secondary server registration messages reach the SIP Trunking provider.
Messages from the SIP Trunk reach the secondary server as they would the primary.
Phones are able to register with the secondary server.
Make sure the Primary and Secondary server IPs do not change. If they do change, make sure the GraphQL configuration (server URL only) is updated accordingly, because the two servers talk to each other using GraphQL API URIs. IP changes might result in a false "server down" event.
Make sure the Primary and Secondary servers are accessible to each other. The Firewall module needs to have both servers' IPs whitelisted accordingly.
Make sure SSH connectivity between the two servers is working.
Keep in mind that with recent versions of the Recording Report module (v15.0.4.28+), call recording files are not part of the "full system backup", so make sure the call recordings directory is included in the Primary Server - Advanced Recovery settings (by default the recordings are located in /var/spool/asterisk/monitor/).
Switchback to Primary server
The Advanced Recovery module is mainly designed to make failover to the Secondary system easy in the event of a Primary server failure.
Once the primary server is back up and running, it is recommended to switch services back to the primary, so that any subsequent disruption does not affect the phone system's availability.
To switch back, follow the steps below:
Log in to the administrative GUI of the Secondary server, which is currently active.
Stop the Advanced Recovery daemon from the FreePBX/PBXact GUI → Advanced Recovery option to avoid getting notifications.
Repair the Primary server (if possible) or bring up a new Primary server with a fresh installation of FreePBX/PBXact.
Once the Primary server is ready, follow the steps in "Sync back to Primary" to sync the data from the secondary to the primary.
Once syncing is over, switch back to the primary server so that the primary becomes the active node and the secondary becomes the standby server.
Sync back to Primary
This option is useful for bringing up the Primary server, which can be either the same server or a new server/installation.
While the secondary server is running as active, the dashboard on the secondary server shows that the Primary server is down and offers an option to Sync back to Primary.
The "Sync back to Primary" option opens the wizard shown below and asks for the Primary server IP. On recent versions the Primary server's IP address is populated automatically from the existing configuration.
After entering the Primary server IP, you have two options: Sync or Skip Sync
Sync: syncs the Secondary server data to the Primary server (overwriting any existing configuration on the primary).
Skip Sync: skips the sync and jumps to the Switch back page.
As part of syncing, data from the secondary server is pushed to the primary server IP.
Once syncing to the Primary server has finished, the third step offers the "Switch back" option. Switching back reverts the status of the trunks on the secondary server (disabling them) and turns the trunks on the Primary back on. It also updates the status in the Advanced Recovery module so that the Primary server is the active server again and the health check runs from Primary → Secondary.
High Level use case scenario using Advanced Recovery module
The Advanced Recovery module supports the failover scenario in which the Secondary server takes over production when there is a critical failure in the Primary server, as illustrated below:
Frequently Asked Questions
Do we need a floating IP now like in the old HA setup?
A: No. A floating IP is not a requirement with this module. Each server has its own IP address. SSH communication must be open between both the servers.
Do both servers have identical configurations, except that on the standby server the trunks are disabled to avoid registrations coming from two machines simultaneously?
A: Yes. This module provides more granularity to control the trunks, either during syncing or after switchover. The "Disable Remote Trunks" option takes care of trunk status while the primary and secondary are running normally, and the "Switchover Trunk selection" option determines what the trunk status should be (enabled or disabled) after switchover.
Do non-Sangoma phones need to be configured to register to the backup server address if registration to the primary is unsuccessful?
A: Yes. The fallback destination or SIP server address must be set manually on the SIP phones so that, during a primary system failure, the phones can register to the secondary system.
Are services like Asterisk, FreePBX, etc. all running on both machines, whether in standby or active?
A: Yes. In a normal state, all services are fully running on both servers: the Active and the Standby server.
How does the monitoring happen?
A: Primary server and Secondary server monitor each other and send notifications as soon as failure of peer node happens. If the primary goes down then switchover happens. If the Secondary goes down, only a notification will be sent to keep the administrator aware of the StandBy server failure.
What does the “Bring down Primary server after switchover” option mean?
A: If this option is enabled, then after switchover the Secondary keeps monitoring the primary server IP, and if the Primary server comes back up it performs a 'fwconsole stop' on the primary, which stops all running services such as Asterisk and any other FreePBX processes. This is required when only one node should be active, to avoid:
Split registration scenarios where some phones register to the primary and some to the secondary.
SIP trunk registration requests being sent from both servers using the same account/credentials/auths.
Such situations can arise when network or power fluctuations cause a loss of communication between the servers, which in turn triggers a switchover.
I have two FreePBX servers, each with two NICs. One NIC provides regular access to the network that SIP binds to, and the other connects directly to the other FreePBX server. How can I configure the Advanced Recovery module to use the dedicated NIC or LAN interface for monitoring and syncing between the servers?
A: As long as the two servers can communicate with each other at the IP level, the Advanced Recovery module will work fine. Each NIC has its own IP, so you just have to make sure the direct link between the servers is configured so that communication over the dedicated NIC interface works as expected.
For example, suppose Server A has a NIC with IP x.y.z.w, Server B has a NIC with IP a.b.c.d, and there is a direct link between the two servers. The following pre-conditions must be met during configuration:
Routing or communication between x.y.z.w and a.b.c.d at the IP level is working.
SSH and HTTP(S) traffic is working over those IPs.
These IPs are whitelisted on both servers so that the firewall does not block access between them.
SSH keys are set up between the two servers so that they can SSH to each other using these IPs.
These NIC IPs are used in the Advanced Recovery module configuration: x.y.z.w as the Primary server's IP and a.b.c.d as the Secondary server's IP.
When using Endpoint Manager to rebuild the failover IP configuration, make sure you choose the correct SIP server IP. If the template is for local phones, x.y.z.w and a.b.c.d are the primary and backup IPs respectively. If the template is for remote phones/endpoints, use the public IP addresses of the primary and secondary servers and not the private IPs (x.y.z.w and a.b.c.d). The same applies when configuring non-Sangoma brands manually.