FreePBX HA-Entire cluster is dead!

If you've clicked on this link, the odds are everything has fallen to pieces and you don't know why.

The two most common problems are non-FreePBX RPMs breaking symlinks and people running commands they shouldn't.

Because it's unfortunately common for people to accidentally upgrade Asterisk or MySQL to a version that ISN'T aware of HA, we have a 'fixcluster' script built into FreePBX-HA.

This script should be at /usr/local/asterisk/fixcluster on both machines. If, for any reason, the fixcluster script is missing from either machine, it can be downloaded as an attachment from this page or from GitHub.
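A quick way to confirm the script is in place on a node is sketched below. The path comes from the text above; the function name check_fixcluster and the FIXCLUSTER variable are made up for this example.

```shell
#!/bin/sh
# Path as given above. Override FIXCLUSTER if your copy lives elsewhere.
FIXCLUSTER="${FIXCLUSTER:-/usr/local/asterisk/fixcluster}"

# check_fixcluster PATH (hypothetical helper)
# Succeeds only if PATH exists and is executable.
check_fixcluster() {
    if [ -x "$1" ]; then
        echo "found: $1"
    elif [ -e "$1" ]; then
        echo "present but not executable: $1"
        return 1
    else
        echo "missing: $1"
        return 1
    fi
}
```

Run check_fixcluster "$FIXCLUSTER" on each machine. If it reports the script as missing, fetch a copy as described above before continuing.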

Before running this script, ensure that only one HA machine is on. The machine that is on will become the master, and all services will be started on that node.

Running the script is simple:

[root@freepbx-a freepbx_ha]# /usr/local/asterisk/fixcluster
Checking symlinks .......
Warning: /var/lib/mysql is NOT a symlink, and it should be. Run with --fixlinks to repair
 Done
Clearing Errors .................... Done
Removing Restraints .................... Done
[root@freepbx-a freepbx_ha]# /usr/local/asterisk/fixcluster --fixlinks
Checking symlinks .......! Done
Clearing Errors .................... Done
Removing Restraints .................... Done
[root@freepbx-a freepbx_ha]# /usr/local/asterisk/fixcluster
Checking symlinks ........ Done
Clearing Errors .................... Done
Removing Restraints .................... Done
[root@freepbx-a freepbx_ha]#
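The three runs shown above (check, repair with --fixlinks, check again) can be wrapped in a small helper. This is only a sketch: the function name run_fixcluster is made up, and because the exit status of fixcluster isn't documented here, the helper relies solely on whether the output mentions --fixlinks, as in the transcript.

```shell
#!/bin/sh
# run_fixcluster SCRIPT (hypothetical helper)
# Runs SCRIPT once. If its output asks for --fixlinks, runs it with
# --fixlinks and then once more to confirm the warning is gone.
run_fixcluster() {
    script="$1"

    # First pass: plain check. The exit status is ignored on purpose,
    # since only the printed output is known from the transcript.
    out=$("$script" 2>&1) || true
    printf '%s\n' "$out"

    case "$out" in
        *--fixlinks*)
            # Second pass: let the script repair the symlinks.
            "$script" --fixlinks || true

            # Third pass: confirm nothing is reported any more.
            out=$("$script" 2>&1) || true
            printf '%s\n' "$out"
            case "$out" in
                *--fixlinks*)
                    echo "symlink warnings remain after --fixlinks" >&2
                    return 1
                    ;;
            esac
            ;;
    esac
    return 0
}
```

On the one machine that is powered on, this would be called as run_fixcluster /usr/local/asterisk/fixcluster. Read the printed output yourself as well, since the helper only looks for the --fixlinks hint.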

All errored services should now be started, and running the 'pcs status' command should show no errors:

[root@freepbx-a ~]# pcs status
Cluster name:
Last updated: Wed Oct 8 07:09:44 2014
Last change: Tue Oct 7 14:46:32 2014 via crmd on freepbx-b
Stack: cman
Current DC: freepbx-a - partition WITHOUT quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured
20 Resources configured
Online: [ freepbx-a ]
OFFLINE: [ freepbx-b ]
Full list of resources:
 spare_ip (ocf::heartbeat:IPaddr2): Started freepbx-a
 floating_ip (ocf::heartbeat:IPaddr2): Started freepbx-a
 Master/Slave Set: ms-asterisk [drbd_asterisk]
     Masters: [ freepbx-a ]
     Stopped: [ freepbx-b ]
 Master/Slave Set: ms-mysql [drbd_mysql]
     Masters: [ freepbx-a ]
     Stopped: [ freepbx-b ]
 Master/Slave Set: ms-httpd [drbd_httpd]
     Masters: [ freepbx-a ]
     Stopped: [ freepbx-b ]
 Master/Slave Set: ms-spare [drbd_spare]
     Masters: [ freepbx-a ]
     Stopped: [ freepbx-b ]
 spare_fs (ocf::heartbeat:Filesystem): Started freepbx-a
 Resource Group: mysql
     mysql_fs (ocf::heartbeat:Filesystem): Started freepbx-a
     mysql_ip (ocf::heartbeat:IPaddr2): Started freepbx-a
     mysql_service (ocf::heartbeat:mysql): Started freepbx-a
 Resource Group: asterisk
     asterisk_fs (ocf::heartbeat:Filesystem): Started freepbx-a
     asterisk_ip (ocf::heartbeat:IPaddr2): Started freepbx-a
     asterisk_service (ocf::heartbeat:freepbx): Started freepbx-a
 Resource Group: httpd
     httpd_fs (ocf::heartbeat:Filesystem): Started freepbx-a
     httpd_ip (ocf::heartbeat:IPaddr2): Started freepbx-a
     httpd_service (ocf::heartbeat:apache): Started freepbx-a
PCSD Status:
Error: no nodes found in corosync.conf
[root@freepbx-a ~]#

Note that the 'Error: no nodes found in corosync.conf' is expected, and isn't actually an error.
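To scan the status output for real problems while skipping the expected corosync.conf message, a filter along these lines may help. It is a rough sketch: the function name status_problems is made up, and matching on the words "error" and "failed" is an assumption about how failures show up in 'pcs status', so treat an empty result as a hint rather than proof.

```shell
#!/bin/sh
# status_problems (hypothetical helper)
# Reads 'pcs status' output on stdin and prints any line that mentions
# an error or failure, except the one message that is expected here.
status_problems() {
    grep -iE 'error|failed' \
        | grep -vF 'Error: no nodes found in corosync.conf' \
        || true   # no matches is the good case, so don't report failure
}
```

Usage would be pcs status | status_problems. If nothing is printed, no line was flagged.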

At this point, you can go into FreePBX, open the HA Manage page, and run the full check there. That will fix any other errors that may have occurred on the local machine.

Only after you have run those checks, turn the other machine back on.

When the other machine boots, it will do a DRBD synchronization (visible via the web interface). When that's complete, you can then run another check (via the FreePBX HA web interface) which will validate the other machine and fix anything it can. When that's complete, without any errors, the cluster will be fully repaired.
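The web interface is the documented place to watch the synchronization. If you prefer a shell, the sketch below checks the DRBD state from the command line. It assumes the nodes run a DRBD 8.x release that exposes its state in /proc/drbd with ds:LocalState/PeerState fields; that is an assumption about your installation, and the function name drbd_all_synced is made up for this example.

```shell
#!/bin/sh
# drbd_all_synced (hypothetical helper)
# Reads /proc/drbd-style text on stdin. Succeeds only if at least one
# resource line is present and every one shows ds:UpToDate/UpToDate.
drbd_all_synced() {
    awk '
        / ds:/ {
            seen = 1
            if ($0 !~ /ds:UpToDate\/UpToDate/) bad = 1
        }
        END { exit (seen && !bad) ? 0 : 1 }
    '
}
```

For example: drbd_all_synced < /proc/drbd && echo "DRBD is in sync". Run the final check from the FreePBX HA web interface either way, since that is what validates and repairs the second machine.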

 
