Clearing Errors from the CLI
If a service has errored enough times for the cluster to be unable to start it on either node, you'll need to manually clear the errors.
    [root@freepbx-aaa /]# pcs status
    Cluster name:
    Last updated: Wed Mar 12 04:37:50 2014
    Last change: Tue Mar 11 13:49:29 2014 via cibadmin on freepbx-a
    Stack: cman
    Current DC: freepbx-b - partition with quorum
    Version: 1.1.10-14.el6-368c726
    [...]
     Resource Group: asterisk
         asterisk_fs (ocf::heartbeat:Filesystem): Stopped
         asterisk_ip (ocf::heartbeat:IPaddr2): Stopped
         asterisk_service (ocf::heartbeat:freepbx): Stopped
     Resource Group: httpd
         httpd_fs (ocf::heartbeat:Filesystem): Stopped
         httpd_ip (ocf::heartbeat:IPaddr2): Stopped
         httpd_service (ocf::heartbeat:apache): Stopped

    Failed actions:
        asterisk_service_monitor_30000 on freepbx-a 'not running' (7): call=186, status=complete, last-rc-change='Wed Mar 12 04:38:10 2014', queued=0ms, exec=0ms
        asterisk_service_monitor_30000 on freepbx-b 'not running' (7): call=240, status=complete, last-rc-change='Wed Mar 12 04:26:05 2014', queued=0ms, exec=0ms
Because Asterisk was found 'not running' on both nodes, the cluster has decided that it cannot be started and has marked it as unusable on both nodes. If the cause (a hardware fault, for example) has since been resolved, you can clear the errors with the following commands:
    crm_resource --resource asterisk_service -C --node freepbx-a
    crm_resource --resource asterisk_service -C --node freepbx-b
Note that the resource to clear is whatever appears before _monitor in the failed action line. For example, if another device appeared on your network with the same IP address as your floating IP, the IP resource would fail. In that case the failed actions would read 'asterisk_ip_monitor_...' and you would need to clear the errors on asterisk_ip.
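The rule above (take whatever precedes _monitor in each failed action line) can be sketched as a small shell helper. This is only an illustration: the function names failed_monitor_targets and print_cleanup_commands are hypothetical, and the parsing assumes the failed action lines look like the ones in the pcs status output shown earlier. It prints the cleanup commands rather than running them, so they can be reviewed first.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, and the line format is
# assumed to match the "Failed actions" entries shown in this document.

# Read `pcs status` output on stdin and print one "resource node" pair
# per failed monitor action.
failed_monitor_targets() {
    # A failed action line looks like:
    #   asterisk_service_monitor_30000 on freepbx-a 'not running' (7): ...
    # The resource name is everything before "_monitor_<interval>".
    sed -n 's/^[[:space:]]*\([A-Za-z0-9_.-]*\)_monitor_[0-9][0-9]* on \([^[:space:]]*\).*/\1 \2/p' | sort -u
}

# Print (do not run) the crm_resource cleanup command for each pair.
print_cleanup_commands() {
    failed_monitor_targets | while read -r resource node; do
        echo "crm_resource --resource $resource -C --node $node"
    done
}
```

With the helpers loaded into a shell, `pcs status | print_cleanup_commands` would list one crm_resource line per failed resource and node.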
Setting a node online or unstandby
If the only reachable node is in standby, you'll need to bring it out of standby manually from the command line:
    pcs cluster unstandby freepbx-b
Note that the machine may have a buggy version of pcs installed (due to a Red Hat issue), in which case the command fails with "Error: node 'freepbx-b' does not appear to exist in configuration" or similar.
If that happens, use the alternative command, which deletes the node's standby attribute:
    crm_standby -D -N freepbx-b
Setting a node offline or standby
To take a node out of service manually, so that the cluster stops running resources on it, put it into standby from the command line:
    pcs cluster standby freepbx-b
Note that the machine may have a buggy version of pcs installed (due to a Red Hat issue), in which case the command fails with "Error: node 'freepbx-b' does not appear to exist in configuration" or similar.
If that happens, use the alternative command, which sets the node's standby attribute:
    crm_standby -v on -N freepbx-b
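The "try pcs, then fall back" procedure from the two sections above could be wrapped in one helper. This is a minimal sketch, not a supported tool: the function name set_standby is hypothetical, and it assumes that crm_standby -v on sets a node's standby attribute while crm_standby -D deletes it.

```shell
#!/bin/sh
# Sketch only: set_standby is a hypothetical wrapper, not part of pcs
# or Pacemaker.
# Usage: set_standby <node> <on|off>
set_standby() {
    node=$1
    state=$2
    case $state in
        on)  pcs_cmd=standby ;;
        off) pcs_cmd=unstandby ;;
        *)   echo "usage: set_standby <node> <on|off>" >&2; return 2 ;;
    esac

    # Try pcs first. On affected machines it fails with
    # "node ... does not appear to exist in configuration".
    if pcs cluster "$pcs_cmd" "$node"; then
        return 0
    fi

    # Fallback (assumption): -v on sets the standby attribute,
    # -D deletes it, which brings the node back out of standby.
    echo "pcs failed, falling back to crm_standby" >&2
    if [ "$state" = on ]; then
        crm_standby -v on -N "$node"
    else
        crm_standby -D -N "$node"
    fi
}
```

For example, `set_standby freepbx-b off` would try `pcs cluster unstandby freepbx-b` and only call crm_standby if pcs returns an error.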
To check the status of each node after running the commands above (these results are from a healthy system, taken on node A):
    cat /proc/drbd
     1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
        ns:896 nr:0 dw:1044 dr:3049 al:7 bm:10 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
     2: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
        ns:480 nr:0 dw:580 dr:2105 al:5 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
     3: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
        ns:4 nr:0 dw:4 dr:3633 al:1 bm:1 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
     4: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
        ns:4 nr:0 dw:4 dr:685 al:1 bm:1 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
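Reading that output by eye gets tedious with several devices, so the check can be sketched as a filter. The function name drbd_unhealthy_devices is hypothetical, and "healthy" is assumed here to mean cs:Connected with ds:UpToDate/UpToDate, as in the example output above. Each device line is expected to start with its minor number followed by a colon.

```shell
#!/bin/sh
# Sketch only: drbd_unhealthy_devices is a hypothetical helper.
# Reads /proc/drbd-style text on stdin and prints "<minor> <cs> <ds>"
# for every device that is not Connected and UpToDate/UpToDate.
drbd_unhealthy_devices() {
    awk '
        # Device lines start with "<minor>:" followed by cs:, ro:, ds: fields.
        $1 ~ /^[0-9]+:$/ {
            cs = ""; ds = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^cs:/) cs = substr($i, 4)
                if ($i ~ /^ds:/) ds = substr($i, 4)
            }
            if (cs != "Connected" || ds != "UpToDate/UpToDate") {
                sub(/:$/, "", $1)
                print $1, cs, ds
            }
        }
    '
}
```

Running `drbd_unhealthy_devices < /proc/drbd` on the healthy system shown above should print nothing. Any line it does print names a device worth investigating.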
How to determine the floating IP
    pcs resource show floating_ip