Managing your cluster¶

When moving from a standard Linux server running a LAMP stack (for example) to a clustered platform, there are a number of differences in how you manage the server and its services. In addition, you should consider the cluster architecture before adding new services or applications to your nodes.

We utilise the Red Hat clustering suite, which uses Pacemaker as the cluster resource manager, administered through the pcs command-line tool.

See also

Further reading about PCS on Red Hat for those interested:

  • Red Hat Pacemaker documentation

  • ClusterLabs Pacemaker documentation

Service management¶

Services which are managed by the cluster should only be started / stopped / restarted by the cluster manager (PCS) - not through the use of the service, systemctl, or /etc/init.d/ commands. Performing these actions via the aforementioned commands does not inform PCS that the service status was intended to change, and may result in the node being fenced.

For any services which PCS is not aware of, you can continue to use conventional methods of restarting them. If in doubt, it’s always best to check with our support team for clarification.

When performing commands on clustered services, it is usually best to run them on the service group (e.g. g_mysql) rather than an individual resource (e.g. r_mysql_ip).
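
If you're not sure whether a service is cluster-managed, a quick way to check is to list the resources that PCS knows about before you touch anything. As a rough sketch (postfix here is just a hypothetical example of a service that is not defined as a cluster resource):

[root@acme-webdb-01 ~]# pcs resource show
[root@acme-webdb-01 ~]# pcs resource restart g_mysql
[root@acme-webdb-01 ~]# systemctl restart postfix

The first command lists the cluster-managed resources, the second restarts a clustered group through PCS, and the third restarts a non-clustered service in the conventional way.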

Clustered service status¶

To view the current clustered services, their status, and the node they’re running on, use the pcs status command:

[root@acme-webdb-01 ~]# pcs status
Cluster name: acme-webdb
Stack: corosync
Current DC: acme-webdb-02 (version 1.1.15-11.el7_3.5-e174ec8) - partition with quorum
Last updated: Fri Jul 14 13:01:21 2017          Last change: Wed Jul 12 12:20:00 2017 by root via crm_resource on acme-webdb-01

2 nodes and 15 resources configured

Online: [ acme-webdb-01 acme-webdb-02 ]

Full list of resources:

 acme-webdb-01-stonith   (stonith:fence_idrac):  Started acme-webdb-02
 acme-webdb-02-stonith   (stonith:fence_idrac):  Started acme-webdb-01
 Clone Set: ping-clone [ping]
     Started: [ acme-webdb-01 acme-webdb-02 ]
 Master/Slave Set: ms_web_drbd [r_web_drbd]
     Masters: [ acme-webdb-01 ]
     Slaves: [ acme-webdb-02 ]
 Resource Group: g_web
     r_web_fs   (ocf::heartbeat:Filesystem):    Started acme-webdb-01
     r_web_ip   (ocf::heartbeat:IPaddr2):       Started acme-webdb-01
     r_web_httpd        (systemd:httpd):        Started acme-webdb-01
 Master/Slave Set: ms_mysql_drbd [r_mysql_drbd]
     Masters: [ acme-webdb-02 ]
     Slaves: [ acme-webdb-01 ]
 Resource Group: g_mysql
     r_mysql_fs (ocf::heartbeat:Filesystem):    Started acme-webdb-02
     r_mysql_ip (ocf::heartbeat:IPaddr2):       Started acme-webdb-02
     r_mysql_int_ip     (ocf::heartbeat:IPaddr2):       Started acme-webdb-02
     r_mysql    (ocf::heartbeat:mysql): Started acme-webdb-02

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@acme-webdb-01 ~]#

The above shows that the g_web group contains the web filesystem, the web VIP, and the httpd service, and is started and running on the acme-webdb-01 node. The g_mysql group is running on acme-webdb-02 and contains the MySQL filesystem, the MySQL external VIP, the MySQL crossover VIP, and the mysqld service.

If any services show as failed, information about the errors encountered is normally shown at the end of this output.

To view more detailed information about the configuration of the individual resources in the group, use the pcs config show command.
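
For example, to view the full cluster configuration, or (with the pcs version shown above) the configuration of a single resource, where r_mysql_ip is taken from the example output:

[root@acme-webdb-01 ~]# pcs config show
[root@acme-webdb-01 ~]# pcs resource show r_mysql_ip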

At the very top of the pcs status output, the acme-webdb-01-stonith and acme-webdb-02-stonith resources are listed, each running on the opposite node to the one it is named after. These are used to send fence commands to recover a node from failure.

If you think that the output from pcs status is inaccurate, you can run pcs resource cleanup to re-check the status of all the resources in the cluster - or pcs resource cleanup r_mysql_int_ip (for example) to re-check an individual resource.
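
For example:

[root@acme-webdb-01 ~]# pcs resource cleanup
[root@acme-webdb-01 ~]# pcs resource cleanup r_mysql_int_ip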

Stopping clustered services¶

To stop a clustered service, you should issue the following command (swapping g_mysql for the group you want to stop):

[root@acme-webdb-01 ~]# pcs resource disable g_mysql

This will shut down the service on its active node and instruct the passive node not to take over; it should be considered the equivalent of running service mysqld stop.

Note

Services are started in order within a PCS group, so if you disable a resource in the middle of a group (r_mysql_ip, for example), it will also stop all the resources following it in the list (e.g. r_mysql_int_ip and r_mysql would also stop).

Similarly, if one of the resources in a group fails, the whole group will attempt to recover on the other node, or stop.

Starting clustered services¶

To start a service which you’ve disabled in the cluster, run the following (swapping g_mysql for the group you want to start):

[root@acme-webdb-01 ~]# pcs resource enable g_mysql

This will update the PCS configuration to mark the resource as enabled; it might take a few seconds before the cluster actually starts the group. You can watch it start up by monitoring the output of pcs status with the following:

[root@acme-webdb-01 ~]# watch -n1 "pcs status"

Press CTRL + C to exit this watch screen.

Restarting clustered services¶

To restart a cluster service, run the following command (swapping g_mysql for the group you want to restart):

[root@acme-webdb-01 ~]# pcs resource restart g_mysql

It is best to restart whole resource groups where possible to prevent unexpected behaviour.

Moving a cluster service to another node¶

When performing maintenance on a node, you might want to move the resources running there onto its counterpart. To do that, we will use the pcs resource move command; however, this has a caveat.

The way PCS “moves” a resource is by creating a ban constraint for the node it is running on at the moment. For example, if g_mysql was running on acme-webdb-02 and you issued a move command, PCS would add a constraint which prevents g_mysql from starting on that node anymore.

When PCS next checks the status of the cluster, it will reconcile the rule that is now being violated (g_mysql is running on acme-webdb-02, which it is banned from) by stopping the service and starting it on acme-webdb-01.
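
You can see the constraint that the move adds with pcs constraint show --full. The output below is a trimmed, illustrative sketch (the cli-ban id format is what pcs typically generates; the exact wording may differ on your cluster):

[root@acme-webdb-01 ~]# pcs constraint show --full
Location Constraints:
  Resource: g_mysql
    Disabled on: acme-webdb-02 (score:-INFINITY) (id:cli-ban-g_mysql-on-acme-webdb-02)
[...]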

However, if you left the cluster in this state, you would not be able to move the service back to acme-webdb-02. If acme-webdb-01 were to fail at this point, g_web would move over to acme-webdb-02, but g_mysql would fail to start and enter a stopped state.

As such, you should perform the following steps:

  1. Issue a pcs resource move command to add the ban constraint:

[root@acme-webdb-01 ~]# pcs resource move g_mysql

  2. Watch the output of pcs status until you’ve confirmed the group has fully moved over:

[root@acme-webdb-01 ~]# watch -n1 "pcs status"

  3. Issue a pcs resource clear command to remove the ban constraint:

[root@acme-webdb-01 ~]# pcs resource clear g_mysql

This will not move the resource group back over unless a prefers or colocation constraint is now being violated (which in most cases it would not be); however, the resource group is now free to be moved again in the future, or to fail over as needed.

Putting a node into standby¶

You can temporarily disable a node in the cluster by running the following command on the node that you want to disable:

[root@acme-webdb-01 ~]# pcs cluster standby

This will move any resources running on the node off to other members, essentially putting the node into a “maintenance mode” of sorts.
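
You can confirm the node has gone into standby with pcs status nodes (the output below is illustrative, based on the example cluster):

[root@acme-webdb-01 ~]# pcs status nodes
Pacemaker Nodes:
 Online: acme-webdb-02
 Standby: acme-webdb-01
[...]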

Note

We have found that putting a node into standby mode can cause the DRBD volumes to be disconnected by PCS, which can prevent DRBD replication from taking place while the node is in standby. As such, having standby enabled for long periods of time is discouraged.

If you’re unsure about using pcs cluster standby, make use of the pcs resource move commands instead to make a node passive.
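
As a rough sketch using the example cluster above (where g_web runs on acme-webdb-01), you could empty that node for maintenance and then release the ban constraint once you're finished:

[root@acme-webdb-01 ~]# pcs resource move g_web
[root@acme-webdb-01 ~]# pcs resource clear g_web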

Taking a node out of standby¶

To put a node back into service after pcs cluster standby has been performed on it, run the following on the node you want to enable:

[root@acme-webdb-01 ~]# pcs cluster unstandby

Resources will not move back over to the node unless a prefers or colocation constraint is currently being violated, so you might want to manually perform a pcs resource move to balance services out again.
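
For example, to bring g_mysql back onto acme-webdb-02 after taking it out of standby (names are from the example cluster), move it with an explicit destination node, wait for pcs status to show it running there, and then clear the constraint the move leaves behind:

[root@acme-webdb-01 ~]# pcs resource move g_mysql acme-webdb-02
[root@acme-webdb-01 ~]# pcs resource clear g_mysql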

Resource constraints¶

Constraints are ‘rules’ which define where resources are started within the cluster. They are also used by some PCS commands to facilitate certain functions, such as the pcs resource move command.

Viewing configured constraints¶

You can view the current constraints configured on the cluster with the following:

[root@acme-webdb-01 ~]# pcs constraint show --full
Location Constraints:
  Resource: acme-webdb-01-stonith
    Enabled on: acme-webdb-02 (score:INFINITY) (id:location-acme-webdb-01-stonith-acme-webdb-02-INFINITY)
  Resource: acme-webdb-02-stonith
    Enabled on: acme-webdb-01 (score:INFINITY) (id:location-acme-webdb-02-stonith-acme-webdb-01-INFINITY)

[...]

Ticket Constraints:
[root@acme-webdb-01 ~]#

These should be managed carefully, as an incorrect constraint can cause unexpected behaviour and might take resources offline.
